Augmented Reality and its limitations
Augmented reality (AR) is a new way of discovering the world! It overlays computer-generated content on our physical world, and that digital content responds to changes in the user's environment in real time.
You might ask, what's AR? Augmented reality lies between reality and the virtual world and combines the two. With augmented reality, we can enrich our physical world with virtual content: we can track objects in the camera view, place 3D models on top of the physical world, or change its colors. Augmented reality has tons of use cases and can even be used in games. Do you remember Pokémon GO?
Recently, we were tasked with creating an AR application for one of our clients, and we'd like to share our approach to it. But first, let's take a look at what you need in order to start working with AR.
How it works and what you need
Augmented reality requires a camera with the ability to track and record the world around you. The camera captures the physical space, while the AR software analyzes that image, tracks objects or other data in it, and renders virtual content on top. You have probably encountered augmented reality in your life already. The most common examples are image filters on social media like Snapchat, Instagram, or Facebook, and some people have smart houses where they use gesture recognition to activate various systems.
Let's talk about how to create your own AR technology!
Understanding what AR is capable of and what its limitations are is crucial. In this post, you will mostly read about target tracking and hand tracking models. To get started with AR, you need a device that has a camera. Although there are dozens of libraries available for creating AR applications, we will focus on the technologies we used in our project.
Client's needs and how we started
Our client came up with a creative use case for AR technology - an application that would identify your nails and replace them with selected digital models. The company prepared several models to be prototyped, and it was our job to place them on nails using augmented reality.
Having no previous experience with AR, we started researching right away. We focused on finding out what the possible outcome might look like, how it can be done, and if it's even feasible.
After we found a video on TikTok in which the user piperzy placed a virtual nail on her finger using Spark AR, our first goal was to reproduce the same result. Our team was unaware of the limitations and requirements at first, but it turned out that she used the "target tracking" model. While my colleague tried to reproduce the result from piperzy's video, I researched the Snap Lens platform. We played around with different AR platforms and libraries while prototyping, including ManoMotion, Apple Vision, and MediaPipe. We will discuss some of these platforms in detail later on.
Let's talk about specific AR models we tried.
Target tracking is an augmented reality feature in which the engine recognizes a pre-learned pattern (the target) in the camera image and overlays it with a 3D object. In practice, this feature is commonly used to present a furniture catalog in 3D by 'scanning' the scene in the camera's viewport.
In contrast with the hand tracking method, target tracking provides true plane representation. Using this technique, it is possible to place a virtual object inside the real world based on the camera's perspective, so it looks like it is part of the real world.
An example object attached to a 3D nail
Target tracking can be achieved with a third-party platform or embedded via an SDK, with the latter requiring more development effort.
For our application of target tracking, we used nail stickers as targets for 3D nail placement. Compared to traditional use cases for target tracking, this problem is quite specific, as the markers are rather tiny and we want to track the whole hand at once. The upcoming parameters section outlines implementation details and choices for this use case.
Upon recognizing a target, the AR engine will render the associated model. One target image per finger makes sense in our case, which makes five targets for one hand.
The number of targets that can be tracked simultaneously is limited in some tools. Spark AR, for example, has a limit of five such targets, which makes it suitable for this situation.
This limit could be worked around by supplying a single image that contains multiple targets separated by transparent gutters.
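As a rough illustration of that workaround, the sketch below computes where each marker would sit when several are packed side by side into one target image, separated by gutters that would be left transparent. It only calculates the layout. The function name `marker_boxes`, the pixel sizes, and the single-row arrangement are all assumptions made for this example, and whether a given AR engine tracks such a combined target reliably would still need to be tested.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def marker_boxes(count: int, marker_size: int, gutter: int) -> Tuple[Tuple[int, int], List[Box]]:
    """Lay out `count` square markers in one row, separated by gutters.

    Returns the canvas size (width, height) and one bounding box per marker.
    The gutters between the boxes are the areas that would stay transparent.
    """
    if count < 1 or marker_size < 1 or gutter < 0:
        raise ValueError("count and marker_size must be positive, gutter non-negative")

    width = count * marker_size + (count - 1) * gutter
    boxes: List[Box] = []
    for i in range(count):
        left = i * (marker_size + gutter)
        boxes.append((left, 0, left + marker_size, marker_size))
    return (width, marker_size), boxes


# Hypothetical values: five 256 px markers (one per finger) with 64 px gutters.
canvas_size, boxes = marker_boxes(count=5, marker_size=256, gutter=64)
for finger, box in zip(["thumb", "index", "middle", "ring", "pinky"], boxes):
    print(finger, box)
```

The resulting boxes could then be used to paste each sticker design into an image with an alpha channel in the graphics tool of your choice.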
Target tracking requires the camera to be pointed at the target so that the engine can identify it. In other words, the user would have to scan "finger by finger" so that the AR engine can pick up all the markers. This was our experience in Spark AR, and it could be the result of poorly designed stickers/targets. By contrast, hand tracking seems to offer the better UX, since it always tracks the complete hand and all of the fingers are identified at once. Scanning could also be problematic for some gestures, or limit the set of 'supported gestures' for nail placement: the thumb sits at a different angle than the other fingers, which could make it difficult for the user to scan it while keeping the rest of the fingers in the camera view.
An acceptable gesture would be to place a hand on a surface with all the nails facing the camera. A partially usable gesture would be a 'thumb up' facing the camera where only the four fingers face the camera and the thumb sticker faces away from the camera, making it impossible to track.
A good marker is one that has:
variety of content/asymmetry
A bad marker is one that:
has gradient colors
General best practices for marker design can be found in the Spark AR docs.
Spark AR Platform
Spark AR is a studio from Meta/Facebook that enables users to publish AR experiences on the Instagram and Messenger apps. It provides a studio editor that lets you create your own AR experiences without coding. It's very easy to use!
It is one of the technologies that allows you to do proper target tracking. There is one limitation: it doesn't offer an embeddable solution, so you can only use your filters in the Spark AR app, Messenger, and Instagram.
Target Tracking & Spark AR Summary
We've developed a working prototype that uses paper stickers as markers for target tracking in augmented reality. Our solution was implemented in Spark AR Studio and tested with Spark AR Player on an iPhone XS. We've outlined further options for embedding the experience in a standalone app, for example via Snap Camera Kit, an SDK solution, or a low-level framework like MediaPipe. Target tracking is recommended for close-up work, as it offers true perspective tracking, which makes the experience look very realistic.
Another AR model that tracks hands in the camera view is hand tracking. The hand tracking model is a trained machine learning model that detects hands in the camera view and outputs 21 points per hand. 2D models contain only X and Y screen coordinates relative to the camera; 3D models also provide depth, i.e. a Z coordinate for each point. All points are estimated based on an average hand size.
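To make the 21-point output more concrete, here is a minimal sketch of how such a result might be represented and how the five fingertip points can be picked out of it. The index layout (fingertips at 4, 8, 12, 16, and 20) follows the convention used by MediaPipe Hands; other hand tracking models may order their points differently. The `Landmark` type and the `fingertips` helper are hypothetical names for this illustration, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Fingertip indices in the 21-point layout used by MediaPipe Hands.
# This is an assumption: other models may number their points differently.
FINGERTIP_INDICES = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}


@dataclass
class Landmark:
    x: float                   # screen coordinate
    y: float                   # screen coordinate
    z: Optional[float] = None  # depth, only provided by 3D models


def fingertips(landmarks: List[Landmark]) -> Dict[str, Landmark]:
    """Pick the five fingertip points out of one hand's 21 landmarks."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks per hand, got {len(landmarks)}")
    return {name: landmarks[i] for name, i in FINGERTIP_INDICES.items()}


# Made-up 2D data standing in for a real model output.
hand = [Landmark(x=i / 20, y=0.5) for i in range(21)]
tips = fingertips(hand)
print(tips["index"])
```

With a 2D model the `z` field simply stays empty, which mirrors the difference between 2D and 3D models described above.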
For our client's solution, we prefer hand tracking over target tracking, because target tracking requires markers.
In the image below we can see where the exact points are located.
The hand tracking model, especially the 3D model, is much more computationally demanding: it requires a more powerful CPU on the device.
The estimated length of fingers is based on the average hand size, which leads to imprecision.
The hand tracking model can only track a whole hand: the entire hand must be visible in the camera view.
Our problem in the "Get Nailed" project
For our hand tracking app, we attempted to acquire all 21 points per hand and then place our objects on the fingertip points. This problem is quite specific and unusual. Hand tracking is not designed to track nails; it only estimates approximate fingertip positions. This can be partly worked around with offsets for certain objects, but since offsets are constants, they will not work across different hand sizes.
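The limitation of constant offsets can be illustrated with a small sketch. It assumes 2D points in pixels and compares a fixed pixel offset with one that is scaled by the length of the last finger segment (from the fingertip to the nearest joint). The function names and the 0.5 ratio are hypothetical. Scaling the offset is only one possible mitigation, not the approach used in the project, and it still inherits the imprecision of the estimated points.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def nail_anchor_fixed(tip: Point, joint: Point, offset_px: float) -> Point:
    """Move from the fingertip toward the joint by a constant pixel distance."""
    dx, dy = joint[0] - tip[0], joint[1] - tip[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return tip  # degenerate input: no direction to move in
    return (tip[0] + dx / length * offset_px, tip[1] + dy / length * offset_px)


def nail_anchor_scaled(tip: Point, joint: Point, ratio: float = 0.5) -> Point:
    """Move from the fingertip toward the joint by a fraction of the segment."""
    return (tip[0] + (joint[0] - tip[0]) * ratio, tip[1] + (joint[1] - tip[1]) * ratio)


# The same finger pose at two apparent sizes (made-up coordinates).
small_tip, small_joint = (100.0, 100.0), (100.0, 140.0)  # 40 px segment
large_tip, large_joint = (100.0, 100.0), (100.0, 180.0)  # 80 px segment

fixed_small = nail_anchor_fixed(small_tip, small_joint, 20.0)
fixed_large = nail_anchor_fixed(large_tip, large_joint, 20.0)
scaled_small = nail_anchor_scaled(small_tip, small_joint)
scaled_large = nail_anchor_scaled(large_tip, large_joint)

print("fixed: ", fixed_small, fixed_large)
print("scaled:", scaled_small, scaled_large)
```

With the fixed 20 px offset, the anchor lands halfway along the small finger's last segment but only a quarter of the way along the large one, so the virtual nail drifts toward the fingertip as the hand gets bigger. The scaled variant stays at the same relative position in both cases.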
Hand Tracking vs Target Tracking
Hand tracking does not require any marker or any other prerequisites.
In comparison to target tracking, accuracy is worse. Objects do not appear as a part of the real world.
The 3D hand tracking model requires a more powerful device.
Hand tracking allows us to place different objects on each individual point, whereas target tracking is limited to 5 markers (in Spark AR).
Snap Lens platform
Using Snap Lens Studio is free, as is publishing your filter.
Snap Lens allows you to publish your lens at any time, and it is very simple! All you need to do is hit the "Publish lens" button. You can also specify the visibility of your lens: it can be public, private, hidden, or offline (local). Public means that your lens is accessible to everyone. Private lenses are accessible only via a specific link that you can share. Hidden and offline lenses are accessible only to you!
Before a lens becomes public or private, it must be approved by the Snap Lens team. In our experience, the approval took about 1 hour.
Limitations and CameraKit solution
Snap lenses work exclusively on Snapchat. By default, you cannot embed a lens directly into your own application, just as with Spark AR. However, Snap offers a new embedded solution: CameraKit. It is an SDK that can be embedded into an Android or iOS application, but it's currently in closed beta and you have to request permission from Snap before using it.
Snap Lens with Hand Tracking
Snap Lens Studio provides tons of AR models, including 3D hand tracking. We can use a pre-created 3D hand tracking template, which is easy to use and edit further. The hand model is limited to 5 hands, and as output it provides [X, Y, Z] coordinates and rotations.
Snap Lens & Hand Tracking - Summary
Updating an existing AR experience can be done without editing any code, simply via Lens Studio.
It turns out that the requirements of our client simply cannot be met with the current technology in the project we worked on. We would need a custom-trained model to track the nails precisely, but one like that currently does not exist.
The best solution would be to train a custom model for a specific tracking area, but this can take a great deal of time and testing samples, and it can be expensive.
Target tracking is an excellent and precise model that matches our client's demands, but the fact that physical markers have to be placed on the nails is unacceptable. Hand tracking, in turn, is a newer model that is not precise enough yet.
In this blog post, we discussed augmented reality and the two models we used in our project: target tracking (also known as marker tracking) and hand tracking. There are a lot of other models that can be used for AR applications. We also covered the most popular technologies that can be used for creating AR applications: Spark AR and Snap Lens. In conclusion, if you have no programming background, then Snap Lens is what I would recommend to you.
In the next post dedicated to augmented reality, we will focus on MediaPipe and create a simple example with it.
Thanks for reading!