First, what is a Single Image Random Dot Stereogram?
Stereograms are illusions of 3D images or surfaces: 2D images that encode stereo information so that, when viewed correctly, they reveal a hidden 3D image. There are different types of stereograms, but I will be focusing on perhaps the most common: the 'wall-eyed' or parallel-convergence type. I won't go into a huge amount of detail, but please check my references, as there are some really nice resources that explain the fundamentals in much more depth and helped me during this project. Essentially, the idea is to create a perceived 3D image by changing the displacement of pixels seen by the right and left eye; the brain basically does all the rest.
Creating these images requires some array manipulation to shift pixels around, as well as a depth image that tells us how far each pixel should be shifted. Here is the simplest case, where the white square will appear in front of the grey square. We use a random dot pattern that is repeated to create the right image, where the pixels in the regions of the two squares are shifted over by some amount:
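The two-square case above can be sketched in a few lines of NumPy. This is my own minimal reconstruction, not code from the notebook; the region bounds and shift amount are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Left image: a field of random black/white dots.
height, width, shift = 64, 64, 4
left = rng.integers(0, 2, size=(height, width))

# Right image: a copy of the left, with the pixels inside a
# central square region displaced by `shift` pixels, so that the
# square appears to float in front of the background when the
# pair is free-viewed.
right = left.copy()
top, bottom, lo, hi = 16, 48, 16, 48
right[top:bottom, lo - shift:hi - shift] = left[top:bottom, lo:hi]

# Side by side, the pair forms the simplest two-image stereogram.
pair = np.hstack([left, right])
```

Viewing `pair` wall-eyed, the shifted square region is fused at a different depth than the surrounding dots.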
This can be extended to more complex and larger images, where an image is built up strip by strip. We can also use a pattern instead of random black and white dots. All that matters is that the texture repeats and has enough detail and contrast that our eyes can ascertain which pixels are shifted and by how much:
When creating the strips, each strip needs to be shifted relative to its neighboring strip, in order from left to right. This way each left-right pair is its own stereogram and, when compiled together, they form one stitched image. The amount by which the pixels are shifted is calculated using the following, which converts a greyscale depth map into some number of pixels shifted to the right, where the coordinate range is assumed to be between [0,0] and [1,1]:
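One simple way to express that depth-to-shift conversion in code (a sketch of the general idea, not the exact formula from the post; `max_shift_px` is my own parameter name):

```python
import numpy as np

def depth_to_shift(depth, max_shift_px=32):
    """Map a normalised greyscale depth map (values in [0, 1],
    with 1 = nearest to the viewer) to an integer pixel shift.
    A linear mapping is assumed here for illustration."""
    depth = np.clip(depth, 0.0, 1.0)
    return np.round(depth * max_shift_px).astype(int)
```

So a mid-grey pixel (0.5) would shift half as far as a pure white one, and black background pixels would not shift at all.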
This part isn't too tricky. I used NumPy array manipulation, although you could use shaders as well, which I believe was the original intent. Using just the above information and any depth image, you can create nice SIRDs like this one:
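Putting the pieces together, the strip-by-strip construction can be sketched as follows. This is one common way to build a SIRDS and reflects my reading of the approach, not the Colab's exact code; all names and defaults are illustrative:

```python
import numpy as np

def make_sird(depth, strip_width=64, max_shift_px=20, seed=0):
    """Build a single-image random-dot stereogram from a depth map.

    depth: 2D array with values in [0, 1] (1 = foreground).
    A seed strip of random dots is laid down on the left; every
    later pixel copies the pixel one strip to its left, displaced
    by the local depth-dependent shift, so neighboring strips
    form left/right stereo pairs.
    """
    rng = np.random.default_rng(seed)
    h, w = depth.shape
    shifts = np.round(np.clip(depth, 0, 1) * max_shift_px).astype(int)
    out = np.zeros((h, w + strip_width))
    # Seed strip of random black/white dots.
    out[:, :strip_width] = rng.integers(0, 2, size=(h, strip_width))
    for x in range(strip_width, w + strip_width):
        for y in range(h):
            # Source pixel one strip to the left, offset by the shift.
            src = x - strip_width + shifts[y, x - strip_width]
            out[y, x] = out[y, min(src, x - 1)]
    return out
```

A flat (all-zero) depth map simply repeats the seed strip across the whole image; any nonzero region breaks that periodicity and pops out in depth.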
But now the question becomes: how do we get a depth image? There are a number of ways. Using 3D rendering software like Blender and creating a model is one. You could even just use MS Paint and color objects by how near they should appear: foreground in white, background in black, and shades of grey in between. Another way is to use MACHINE LEARNING. In this case, we can use a model available on TF Hub called MiDaS (v2), which is trained to output a depth map from a single image input. We can now create a depth map from any input image. Here is the original paper, where they used this to create Ken Burns effect animations, and here is the supplementary video.
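Whichever source you use, whether a render, a hand-painted greyscale image, or a model's raw prediction, the depth values need to be scaled into [0, 1] before the shift calculation. A small hedged helper (my own; note that MiDaS-style models predict relative inverse depth, so depending on the model's convention you may need to flip the result):

```python
import numpy as np

def normalise_depth(raw, invert=False):
    """Rescale a raw depth prediction (or greyscale image) to [0, 1].

    invert=True flips the map, for models or paint conventions
    where larger values mean farther rather than nearer.
    """
    d = raw.astype(float)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    return 1.0 - d if invert else d
```

After this step the depth map plugs straight into the pixel-shift conversion described earlier.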
I have made this into a Google Colab notebook with forms so you can play around yourself. There are a few good examples in the dropdown.
Here is the notebook at a high level.
First, we pass any image through the pre-trained network to produce a depth map. As you can see, we lose detail in the depth map, which makes the effect hit and miss. The subject needs to be an easily recognizable, simple shape. In this case, I think the T-Rex works well, since most people know what it is when they see the silhouette.
Next, we find a texture. For this, I used ColorLovers, which has an API from which I can fetch random textures, or I can get a link to a single pattern .png, which I then tile to form the repeating texture image. I have coded it so you can provide an image link, a link to a texture, or just grab a random texture. In the image above, I have put a link to a texture I liked in the texture URL field. The input parameters depth_factor and num_strips relate to the SIRD, while scale_factor reduces the size of the input image. There is usually a size and number of strips that works best for a given display size, so you may need to play around. It also depends on the texture and how well our eyes can form the stereogram.
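The tiling step, turning a small downloaded pattern into a strip-sized repeating texture, is a one-liner with `np.tile` plus a crop. A sketch (function name and shapes are my own, not the notebook's):

```python
import numpy as np

def tile_pattern(pattern, height, strip_width):
    """Tile a small RGB pattern array of shape (h, w, 3) into a
    strip of shape (height, strip_width, 3), repeating it in both
    directions and cropping the overhang."""
    ph, pw = pattern.shape[:2]
    reps_y = -(-height // ph)       # ceiling division
    reps_x = -(-strip_width // pw)
    tiled = np.tile(pattern, (reps_y, reps_x, 1))
    return tiled[:height, :strip_width]
```

The same tiled strip is then used as the base texture in place of random dots, which is why the pattern needs enough contrast and detail for the eyes to lock onto.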
And that’s it! I encourage you to check out the Colab if you want to see the code. Here are some more examples for fun! Enjoy!
I have created a Twitter bot that will post a SIRD if you want to give it a follow, @AutoSIRD