The Amazing Cartoon Maker

Friday, May 9, 2008

The finale!

The end is here!
Final presentations and last-minute tweaks to the application are done! I have created a primitive UI for the application that lets the user tweak parameters for the 3 primary processes of the system, namely blurring, edge detection and color quantization. Since last time around, I have been working on creating that UI. It is structured so that the input video and the output video run side by side, making it easy to compare them directly. The frame rates of both videos are the same, thus showing the interactivity of the system.

Here's a sample look at the user interface...

Here are a couple of videos in order to get a feel of the real time video processing output:

The Butterfly effect

Soloman

In conclusion, I would like to say that this was indeed a really useful and exciting learning experience. I definitely now have much more confidence in Cg and an overall command of GPU concepts. In addition, the extra research that had to be done on the image processing front has increased my interest in this field. The end result being that I might take up a course related to Image Processing next semester!! Image processing and vision on the GPU is definitely the way to go!! And I would definitely like to try my hand at implementing this system in CUDA... that would be a nice little challenge in itself!

Thanks Gary n Joe for a wonderful semester of learning!!

Tuesday, April 22, 2008

Color Quantization and other random musings...

Last time around, I was about to start on the color quantization, which I guess is the most important part of this application... the step that actually gives the final image its cartoonized look.

Now we had used a similar approach called K-means clustering in one of our assignments, which essentially does the same thing, but it had a couple of drawbacks. Firstly, the initial bin values had to be hardcoded and secondly, this method requires multiple iterations, which is bound to slow down the entire process. And there is no guarantee that the solution will have converged after a fixed number of iterations, or that the final result will be the same for a different choice of starting bin values.

So the task in front of me was to come up with a fast yet flexible method, which would actually take the color values for the bins from the image. I came across the 'Median Cut' method, which is supposed to be one of the best methods for color quantization. The premise behind median cut algorithms is to have every entry in the color map represent the same number of pixels in the original image. In contrast to uniform sub-division (read: hardcoded bins), these algorithms divide the color space based on the distribution of the original colors. The algo can be given as follows:

Find the smallest box which contains all the colors in the image.
Sort the enclosed colors along the longest axis of the box.
Split the box into 2 regions at median of the sorted list.
Repeat the above process until the original color space has been divided into n regions, where n is the number of bins.
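
The steps above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions of mine, not the code used in the project: pixels are (r, g, b) tuples, the box with the longest side is the one split next, and each bin is represented by the average color of its box. The names box_extent and median_cut are made up for this sketch.

```python
def box_extent(box):
    """Return (length, axis) of the longest side of the box enclosing the colors."""
    spans = [max(p[c] for p in box) - min(p[c] for p in box) for c in range(3)]
    axis = spans.index(max(spans))
    return spans[axis], axis


def median_cut(pixels, n_bins):
    """Split the colors into at most n_bins boxes and return one color per box."""
    boxes = [list(pixels)]
    while len(boxes) < n_bins:
        # Assumption: split the box whose longest side is the largest.
        best = None
        for i, box in enumerate(boxes):
            length, axis = box_extent(box)
            if length > 0 and (best is None or length > best[0]):
                best = (length, axis, i)
        if best is None:
            break  # every remaining box holds a single distinct color
        _, axis, i = best
        # Sort along the longest axis and cut at the median.
        box = sorted(boxes.pop(i), key=lambda p: p[axis])
        mid = len(box) // 2
        boxes.append(box[:mid])
        boxes.append(box[mid:])
    # Assumption: represent each box by its per-channel average color.
    return [tuple(sum(p[c] for p in box) // len(box) for c in range(3))
            for box in boxes]
```

If the image has fewer distinct colors than the requested number of bins, the sketch simply returns fewer bins.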

I was under the impression that this process might not be too fast, but even on the CPU, the creation of the bins executes pretty fast!! Nice!!

So I did this step in 2 parts... the creation of the bins on the CPU and the actual quantization in a fragment shader, by passing the bins as a texture, thus reducing the number of registers used.
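
As a rough CPU-side picture of what the quantization pass does per pixel, here is a small Python sketch. It assumes that each pixel is replaced by the bin color closest to it in squared Euclidean distance; the distance measure and the name quantize_pixel are my assumptions for illustration, and the real work happens in the Cg fragment shader with the bins read from a texture.

```python
def quantize_pixel(pixel, bins):
    """Return the bin color nearest to the pixel (squared Euclidean distance)."""
    return min(bins, key=lambda b: sum((pc - bc) ** 2 for pc, bc in zip(pixel, b)))


def quantize_image(pixels, bins):
    """Map every pixel of a flat pixel list to its nearest bin color."""
    return [quantize_pixel(p, bins) for p in pixels]
```

The loop over the bins is the part that needs a fixed bin count in the shader version.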

In the meantime, I decided to dabble once again on the edge detection front, since my current approach was just a hack, by tweaking one of the parameters of the bilateral filter. So I decided to use the Sobel filter again, but this time I decided to threshold the output, and this did work wonders to the output! So right now I have edges and no noise! Joy to the world!!
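
To make the thresholding idea concrete, here is a minimal Python sketch of a Sobel filter whose gradient magnitude is cut off at a threshold. It assumes a grayscale image stored as a list of rows and leaves the border pixels at 0; the function name and the choice of a binary 0/1 output are mine, not taken from the actual shader.

```python
def sobel_threshold(img, threshold):
    """Return a 0/1 edge map: 1 where the Sobel gradient magnitude exceeds threshold."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            # Vertical gradient: bottom row minus top row.
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            magnitude = (gx * gx + gy * gy) ** 0.5
            # Weak responses (mostly noise) fall below the threshold and are dropped.
            out[y][x] = 1 if magnitude > threshold else 0
    return out
```

Raising the threshold removes more of the weak, noisy responses at the cost of losing faint edges.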

So finally, my program comes out to be a 3-pass thing...

First pass: input: image, output: bilaterally filtered image
Second pass: input: bilaterally filtered image, output: edge detected image
Third pass: input: edge detected image, output: color quantized image, rendered to the screen

Original Image

Bilateral filter

Edge Detection

Color Quantization

Currently, I have incorporated both videos and static images into the program, so I am getting outputs in both cases. But here are a few problems, which I hope to smooth out in the next few days:

The output video frame rate isn't the same as the input frame rate.
The number of bins at present has to be hardcoded since Cg doesn't allow variables in the for loop conditions.
In order to use a larger number of bins, I have to use an advanced fragment profile like fp40. The others do not support this.

Also, I will have to take some time out to make the UI for this. I am thinking of using Qt for this purpose, although I haven't really used it ever. But if I don't try, I won't know!!

Until next time then!

Monday, April 7, 2008

Lots of updates!

All right! So there have been lots of updates since my last post.. and the following paragraphs will take you through those in brief...

First up, I succeeded in porting the RGB to CIELab conversion and its reverse onto the GPU. That sped things up quite a bit.

Next up was to develop the extended Bilateral Filter, which is essentially an edge preserving smoothing operator. The colors in the output image are smoothed out but the edges are still preserved, unlike, say, Gaussian blurring. The authors have given their formula for a simplified Bilateral Filter, and I started on implementing it in a fragment program. The special thing about a bilateral filter is that it acts on the geometric as well as the photometric domain of an image, thus giving a perceptually well defined smoothing of the image. They have given typical values of the 2 main variables affecting the output, the geometric deviation and the photometric deviation.
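
For readers who haven't met the bilateral filter, here is a small Python sketch of the textbook form on a grayscale image. It is not the authors' simplified formula and not my fragment program; sigma_d and sigma_r are just my names for the geometric and photometric deviations. Each neighbor is weighted by how close it is in space and by how similar it is in intensity.

```python
import math


def bilateral_filter(img, radius, sigma_d, sigma_r):
    """Textbook bilateral filter on a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            weight_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Geometric weight: falls off with spatial distance.
                        spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_d ** 2))
                        # Photometric weight: falls off with intensity difference.
                        diff = img[ny][nx] - img[y][x]
                        photometric = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        weight = spatial * photometric
                        total += weight * img[ny][nx]
                        weight_sum += weight
            out[y][x] = total / weight_sum
    return out
```

With a small sigma_r, pixels on the other side of a strong edge get almost no weight, which is why the edge survives the smoothing.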

I discovered that I had to tweak certain parameters in order to get a nice output, and in the process of this tweaking, I did come across a nice little finding, which I will talk about, after the next section on edge detection.

So once the bilateral filter was working, the next step was to give its output to the edge detection fragment program in order to get an image with the edges detected. Initially, I just wrote this as a single pass application with the bilateral filter code commented out to test the edge detection. The authors have used a Difference-of-Gaussians edge detection approach, which takes the difference between two Gaussian-blurred versions of the same image, each blurred with a different deviation.
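
The basic Difference-of-Gaussians idea can be sketched in plain Python as below. This is the bare textbook version on a grayscale image, assuming a separable blur with clamped borders and a kernel radius of about 3 sigma; the paper adds its own refinements on top, which are not reproduced here, and all function names are made up for this sketch.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with a radius of about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [v / total for v in weights]


def blur_rows(img, kernel):
    """Blur each row with the kernel, clamping reads at the image border."""
    r = len(kernel) // 2
    w = len(img[0])
    return [[sum(kernel[i + r] * row[min(max(x + i, 0), w - 1)] for i in range(-r, r + 1))
             for x in range(w)] for row in img]


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma):
    """Separable blur: rows first, then columns via a transpose."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))


def difference_of_gaussians(img, sigma_small, sigma_large):
    """Narrow blur minus wide blur; the response changes sign across an edge."""
    narrow = gaussian_blur(img, sigma_small)
    wide = gaussian_blur(img, sigma_large)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(narrow, wide)]
```

Flat regions give a response of roughly zero, so only places where the two blurs disagree (the edges) stand out.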

I didn't quite get this approach working, so I had to resort to the simpler and well documented Sobel filter approach. This method does have a drawback of introducing a significant amount of noise in the output image.

It was after this that I started on the multipass approach, in which the first pass consists of applying the Bilateral Filter to the image and writing the output to a texture, which would then be passed to the edge detection fragment program. After this the output of the edge detection would then be blended with the previous filter output to create the final output image. For this I used the glBlendFunc method of OpenGL. It has plenty of options, and I had to experiment with a few options before I got the right combination.

But because of the noise in the edge detected output, the final output image also had the noise, and the result wasn't impressive. So, I started experimenting with different approaches and in the process of this experimenting, I discovered that the Bilateral filter could also act as an edge detector!! There was a certain parameter, which when set to the right value resulted in a smoothed image with the edges detected! So what I got was 2 processes within the same process!

So now there doesn't seem to be any need for the edge detection fragment program, though there is still one important step remaining, that of quantizing the image. Hence I will have to check the output of that step in both cases, using the edge detection program and without it.

The next step in this process is the quantization of the image. There are a number of color quantization approaches documented, and I will have to see which of them suits real time implementation.

Wednesday, March 26, 2008

Video segmentation and Color space conversion!

It's been 10 days since my last post. Since then, I have started on the implementation of my project.

First thing I did was pull up a sample OpenGL-Cg implementation, using pass through vertex and fragment shaders in order to display an input image, rendered as a texture. This was done using the basic assignments that I had performed during the course and from the Cg Tutorial book.

Next step was to decide whether I should continue working on a single image and apply the various steps to this image to get the final stylized look, or should I get my hands dirty with the video file and get the segmentation component over with? By the segmentation component, I mean the framework to break up the video file into individual bitmaps, each of which will then be acted upon as single images and the process applied upon those.

So I figured that, since I had a simple application now, I should get it over with rather than worry over it once things get too complicated. So I started looking into the file format of an AVI file. I chose AVI as the input video format, since these files are made up of bitmaps or more specifically DIBs (Device Independent Bitmaps). Another restriction was that the AVI file had to be uncompressed so that I didn't have to worry about compression and decompression of the file. So after struggling through some pretty comprehensive information on the internet, I did manage to get my system in place. I used the built-in Vfw (Video for Windows) library from C++ for manipulating the AVI files, retrieving the individual frames from the file and getting the actual pixel data from the frames.

The next challenge was the conversion from RGB space to the CIELab space mentioned by the authors. This is a device independent color space designed to represent human vision more accurately.

This again was a nice thing to learn and it did involve some level of complexity. But I have managed to deal with it and gotten both the conversions working, i.e. RGB to CIELab and vice versa, which would be needed while converting the output back for display.
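
For reference, here is a Python sketch of the two conversions. It assumes sRGB input in the range [0, 1] and a D65 white point, going through XYZ on the way; I haven't stated above which RGB space or white point my implementation uses, so treat those choices (and the function names) as assumptions made for this sketch.

```python
WHITE = (0.95047, 1.0, 1.08883)  # D65 reference white (assumed)
DELTA = 6 / 29


def srgb_to_lab(r, g, b):
    """Convert sRGB components in [0, 1] to CIELab (L, a, b)."""
    def to_linear(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        return t ** (1 / 3) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4 / 29

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB to XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_to_srgb(L, a, b):
    """Convert CIELab back to sRGB components (not clamped to [0, 1])."""
    def f_inv(t):
        return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)

    def to_gamma(c):
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    fy = (L + 16) / 116
    x = WHITE[0] * f_inv(fy + a / 500)
    y = WHITE[1] * f_inv(fy)
    z = WHITE[2] * f_inv(fy - b / 200)
    # XYZ to linear sRGB
    rl = 3.2404542 * x - 1.5371385 * y - 0.4985314 * z
    gl = -0.9692660 * x + 1.8760108 * y + 0.0415560 * z
    bl = 0.0556434 * x - 0.2040259 * y + 1.0572252 * z
    return to_gamma(rl), to_gamma(gl), to_gamma(bl)
```

Every step is a per-pixel formula with no dependence on neighboring pixels, which is what makes the conversion a natural fit for a fragment shader.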

One issue that came up is that I have implemented this conversion on the CPU for now, and it has slowed things down. So as a next task, I will see if I can take this conversion onto the GPU using a fragment shader. It does seem possible. That will definitely speed up this step and aid all future stages.

In the next few days, I hope to get that done and then start on the actual implementation of the algorithms mentioned in the paper.

Until next time then!

Sunday, March 16, 2008

The Amazing Cartoon Maker!!

That's what I have named my final project for the CIS 665 - GPU Programming course. This is an implementation of the paper 'Real-Time Video Abstraction' by Holger Winnemöller, Sven Olsen and Bruce Gooch. This paper applies techniques from the fields of Non-Photorealistic Rendering and Image Processing for the purpose of realtime video and image abstraction and cartoon stylization. You can check out some more information on this here.

Ever since I took up this subject and started reading up on it, I have been pretty fascinated by the area of NPR and the way advances in GPUs have led to better and more efficient NPR techniques. Hence I was always going to pick a topic related to this field for my final project here.

The choice of this name for the project came as a natural extension to what the final product promises to be... a tool which can be used for creating cartoon style videos from normal video files and even cartoon style images for normal image input.

The design doc has been created. It will be fairly important to lay a good foundation while starting off with the coding, so that the final packaged product looks good.

I am planning to implement this in OpenGL and Cg, because of their ease of use, flexibility and efficiency. For the UI, I am planning to use Qt from Trolltech, though I haven't used it before. I feel this will be a good time to learn this API, starting off with a simplistic interface for my product.

Signing off and wishing the very best of luck to all the other groups and individuals in their implementations too!!