CS 180: Intro to Computer Vision and Computational Photography, Fall 2024
Project 1: Images of the Russian Empire
Ian Dong
Overview
Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a pioneering Russian photographer who, as early as 1907, began using a unique method to capture color images by recording three exposures using red, green, and blue filters. Granted special permission by the Tsar, he traveled across the Russian Empire photographing various subjects, including the only color portrait of Leo Tolstoy. Although color printing technology did not exist at the time, he envisioned these images being projected in classrooms across Russia. This project aims to recreate these images in their true colors by aligning the R and G plates to the B plate.
Section I: Simple Single-scale Approach
Simple Single-scale Approach
I first split the image into its three channels: red, green, and blue. I then fixed the blue channel and aligned the red and green plates to it. For these small .jpg images, I exhaustively searched a 30-by-30 window of displacements (15 pixels in each direction) and chose the best displacement according to the metrics listed below; a sketch of this search follows the metric list.
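As a concrete illustration of the splitting step, here is a minimal sketch (not my exact project code), assuming the scanned plate stacks the blue, green, and red exposures vertically from top to bottom and is read with scikit-image; the helper name split_plates is just illustrative.

import numpy as np
import skimage.io as skio
from skimage import img_as_float

def split_plates(path):
    # The scan stacks the three exposures vertically: B on top, then G, then R.
    img = img_as_float(skio.imread(path))  # handles both 8-bit .jpg and 16-bit .tif
    height = img.shape[0] // 3
    b = img[:height]
    g = img[height:2 * height]
    r = img[2 * height:3 * height]
    return r, g, b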
Displacement Metrics
- Sum of Squared Differences (SSD): I first measured the displacement match by computing the sum of squared differences between the two plates; the minimum SSD indicates the best alignment. This was the weakest of the three metrics, mainly because small differences in lighting, contrast, or exposure between plates disproportionately affect the L2 norm, so it matches poorly on regions that differ only in brightness or contrast.
- Normalized Cross-Correlation (NCC): I then tested normalized cross-correlation, where the maximum NCC indicates the best alignment. Because this metric normalizes the intensities of the images, it is insensitive to differences in brightness or contrast and therefore more robust to changes in lighting and exposure. It performed better than SSD on these images.
- Structural Similarity Index Metric (SSIM): Finally, I tested the structural similarity index, which captures perceived similarity rather than just pixel-wise differences. It compares luminance (brightness), contrast (intensity differences), and structure (spatial relationships), mimicking the way the human visual system judges image quality; the maximum SSIM indicates the best alignment. It performed the best of the three metrics but was roughly four times slower.
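To make the search and the first two metrics concrete, here is a minimal sketch rather than my exact project code, assuming each channel is a float NumPy array and that np.roll applies a candidate displacement; the helper names are illustrative.

import numpy as np

def ssd(a, b):
    # Sum of squared differences: lower is better.
    return np.sum((a - b) ** 2)

def ncc(a, b):
    # Normalized cross-correlation: higher is better.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return np.sum(a * b)

def align_single_scale(plate, reference, window=15, metric=ncc):
    # Exhaustively try every displacement in [-window, window] x [-window, window]
    # and keep the one that scores best against the fixed reference (blue) plate.
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(plate, (dy, dx), axis=(0, 1))
            score = metric(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

For SSD the sign flips (lower is better), and scoring only the interior of each plate usually keeps the black borders from dominating the metric.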
Section II: Low Resolution JPG Results
Low Resolution JPG Results
Here are the results after applying the simple single-scale approach to the low resolution JPG images. The first displacement number represents the change in the rows while the second represents the columns.
cathedral.jpg Red Shift: (12, 3)
monastery.jpg Red Shift: (3, 2)
tobolsk.jpg Red Shift: (6, 3)
Section III: Image Pyramid Approach
Image Pyramid Approach
Like before, I split each image into its red, green, and blue channels. However, the naive single-scale search was far too inefficient for the large .tif images, so I used an image pyramid to speed up the process. I used five levels, which worked best for these images, scaling down each plate and searching iteratively from the smallest level up. After finding the best displacement at a level using the SSIM metric above, I added it to a running displacement total and scaled it by a factor of two so that the next, larger level would search in the correct neighborhood. Before searching at the next level, I rolled the plate by this displacement so it was already roughly aligned. Using this image pyramid sped up the process tremendously; a sketch of the coarse-to-fine search is shown below.
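Here is a minimal sketch of that coarse-to-fine search, written recursively for brevity (an iterative loop over precomputed levels works the same way), assuming the align_single_scale helper from Section I and skimage.transform.rescale for downscaling; the refinement window size is an illustrative choice.

import numpy as np
from skimage.transform import rescale

def align_pyramid(plate, reference, levels=5, window=15):
    # Base case: at the coarsest level, run the full exhaustive search.
    if levels == 1:
        return align_single_scale(plate, reference, window)
    # Recurse on half-resolution copies to get a coarse estimate.
    coarse_dy, coarse_dx = align_pyramid(
        rescale(plate, 0.5), rescale(reference, 0.5), levels - 1, window
    )
    # Scale the coarse displacement by a factor of two for this level
    # and pre-align the plate before refining.
    dy, dx = 2 * coarse_dy, 2 * coarse_dx
    plate = np.roll(plate, (dy, dx), axis=(0, 1))
    # Only a small refinement window is needed after pre-alignment.
    fine_dy, fine_dx = align_single_scale(plate, reference, window=2)
    return dy + fine_dy, dx + fine_dx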
Section IV: High Resolution TIF Results
High Resolution TIF Results
Here are the results after applying the image pyramid approach to the high resolution TIF images. The first displacement number represents the change in the rows while the second represents the columns.
church.tif Red Shift: (58, -4)
emir.tif Red Shift: (105, 40)
harvesters.tif Red Shift: (124, 13)
icon.tif Red Shift: (89, 23)
lady.tif Red Shift: (117, 10)
melons.tif Red Shift: (178, 12)
onion_church.tif Red Shift: (108, 36)
sculpture.tif Red Shift: (140, -26)
self_portrait.tif Red Shift: (176, 36)
three_generations.tif Red Shift: (112, 9)
train.tif Red Shift: (87, 31)
Section V: Prokudin-Gorskii Collection TIF/JPG Results
Prokudin-Gorskii Collection TIF/JPG Results
Here are the results after applying the image pyramid approach to the high resolution Prokudin-Gorskii collection images. The first displacement number represents the change in the rows while the second represents the columns.
flowers.tif Red Shift: (126, 34)
tree.jpg Red Shift: (55, 46)
ocean.jpg Red Shift: (12, -1)
Section VI: Bells and Whistles
Structural Similarity Index Metric (SSIM)
I implemented my own SSIM function to compare the similarity between the reference and shifted images. This metric focuses on structural information rather than just pixel-wise differences: it compares luminance (brightness), contrast (intensity differences), and structure (spatial relationships), mimicking the way the human visual system perceives image quality.
Algorithm
I followed Wikipedia's formulation of the metric. First, I computed the mean and variance of each image, then the covariance between the two. I also computed two constants, derived from the dynamic range of the pixel values, that stabilize the luminance and contrast terms when their denominators are weak. A sketch of this computation is shown below.
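Here is a minimal sketch of that computation, written as a single global SSIM over the whole plate rather than the windowed version, assuming float images and that data_range is the dynamic range of the pixel values.

import numpy as np

def ssim(a, b, data_range=1.0):
    # Stabilizing constants from the Wikipedia formulation:
    # c1 for the luminance term, c2 for the contrast term.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    # Means and variances of each image, plus their covariance.
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov_ab = np.mean((a - mu_a) * (b - mu_b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov_ab + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )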
Effects
Although it did not perform significantly better on some images, this metric successfully aligned the Emir image. SSIM outperformed the other two metrics because it focuses on brightness and contrast relationships between pixels instead of the raw values directly. However, it was much slower: the algorithm took 156 seconds while NCC took only 27 seconds. Here are the before and after images for Emir:
emir_with_ncc.jpg Red Shift: (55, 46)
emir_with_ssim.jpg Red Shift: (107, 17)
Auto Crop
I implemented my own auto crop algorithm. Instead of manually deciding how much to crop from each side based on the thickness of the black bars, this algorithm identifies the crop region on its own.
Algorithm
My algorithm takes in all three image plates and thresholds the arrays. I found that higher values corresponded to the darker border regions, so I computed the bounding box of the pixels whose values fall below the threshold. Finally, I cut away all pixels outside this bounding box, since they represent the black bars. A rough sketch of this idea is shown below.
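Here is a rough sketch of that bounding-box idea, assuming the plates are floats in [0, 1]; the threshold value and the choice to require all three plates to agree are illustrative, not my exact settings.

import numpy as np

def auto_crop(r, g, b, threshold=0.9):
    # Mark pixels below the darkness threshold as image content
    # (per the observation above that higher values are the dark bars).
    content = (r < threshold) & (g < threshold) & (b < threshold)
    # Bounding box of the content pixels.
    rows = np.where(content.any(axis=1))[0]
    cols = np.where(content.any(axis=0))[0]
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    # Crop away everything outside the box (the black bars) from all plates.
    return (r[top:bottom, left:right],
            g[top:bottom, left:right],
            b[top:bottom, left:right])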
Effects
This algorithm helped tremendously in aligning the monastery image. The channels were not in the correct places, and certain parts had hues of red or green. By removing the unnecessary black bars, I was able to align the channels on top of each other much better. Here are the before and after images of the monastery.
monastery_no_autocrop.jpg Red Shift: (9, 1)
monastery_autocrop.jpg Red Shift: (3, 2)