CS 180: Intro to Computer Vision and Computational Photography, Fall 2024

Project 4A: Image Warping and Mosaicing

Ian Dong



Overview

The first part of the project explores how to create image mosaics by taking multiple photographs and projectively warping, resampling, and compositing them. I computed homographies to change the perspective of the images by setting up and solving linear systems. Finally, I used linear interpolation to fill in the warped images and then blended them together to create a seamless mosaic.



Section I: Shooting the Photos

Shooting and Digitizing Photographs

First, I needed photos in order to create the rectifications and mosaic stitching. I took pairs of pictures that shared the same center of projection: to keep the mapping between views projective, I rotated my camera about its optical center while capturing the photos. Here are the original images:

Macbook (Original)
Restaurant Sign (Original)
Sink Left
Sink Right
Outside Left
Outside Right
Campanile Left
Campanile Right


Section II: Recovering the Homographies

Homography Calculations and Formula Derivation

A homography is a projective mapping between any two images that share the same center of projection. I first wrote a function for selecting pairs of corresponding points between the two images; these correspondences let us recover the homography that warps one image onto the other.

Let's derive the homography matrix that maps points from one image to another. Define $(x, y)$ as the source point and $(x', y')$ as the destination point. Because of noise in the selected points, it is often infeasible to find an $H$ that transforms every point exactly; instead, we find the best-fit homography that approximates the transformation. The homography maps the source point to the destination point as follows:

$$ p' = H \cdot p $$

$$ \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} $$

Then, expand out the matrix multiplication:

$$ \begin{align*} wx' & = ax + by + c \\ wy' & = dx + ey + f \\ w & = gx + hy + 1 \end{align*} $$

Since the third equation gives $w$, substitute it into the other two:

$$ \begin{align*} ax + by + c &= (gx + hy + 1)x' \\ dx + ey + f &= (gx + hy + 1)y' \end{align*} $$

Rearranging yields two linear equations per correspondence in the eight unknowns:

$$ \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -x'x & -x'y \\ 0 & 0 & 0 & x & y & 1 & -y'x & -y'y \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} $$

After stacking these rows into the $A$ matrix and $b$ vector over all correspondences, I used np.linalg.lstsq to solve for the homography parameters. With more than 4 corresponding points the system is overdetermined, so least squares finds the homography that best fits the data.
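The least-squares setup above can be sketched as follows. This is a minimal illustration, not necessarily the author's exact code; the function name `computeH` and the (N, 2) point-array convention are my own assumptions.

```python
import numpy as np

def computeH(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points.

    src, dst: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two rows per correspondence, matching the derivation above.
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    # Least squares handles the overdetermined case (N > 4).
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    # The lower-right entry of H is fixed to 1.
    return np.append(h, 1.0).reshape(3, 3)
```

With exact (noise-free) correspondences this recovers the true homography; with noisy hand-clicked points it returns the best fit in the least-squares sense.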



Section III: Rectifying the Photos

Rectification

Rectification is the process of warping an object that is rectangular in the scene, but appears distorted when photographed at an angle, back into a rectangle. First, I specified the corner locations of the rectangle that I wanted to warp my image into. Then I computed the homography between the corresponding points on the image and these locations. Finally, I wrote a warp_image function that uses the inverse homography to find, for each output pixel, the source location to sample from, filling in the warped image with linear interpolation.
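An inverse warp along these lines might look like the sketch below. It is an assumption-laden illustration rather than the author's implementation: I use scipy's `map_coordinates` for the bilinear interpolation step, and the output canvas shape is passed in explicitly.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(img, H, out_shape):
    """Inverse-warp img by homography H into an output canvas of out_shape."""
    H_inv = np.linalg.inv(H)
    # Grid of output pixel coordinates, in homogeneous (x, y, 1) form.
    ys, xs = np.indices(out_shape[:2])
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Back-project each output pixel to its source location.
    src = H_inv @ pts
    src_x = src[0] / src[2]
    src_y = src[1] / src[2]
    out = np.zeros(out_shape)
    for c in range(img.shape[2]):
        # Bilinear interpolation (order=1); samples outside the image read as 0.
        out[..., c] = map_coordinates(
            img[..., c], [src_y, src_x], order=1, cval=0.0
        ).reshape(out_shape[:2])
    return out
```

Sampling in the inverse direction avoids holes in the output, since every destination pixel gets exactly one interpolated value.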

For the warped MacBook, I used a corresponding rectangle located at [[0, 0], [600, 0], [0, 400], [600, 400]] and for the warped restaurant sign I used a corresponding rectangle located at [[0, 0], [150, 0], [0, 420], [150, 420]]. Here are the images:

MacBook (Original)
MacBook (Rectified)
Restaurant Sign (Original)
Restaurant Sign (Rectified)


Section IV: Mosaic Image Blending

Mosaic Blending

Mosaic blending is the process of blending two images together to create a seamless transition between them. I defined multiple corresponding points that appear in both the left and right images and used them to compute the homography that warps the left image into the right image's perspective, reusing the functions described above. Then, I wrote another function, get_warped_corners, which maps the original image's bounding box through the homography to get the warped bounding box, so I could allocate a mosaic canvas big enough to fit both images. Using the top-left corner, I found the translation that moves it to (0, 0), which also gives the displacement needed to place the right image at the correct location. Finally, I used simple feathering (weighted averaging): each image's alpha is 1 at its center and falls off linearly to 0 at its edges, so the warped and unwarped images blend together seamlessly. Here are the results:
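The feathered weighted average can be sketched as below. The function names and the exact fall-off (a separable linear ramp, combined with a minimum) are my assumptions; they match the "1 at the center, 0 at the edges" description but may differ in detail from the author's code.

```python
import numpy as np

def feather_mask(h, w):
    """Alpha mask: 1 at the image center, falling off linearly to 0 at the edges."""
    ys = 1 - np.abs(np.linspace(-1, 1, h))
    xs = 1 - np.abs(np.linspace(-1, 1, w))
    # Taking the minimum of the row and column ramps keeps all edges at 0.
    return np.minimum.outer(ys, xs)

def blend(im1, im2, a1, a2):
    """Per-pixel weighted average of two aligned images with alpha masks a1, a2."""
    denom = a1 + a2
    denom[denom == 0] = 1  # avoid division by zero where neither image covers
    return (im1 * a1[..., None] + im2 * a2[..., None]) / denom[..., None]
```

In practice the two masks would first be warped and translated onto the shared mosaic canvas, just like the images themselves.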

Sink Left
Sink Right
Sink Left (Warped)
Sink Mosaic
Outside Left
Outside Right
Outside Left (Warped)
Outside Mosaic
Campanile Left
Campanile Right
Campanile Left (Warped)
Campanile Mosaic


Section V: Conclusion

Learnings

The coolest thing I learned from this project was about homographies and the math behind them in order to recover and warp the points from one image into the perspective of another. I also learned how to debug by showing the intermediate images after each step.



Project 4B: Feature Matching for Auto-Stitching

Overview

The second part of the project automatically stitches images into a mosaic by finding and matching feature descriptors. I used Harris corner detection to find corner features in the images and then Adaptive Non-Maximal Suppression to get a better spatial distribution of points. Next, I matched image patches with nearest-neighbor search and used RANSAC to find the homography that best fits the matches. I then used that homography to warp the images and blend them together into a seamless mosaic.



Section I: Detecting Corner Features in an Image

Harris Interest Point Detector

I used the get_harris_corners starter code to identify corners within an image; these serve as candidate feature points for matching the images. The Harris corner detector computes intensity gradients at each location, which capture the rate of change of pixel intensities along the $x$ and $y$ directions; it responds strongly at corners, where intensity changes sharply in both directions. I also filtered the corners by thresholding the corner response value and overlaid them on the images. Here are the results:
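Since the starter code is not reproduced here, the following is a stand-in sketch of the same idea using scikit-image's Harris implementation; the function name `harris_corners` and the thresholding parameters are my own choices, not the course code.

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def harris_corners(gray, threshold_rel=0.01, min_distance=5):
    """Harris response map plus local maxima above a relative threshold.

    gray: 2D float image. Returns (response_map, (N, 2) array of (row, col) peaks).
    """
    h = corner_harris(gray)
    # Keep only well-separated peaks whose response clears the threshold.
    coords = peak_local_max(h, min_distance=min_distance,
                            threshold_rel=threshold_rel)
    return h, coords
```

The relative threshold plays the role of the response-value filtering described above; raising it keeps only the strongest corners.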

Sink Left (Original)
Sink Left (Harris)
Sink Right (Original)
Sink Right (Harris)
Outside Left (Original)
Outside Left (Harris)
Outside Right (Original)
Outside Right (Harris)
Campanile Left (Original)
Campanile Left (Harris)
Campanile Right (Original)
Campanile Right (Harris)


Section II: Adaptive Non-Maximal Suppression

ANMS Algorithm

As shown above, there were a lot of corner points, and I wanted to distribute them more evenly around each image. First, I computed the distances between each pair of detected points within an image and sorted the points by their corner response values. For each point, I then calculated its suppression radius, the distance to the nearest point with a sufficiently stronger response, using the following formula: $$ r_i = \min_j |x_i - x_j|, \text{ s.t. } f(x_i) < c_{\text{robust}} \cdot f(x_j)$$ Finally, I sorted the points by suppression radius and kept the $n$ points with the largest radii, which yields a better spatial distribution. Here are the results:
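A vectorized version of this selection rule might look like the sketch below; the function name `anms` and the default parameter values are assumptions for illustration.

```python
import numpy as np

def anms(coords, strengths, n_points=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    coords: (N, 2) corner locations; strengths: (N,) corner responses.
    Keeps the n_points corners with the largest suppression radii.
    """
    # Pairwise squared distances between all corners in the image.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    # Corner j can suppress corner i only if f(x_i) < c_robust * f(x_j).
    dominated = strengths[:, None] < c_robust * strengths[None, :]
    d2 = np.where(dominated, d2, np.inf)
    # r_i: distance to the nearest dominating corner (inf for the strongest one).
    radii = np.sqrt(d2.min(axis=1))
    keep = np.argsort(-radii)[:n_points]
    return coords[keep]
```

The globally strongest corner has an infinite radius, so it is always kept, and isolated strong corners beat clusters of comparable ones, spreading the selection across the image.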

Sink Left (ANMS)
Sink Right (ANMS)
Outside Left (ANMS)
Outside Right (ANMS)
Campanile Left (ANMS)
Campanile Right (ANMS)


Section III: Extracting Feature Descriptors

Extracting Feature Descriptors

For each of the ANMS points from before, I took a 40 by 40 patch centered at the point and downsampled it to an 8 by 8 mini-patch. Afterwards, I normalized the mini-patch to zero mean and unit variance and flattened it into a vector. Here are the results:
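The descriptor extraction can be sketched as follows. The subsampling-by-striding shown here is an assumption; the author may have blurred before downsampling, and the function name is mine.

```python
import numpy as np

def extract_descriptor(gray, row, col, patch=40, out=8):
    """40x40 patch around (row, col), downsampled to 8x8, normalized, flattened."""
    half = patch // 2
    window = gray[row - half:row + half, col - half:col + half]
    # Downsample 40x40 -> 8x8 by taking every fifth pixel.
    step = patch // out
    mini = window[::step, ::step]
    # Bias/gain normalization: zero mean, unit variance.
    mini = (mini - mini.mean()) / (mini.std() + 1e-8)
    return mini.ravel()
```

Normalizing to zero mean and unit variance makes the descriptors invariant to brightness and contrast changes between the two photos.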

40 x 40 Patch
8 x 8 Mini-Patch
40 x 40 Patch
8 x 8 Mini-Patch


Section IV: Matching Feature Descriptors

Matching Feature Descriptors

With the flattened feature descriptors, I could find the best matches between the images, following the algorithm in this research paper. First, I used the provided dist2 function to compute the SSD between each pair of descriptors from image 1 and image 2. Then, I computed Lowe's ratio for each point in image 1, which is the ratio of its 1-NN distance to its 2-NN distance. If the ratio was below a certain threshold, I added the point and its 1-NN to the matches. Finally, I plotted each matched point in image 1 with its corresponding 1-NN in image 2. Here are the results:
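Lowe-ratio matching can be sketched as below. The inline pairwise-SSD computation stands in for the provided dist2 function, and the threshold value is an assumption.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.6):
    """Match each descriptor in desc1 to its 1-NN in desc2 when the
    Lowe ratio (1-NN SSD / 2-NN SSD) is below `ratio`.

    desc1: (N, D), desc2: (M, D) flattened descriptors. Returns (i, j) pairs.
    """
    # Pairwise SSD matrix, equivalent to the starter code's dist2.
    d2 = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(d2):
        nn = np.argsort(row)[:2]  # indices of the 1-NN and 2-NN
        # Accept only when the best match is much closer than the second best.
        if row[nn[0]] < ratio * row[nn[1]]:
            matches.append((i, nn[0]))
    return matches
```

The ratio test rejects ambiguous features: a descriptor whose two nearest neighbors are nearly equidistant is likely matching repeated texture rather than a unique point.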

Sink Matching Feature Descriptors
Outside Matching Feature Descriptors
Campanile Matching Feature Descriptors


Section V: Running RANSAC

RANSAC

As shown above in the matching feature descriptors, some points were not matched correctly. I used the RANSAC algorithm to prune these incorrect matches and find the best set from which to compute the homography matrix. Here are the steps I took:

  1. Randomly select 4 pairs of points from the matches.
  2. Compute the homography matrix using the 4 pairs of points.
  3. Warp the points from image 1 to image 2 using the homography matrix.
  4. Calculate the SSD between the warped points and the points in image 2.
  5. Count the number of inliers where the SSD is below a certain threshold.
  6. Repeat steps 1-5 for a certain number of iterations.
  7. Recompute the homography using the largest set of inliers.
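The steps above can be sketched as follows. The least-squares homography fit is repeated here so the sketch is self-contained, and the iteration count, threshold, and function names are illustrative assumptions.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography from (N, 2) src points to dst points."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, n_iters=1000, thresh=4.0, rng=None):
    """4-point RANSAC: return the homography refit on the largest inlier set."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(n_iters):
        # Steps 1-2: random 4-point sample and its exact homography.
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        # Steps 3-4: warp all src points and measure SSD reprojection error.
        pts = np.hstack([src, np.ones((len(src), 1))]) @ H.T
        proj = pts[:, :2] / pts[:, 2:3]
        err = ((proj - dst) ** 2).sum(-1)
        # Step 5: count inliers under the threshold.
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Step 7: refit on the largest inlier set.
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

Because a single bad correspondence can badly skew a least-squares fit, refitting on the consensus set is what makes the final homography robust to the mismatches seen above.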
I plotted the selected inliers on their respective images, along with the left image warped into the right image's perspective. Here are the results:

Sink Left Subset
Sink Right Subset
Sink Left (Warped)
Outside Left Subset
Outside Right Subset
Outside Left (Warped)
Campanile Left Subset
Campanile Right Subset
Campanile Left (Warped)


Section VI: Producing the Mosaic and Comparisons

Mosaic Comparison

Using the above 4-point RANSAC algorithm, I auto-stitched the images into seamless mosaics without predefining the correspondences. I also compared them to Part A's mosaics, where I manually selected the correspondences. As shown below, the auto-stitched mosaics found better homographies, with fewer artifacts and less blurriness than the manually defined mosaics. Here are the results:

Sink Mosaic (Predefined Points)
Sink Mosaic (Auto-Stitched Points)
Outside Mosaic (Predefined Points)
Outside Mosaic (Auto-Stitched Points)
Campanile Mosaic (Predefined Points)
Campanile Mosaic (Auto-Stitched Points)


Section VII: Conclusion

Learnings

I learned a lot about fine-tuning and filtering the Harris corners, especially to get a better spatial distribution across the images. Throughout this process, I made sure to note down the ordering convention of my points, whether (row, col) or (x, y), so that plotting and homography calculations would give correct results. The coolest takeaway was finding the best matches to automatically stitch the images together.