Intro

This is a follow-on from my previous blog post about capturing screenshots with Rust + OpenGL. While the previous article mostly revolved around the process of actually capturing the rendered pixel data for my OpenGL application, this time around I will focus on validating those screenshots.

Setup

First of all, many of these operations can fail, and it can be helpful to have a single error type for these cases, so the following error enum will be used throughout these examples:

#[derive(Debug)]
enum ScreenshotError {
    LoadIoError,
    SaveIoError,
    EncodingError,
    NoReferenceScreenshot(DynamicImage),
    ScreenshotMismatch(DynamicImage, DynamicImage),
}
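It's also worth giving the error type a human-readable description, so failures read sensibly in test output. Here's a minimal sketch of a Display implementation; note that the DynamicImage struct below is just a placeholder standing in for image::DynamicImage so the snippet is self-contained:

```rust
use std::fmt;

// Placeholder standing in for image::DynamicImage, just to keep this sketch self-contained.
#[derive(Debug)]
pub struct DynamicImage;

#[derive(Debug)]
pub enum ScreenshotError {
    LoadIoError,
    SaveIoError,
    EncodingError,
    NoReferenceScreenshot(DynamicImage),
    ScreenshotMismatch(DynamicImage, DynamicImage),
}

// A human-readable description for each failure case.
impl fmt::Display for ScreenshotError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ScreenshotError::LoadIoError => write!(f, "failed to load the reference screenshot"),
            ScreenshotError::SaveIoError => write!(f, "failed to write a test output file"),
            ScreenshotError::EncodingError => write!(f, "failed to encode the screenshot as PNG"),
            ScreenshotError::NoReferenceScreenshot(_) => write!(f, "no reference screenshot was found"),
            ScreenshotError::ScreenshotMismatch(_, _) => write!(f, "screenshot does not match the reference"),
        }
    }
}

fn main() {
    println!("{}", ScreenshotError::NoReferenceScreenshot(DynamicImage));
}
```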

Storing the screenshots

The first part of the task was getting reference screenshots to compare the pictures captured from the rendered application against. While I could for each test use my OS's built-in screenshot tools to capture the game in its desired state, I decided to automate the approach a little. I would have the application write out the captured image in the event of a missing reference screenshot or failed test.

From last time, we had an instance of the image::DynamicImage enum. Fortunately, the image crate supports writing data out as a PNG file via image.write_to(file, image_format). Since PNG is lossless, we should be able to load it back up and compare it pixel for pixel against a future rendered image.

Conceptually our image saving function is very simple:

use std::fs::File;
use std::path::Path;

use image::{DynamicImage, ImageFormat};

fn write_image<P: AsRef<Path>>(filename: P, img: &DynamicImage) -> Result<(), ScreenshotError> {
    let mut file = File::create(&filename).or(Err(ScreenshotError::SaveIoError))?;
    img.write_to(&mut file, ImageFormat::PNG).or(Err(ScreenshotError::EncodingError))
}

Loading a reference image

Loading a reference image takes about as few lines of code:

fn load_reference<P: AsRef<Path>>(path: P) -> Result<DynamicImage, ScreenshotError> {
    image::open(path).or(Err(ScreenshotError::LoadIoError))
}

Comparing the images

There's no PartialEq implementation for DynamicImage directly, so to compare the images, we need to use DynamicImage::raw_pixels() to get a Vec&lt;u8&gt; of pixel data (with each u8 being a single color component of a pixel) and compare those. It's worth noting that this doesn't do any conversion of the underlying image data, so it's only valid if the two images are in the same pixel format. Luckily, as both images come from us, we can make that assumption.

fn compare_screenshot_images(reference_image: DynamicImage, actual_image: DynamicImage) -> Result<(), ScreenshotError> {
    if reference_image.raw_pixels() == actual_image.raw_pixels() { 
        Ok(()) 
    } else { 
        Err(ScreenshotError::ScreenshotMismatch(actual_image, reference_image))
    }
}
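When the buffers do differ, a plain boolean doesn't tell you where. A small hypothetical helper over the raw byte buffers (not something the image crate provides) can report the first differing byte, which maps back to a pixel as index / 4 for RGBA8 data:

```rust
// Report the index of the first differing byte between two pixel buffers.
// For RGBA8 data, byte index / 4 gives the pixel index.
fn first_mismatch(reference: &[u8], actual: &[u8]) -> Option<usize> {
    if reference.len() != actual.len() {
        // Treat a length mismatch as a difference at the end of the shorter buffer.
        return Some(reference.len().min(actual.len()));
    }
    reference.iter().zip(actual).position(|(a, b)| a != b)
}

fn main() {
    // Two 2-pixel RGBA8 buffers that differ in the green channel of pixel 1.
    let reference = [255, 0, 0, 255, 0, 255, 0, 255];
    let actual = [255, 0, 0, 255, 0, 254, 0, 255];
    assert_eq!(first_mismatch(&reference, &actual), Some(5));
    assert_eq!(first_mismatch(&reference, &reference), None);
}
```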

Of course, this only tells us that the image doesn't match. That might be enough if there's a big, obvious error like an item that didn't get rendered, but some errors can be subtler, such as a shader that produces incorrect recolors of a unit. It would help to have an image that points out which pixels differ between the two images. To do that, we create a new image buffer using ImageBuffer::from_fn(). This method allows us to provide a function that gets called with the x and y co-ordinates of each pixel in the newly created buffer. We then use these co-ordinates to look up the corresponding pixel in the original source image. If the pixels don't match, we copy the pixel from the screenshot taken during the test run. If they do match, we emit a transparent pixel in the output so it can be ignored.

use image::{DynamicImage, GenericImage, ImageBuffer, Rgba};

pub fn diff_images(actual: &DynamicImage, expected: &DynamicImage) -> DynamicImage {
    DynamicImage::ImageRgba8(ImageBuffer::from_fn(
        actual.width(),
        actual.height(),
        |x, y| {
            let actual_pixel = actual.get_pixel(x, y);
            let expected_pixel = if expected.in_bounds(x, y) {
                expected.get_pixel(x, y)
            } else {
                Rgba {
                    data: [0, 0, 0, 0]
                }
            };
            if actual_pixel == expected_pixel {
                Rgba {
                    data: [0, 0, 0, 0]
                }
            } else {
                actual_pixel
            }
        }
    ))
}
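The per-pixel rule at the heart of diff_images can also be unit-tested in isolation over raw RGBA8 byte buffers, without involving the image crate at all. A hypothetical sketch, assuming both buffers hold whole RGBA8 pixels (lengths that are multiples of 4):

```rust
// Same per-pixel rule as diff_images, over raw RGBA8 buffers: matching pixels
// become fully transparent, differing pixels keep the actual value.
// Assumes both buffer lengths are multiples of 4.
fn diff_rgba(actual: &[u8], expected: &[u8]) -> Vec<u8> {
    actual
        .chunks(4)
        .enumerate()
        .map(|(i, px)| {
            // Pixels outside the expected image's bounds compare against transparent black.
            let expected_px = expected.get(i * 4..i * 4 + 4).unwrap_or(&[0, 0, 0, 0]);
            if px == expected_px {
                [0, 0, 0, 0]
            } else {
                [px[0], px[1], px[2], px[3]]
            }
        })
        .flatten()
        .collect()
}

fn main() {
    // Pixel 0 matches (becomes transparent); pixel 1 differs (kept from actual).
    let actual = [10, 10, 10, 255, 9, 9, 9, 255];
    let expected = [10, 10, 10, 255, 0, 0, 0, 255];
    assert_eq!(diff_rgba(&actual, &expected), vec![0, 0, 0, 0, 9, 9, 9, 255]);
}
```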

Error handling

We have a couple of errors that can occur during the test. Most of them should just be returned as an error to the caller (e.g. failures writing test output). However, there are two cases we want to handle specially:

  1. If the expected screenshot and actual screenshot don't match, we write out the expected screenshot, actual screenshot and diff image to our output directory, and return an Err.
  2. If there is no expected screenshot, then we write out the actual screenshot to the test directory and fail the test. If the actual screenshot is the desired output, it can then be saved as a reference for future use.

use std::fs;
use std::path::{Path,PathBuf};

fn handle_screenshot_error<P: AsRef<Path>>(output_path: P, screenshot_error: ScreenshotError) -> Result<(), ScreenshotError> {
    let output_path = output_path.as_ref();
    fs::create_dir_all(output_path).or(Err(ScreenshotError::SaveIoError))?;
    match screenshot_error {
        ScreenshotError::NoReferenceScreenshot(ref img) => {
            write_image(output_path.join("actual.png"), img)?;
        },
        ScreenshotError::ScreenshotMismatch(ref actual, ref expected) => {
            write_image(output_path.join("actual.png"), actual)?;
            write_image(output_path.join("expected.png"), expected)?;
            write_image(output_path.join("diff.png"), &diff_images(actual, expected))?;
        },
        _ => {}
    }
    Err(screenshot_error)
}

Putting it all together

So now we have all the pieces we need to perform the test.

  1. First, we capture the screenshot from the application under test, as detailed in the last blog post.
  2. Then we load up a reference screenshot.
  3. We compare the reference screenshot and the fresh screenshot to see if they match.
  4. If there is an error, we run our function we just declared to produce image outputs.
  5. Otherwise, we return an Ok(()) to indicate that the correct screenshot was produced.

pub fn screenshot_test<P: AsRef<Path>>(output_path: P, x: i32, y: i32, width: u32, height: u32) -> Result<(), ScreenshotError> {
    // Assume the reference screenshot lives at expected.png under the output path.
    let reference_path = output_path.as_ref().join("expected.png");
    capture_image(x, y, width, height) // From the previous blog post
        .and_then(|captured_image| {
            match load_reference(&reference_path) {
                Ok(reference_image) => Ok((reference_image, captured_image)),
                Err(_) => Err(ScreenshotError::NoReferenceScreenshot(captured_image))
            }
        })
        .and_then(|(reference_image, captured_image)| {
            compare_screenshot_images(reference_image, captured_image)
        })
        .or_else(|err| handle_screenshot_error(&output_path, err))
}

Packaging it all up

I've created a crate called xray with the approach described in the last two blog posts. This wraps up all the logic that I've just described, and also makes it possible to use the test orchestration while providing your own implementation of screenshot grabbing, if you prefer to use e.g. Vulkan, Win32 or X11 APIs to obtain a screenshot. An example test using xray is below:

#[test]
fn check_basic_screen() {
    let size = [1280, 720];
    let mut app = App::new(size, build_glutin_window(size));
    let Size { width: draw_width, height: draw_height } = app.window.draw_size();
    let Size { width, height } = app.window.size();
    app.render_into_viewport(Viewport {
        rect: [0, 0, draw_width as i32, draw_height as i32],
        window_size: [width, height],
        draw_size: [draw_width, draw_height]
    });
    xray::gl_screenshot_test("basic_rendering/initial_map", 0, 0, draw_width, draw_height);
}

Check out the repo for more information!

Drawbacks

However....

While the above approach works, I had some issues when using it in an actual project that are worth mentioning:

  1. The output my application produced, at least, was not entirely consistent across GPUs. While I could take screenshots rendered on my Linux desktop with a GTX 1080 and compare them to those from my Windows laptop with a GTX 1050, if I used my laptop's integrated graphics, an Intel HD 530, then the images wouldn't match. Not in the sense of broken output, but rather that, wherever a texture wasn't just blitted straight to the screen, the Intel GPU's output might differ by 2-3 on the RGB scale for certain pixel components. This means that while the approach might be good to run on a CI host, it's not particularly useful for testing across multiple machines.
  2. While the method here is portable across operating systems, this blog post and the xray crate only provide a screenshot capture method for OpenGL. If you use DX11 or Vulkan, or even just want to allow your app to run across multiple graphics APIs via your game engine's abstraction, you'll likely want to find another method to obtain the rendered pixel data.
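One way to soften the first point, if exact matches across GPUs are out of reach, is to compare with a small per-channel tolerance instead of strict equality. This is a hypothetical sketch over raw RGBA8 buffers, not something the code above implements:

```rust
// Compare two RGBA8 pixel buffers, allowing each color component to differ
// by up to `tolerance` to absorb small cross-GPU rounding differences.
fn images_match(reference: &[u8], actual: &[u8], tolerance: u8) -> bool {
    reference.len() == actual.len()
        && reference.iter().zip(actual).all(|(a, b)| a.abs_diff(*b) <= tolerance)
}

fn main() {
    let reference = [100, 100, 100, 255];
    let off_by_two = [102, 99, 100, 255];
    // An Intel-style drift of 2-3 per component passes with tolerance 3...
    assert!(images_match(&reference, &off_by_two, 3));
    // ...but still fails a strict comparison.
    assert!(!images_match(&reference, &off_by_two, 0));
}
```

The trade-off is that a genuine rendering bug smaller than the tolerance will now slip through undetected, so the tolerance should be kept as tight as the hardware allows.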