Our approach to automated visual regression testing
When our team recently started a new project, we decided to experiment with a number of new approaches and techniques to see whether any of them could improve our day-to-day development work as well as the overall quality of the product. One of these experiments, automated testing for visual regressions, turned out to be particularly beneficial on both counts. Here we want to share our approach and how we’ve built a testing stack that lets us ship with confidence and detect visual defects early, which greatly reduces debugging and fixing effort.
At the center of this automated testing layer lies the choice of a CSS regression tool, and there is a large number of options. Some work exclusively on the front end of the application, while more advanced ones can orchestrate the whole stack and therefore support more complex scenarios. What most of these tools have in common is a simple concept: you generate a set of baseline screenshots that document how the pages (or parts of them) should look, and every time new code is checked in, those pages are tested against their respective baselines. The tool we found best suited to our situation is Needle, a Python tool based on nose and Selenium.
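The baseline idea can be reduced to two modes: a "record" run that stores the current rendering as the reference, and a normal run that fails whenever the rendering no longer matches. Needle's `--with-save-baseline` nose option plays roughly the role of the record mode.

The sketch below is a minimal, stdlib-only illustration and not Needle's implementation. It rests on three simplifications:

- A screenshot is a list of rows of (r, g, b) tuples.
- Baselines live in a dict rather than in image files.
- The names `check_screenshot` and `BaselineMismatch` are hypothetical.

```python
# Minimal sketch of the baseline workflow (not Needle's code).
# Assumptions: screenshots are lists of rows of (r, g, b) tuples and
# baselines are kept in a dict instead of image files on disk.
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


class BaselineMismatch(AssertionError):
    """Raised when a screenshot does not match its stored baseline."""


def check_screenshot(name: str, current: Image, baselines: Dict[str, Image],
                     save_baseline: bool = False) -> None:
    """Store `current` as the baseline, or assert that it matches the stored one."""
    if save_baseline:
        # Record mode: the current rendering becomes the new reference.
        baselines[name] = [list(row) for row in current]
        return
    if name not in baselines:
        raise BaselineMismatch(f"no baseline stored for {name!r}")
    if baselines[name] != current:
        raise BaselineMismatch(f"{name!r} differs from its baseline")


if __name__ == "__main__":
    store: Dict[str, Image] = {}
    logo = [[(255, 255, 255), (255, 255, 255)]]
    check_screenshot("logo", logo, store, save_baseline=True)  # first run records
    check_screenshot("logo", logo, store)                      # later run passes
    try:
        check_screenshot("logo", [[(255, 255, 255), (0, 0, 0)]], store)
    except BaselineMismatch as exc:
        print(exc)  # the changed pixel makes the check fail
```

In a real setup, the record run happens once on a known-good revision. The baseline files are then versioned, so that every later run compares against the same reference.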
Can you tell the difference between this image
And this one?
What makes Needle interesting is that visual tests are regular nose tests. They can therefore easily be integrated into your testing flow and Continuous Integration system, and they get the other benefits of nose tests as well (e.g. parallelization, coverage, test grouping). Another advantage is that, thanks to Selenium, most modern browsers can be tested, including mobile ones. This was important for us, since responsiveness was one of our project’s earliest requirements.
What is relatively difficult with other, front-end based alternatives is setting up the test and preparing the data (and possibly the preliminary interactions) needed before asserting the state of a page. With Needle this posed no issue: our project is based on Django, so we could reuse all the code already in place for other kinds of tests (e.g. fixtures, factories).
Under the hood, Needle uses Pillow to produce screenshots of the tested elements. It can also go one step further: it can be configured to use perceptualdiff to generate diff images of the visual differences found during tests. This means that when a test fails, you can see right away exactly which pixels differ.
This is the output that perceptualdiff generates for the two images above.
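To make the idea of a diff image concrete, here is a deliberately simple stand-in. Note that this is not what perceptualdiff does: perceptualdiff applies a model of human vision, while this is a plain per-channel comparison.

The sketch marks every pixel whose channels differ by more than a tolerance. Images are again modelled as lists of rows of (r, g, b) tuples, and `diff_mask` and `changed_pixels` are hypothetical names.

```python
# Plain per-pixel diff (NOT perceptualdiff's perception-based comparison).
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_mask(a: Image, b: Image, tolerance: int = 0) -> List[List[bool]]:
    """Return a mask that is True wherever any channel differs by more than `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [
        [max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance
         for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def changed_pixels(mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """List the (x, y) coordinates flagged in the mask."""
    return [(x, y) for y, row in enumerate(mask) for x, flag in enumerate(row) if flag]


if __name__ == "__main__":
    before = [[(255, 255, 255), (200, 0, 0)]]
    after = [[(255, 255, 255), (203, 0, 0)]]
    print(changed_pixels(diff_mask(before, after)))               # [(1, 0)]
    print(changed_pixels(diff_mask(before, after, tolerance=3)))  # []
```

A non-zero tolerance hides tiny colour shifts. That can be convenient, but it can also mask real regressions, which is one reason a perception-aware tool is attractive.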
To test the responsiveness of the pages, we decided to use Chrome’s emulation capabilities, which chromedriver exposes to Selenium. Chrome’s emulator lets you specify the characteristics of the viewport in one of two ways: manually, by setting the viewport size and its pixel ratio, or by using a well-known device profile, e.g. ‘iPhone 6’ or ‘iPad 2’. Unfortunately, at the time of this writing, the orientation cannot be controlled through chromedriver, so we ended up using manual specifications for certain cases. Other than that, this proved very handy for our needs.
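chromedriver receives these settings through its `mobileEmulation` option. That option accepts either a `deviceName` or explicit `deviceMetrics` (`width`, `height`, `pixelRatio`).

The helper below is a minimal sketch that builds either form. The helper name is hypothetical. Emulating landscape by swapping width and height is our assumption of how manual specifications can stand in for the missing orientation control; it changes only the viewport size.

```python
# Builds the value for chromedriver's "mobileEmulation" option.
from typing import Any, Dict, Optional


def mobile_emulation(device_name: Optional[str] = None,
                     width: Optional[int] = None,
                     height: Optional[int] = None,
                     pixel_ratio: float = 1.0,
                     landscape: bool = False) -> Dict[str, Any]:
    if device_name is not None:
        if landscape:
            # No orientation switch via chromedriver, so landscape needs manual metrics.
            raise ValueError("use width/height for landscape, not a device profile")
        return {"deviceName": device_name}
    if width is None or height is None:
        raise ValueError("pass either device_name or both width and height")
    if landscape:
        # Assumption: approximate landscape by swapping the portrait dimensions.
        width, height = height, width
    return {"deviceMetrics": {"width": width, "height": height,
                              "pixelRatio": pixel_ratio}}
```

With Selenium, the result is passed as follows:

- Create the options with `options = webdriver.ChromeOptions()`.
- Register the emulation settings with `options.add_experimental_option("mobileEmulation", mobile_emulation(width=375, height=667, pixel_ratio=2.0, landscape=True))`.

Which device names are accepted depends on the Chrome version.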
One challenge we faced at the beginning was how different operating systems influence the way browsers render specific elements (e.g. fonts). For example, we use Macs for development, whereas our Continuous Integration VMs run a mix of Debian and Ubuntu. This was a problem for visual tests because baseline screenshots generated on a Mac have subtle but still meaningful differences from those generated on, say, Ubuntu, so many tests were failing for this reason alone.
The solution we adopted is based on a Selenium Grid. A minimal Selenium Grid setup consists of one machine acting as both hub and node, with all the necessary browsers installed. You point your visual tests at this machine and delegate the browser operations to it. For us, this meant using exactly the same browser and OS (in fact, the same machine) both for generating baselines and for testing against them, which removes machine differences as a source of failures. Now, when a test fails, we know it’s not because of differences between the machines involved but because of an actual visual regression.
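Delegating to the grid mostly means creating a remote WebDriver session instead of a local one.

The sketch below rests on a few assumptions:

- The hub listens on port 4444 under the classic `/wd/hub` path used by Selenium 2/3 hubs; newer grids may also accept the root URL.
- `SELENIUM_GRID_HOST` is a hypothetical environment variable.
- The helper names are hypothetical.
- `webdriver.Remote(command_executor=..., options=...)` is the Selenium call that opens the session.

```python
# Points tests at a Selenium Grid hub instead of a local browser.
import os
from typing import Optional


def grid_executor_url(host: Optional[str] = None, port: int = 4444) -> str:
    """Build the hub URL; SELENIUM_GRID_HOST is a hypothetical env variable."""
    host = host or os.environ.get("SELENIUM_GRID_HOST", "localhost")
    return f"http://{host}:{port}/wd/hub"


def remote_chrome(executor_url: str, options=None):
    """Open a Chrome session on the grid (requires the selenium package)."""
    # Imported lazily so grid_executor_url works with the standard library alone.
    from selenium import webdriver

    return webdriver.Remote(command_executor=executor_url,
                            options=options or webdriver.ChromeOptions())
```

Because every session, whether it records baselines or checks against them, goes through the same URL, both kinds of run end up on the same browser and OS.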
The problem with using a browser on another machine for Needle tests is that this machine needs a way back to the test machine, which serves the pages under test. This is not a problem if both machines are within the same LAN or VPN (a simple DNS-based solution will do), but in our case that wasn’t possible, so we ended up creating a reverse SSH tunnel from the Selenium Grid to the development machine.
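A reverse tunnel of this kind can be opened from the development machine with `ssh -R`. The grid machine then forwards connections on one of its own ports back to the development server.

The sketch below only builds the command line. The host name, user, ports, and the helper name are hypothetical. The flags used (`-N`, `-R`, `-o ExitOnForwardFailure=yes`) are standard OpenSSH options.

```python
# Builds the argv for a reverse SSH tunnel opened from the development machine.
from typing import List, Optional


def reverse_tunnel_command(grid_host: str, remote_port: int, local_port: int,
                           user: Optional[str] = None) -> List[str]:
    """Forward grid_host:remote_port back to localhost:local_port on this machine."""
    target = f"{user}@{grid_host}" if user else grid_host
    return [
        "ssh",
        "-N",                              # no remote command, forwarding only
        "-o", "ExitOnForwardFailure=yes",  # fail fast if the remote port is taken
        "-R", f"{remote_port}:localhost:{local_port}",
        target,
    ]
```

To use it:

1. Start the tunnel with `subprocess.Popen(reverse_tunnel_command("grid.example.com", 8081, 8000, user="ci"))`.
2. Point the tests at `http://localhost:8081/`.

The second step works because, by default, sshd binds the remote end of the tunnel to the grid machine’s loopback interface, which is exactly where its browser runs.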
Another issue we faced was a bug in Chrome that prevents taking screenshots of the full page, including the parts outside the visible scrolling area. Luckily, we found a pretty good solution here*. It basically scrolls the content and takes several screenshots, which are then stitched together with Pillow to produce a screenshot of the full page. Being able to capture anything from the full page down to single buttons and use it as a baseline made our visual tests quite powerful in a short time. They helped us catch many unintended changes that would otherwise have been very difficult to spot.
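Since the original workaround is no longer reachable, here is our own minimal sketch of the scroll-and-stitch idea, not a reconstruction of that code. The one subtle point is the last capture: browsers stop scrolling at the bottom of the page, so that capture overlaps the previous one and has to be cropped.

Rows are modelled as plain Python values so the logic runs without a browser. `plan_captures`, `stitch`, and the `capture` callback are hypothetical names.

```python
# Plans and stitches scrolled viewport captures into one full-page image.
from typing import Callable, List, Sequence, Tuple, TypeVar

RowT = TypeVar("RowT")


def plan_captures(page_height: int, viewport_height: int) -> List[Tuple[int, int, int]]:
    """Return (scroll_y, crop_top, crop_height) for each capture, top to bottom."""
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    max_scroll = max(page_height - viewport_height, 0)
    plan = []
    for target_y in range(0, page_height, viewport_height):
        # The browser cannot scroll past max_scroll, so the last capture may
        # start above target_y; crop away the part already captured.
        scroll_y = min(target_y, max_scroll)
        crop_top = target_y - scroll_y
        crop_height = min(viewport_height, page_height - target_y)
        plan.append((scroll_y, crop_top, crop_height))
    return plan


def stitch(page_height: int, viewport_height: int,
           capture: Callable[[int], Sequence[RowT]]) -> List[RowT]:
    """capture(y) scrolls to y and returns the rows visible in the viewport."""
    rows: List[RowT] = []
    for scroll_y, crop_top, crop_height in plan_captures(page_height, viewport_height):
        shot = capture(scroll_y)
        rows.extend(shot[crop_top:crop_top + crop_height])
    return rows


if __name__ == "__main__":
    page = [f"row {i}" for i in range(2500)]
    print(plan_captures(2500, 1000))  # [(0, 0, 1000), (1000, 0, 1000), (1500, 500, 500)]
    assert stitch(2500, 1000, lambda y: page[y:y + 1000]) == page
```

With Selenium and Pillow, `capture` would work in four steps:

1. Scroll with `driver.execute_script("window.scrollTo(0, arguments[0]);", y)`.
2. Grab the viewport with `driver.get_screenshot_as_png()`.
3. Open the result through `Image.open(io.BytesIO(png))`.
4. `crop` each capture and `paste` it into an `Image.new` canvas.

Keep in mind that screenshot pixels are CSS pixels multiplied by `window.devicePixelRatio`. Fixed headers will also appear in every capture.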
As you have probably realized by now, many elements and issues are involved in setting up a visual regression testing stack that really benefits your team and your project. Still, our experience was all in all very positive, and for us the effort definitely paid off. Of course, the earlier this kind of testing is integrated into the project’s life cycle, the better. If you have experience with these technologies, or you faced similar issues and came up with nice solutions, we’ll be happy to hear from you!
* The original link (https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/) seems to be broken at the moment.