Why Test Suites Fail—And How to Avoid Breakdowns

End-to-end (E2E) software testing suites, such as those written in Selenium, frequently fall into disrepair or fail outright. And while stories of E2E test suite failures are legion, they all fall into two distinct groups:

1. Your test suite has become so large, unwieldy, and unstable that nobody acts on its feedback anymore. It fails every time it runs, throwing dozens of false positives, and its runtime has ballooned so much that critical updates ship before your team has investigated all of the failures. When software needs to be rushed out the door to meet release dates, people start treating the test suite as optional, and in the world of web software, you’re always in a rush.

2. Your test suite isn’t updated when the team makes changes to the application because the engineering resources needed to maintain it have been directed elsewhere. As a result, the suite itself becomes outdated.

If your test suite is outdated, it’s going to miss the bugs that matter most to your users, simply because of how E2E testing works. E2E tests walk through an application and attempt to simulate what users are doing. As developers update the application, users’ pathways through it change.

The old pathways encoded in the original test suite, even if minimally maintained, no longer represent the sequences of events or state changes that users care about. Over time, you’ll find bugs in features that your E2E tests supposedly cover, because users are interacting with the application in ways you never originally imagined.

Test suite breakdowns are common, and those who keep their test suites well maintained often succeed by throwing resources at the problem. But you can set yourself up for a more effective and resource-efficient test maintenance regime by creating a better test suite design at the very beginning. Here’s how.

Build a well-designed test suite

A test suite that functions optimally requires fierce prioritization. You probably don’t have an infinite budget for a QA automation team. (Or, if you do, you’re going to lose in the marketplace because you’re spending your whole budget on QA automation rather than on development.) E2E test suite maintenance is expensive, time-consuming, and often heavily manual. One of the best ways to make your E2E test suites easier to maintain is to prioritize what you actually use E2E testing for.

Too many organizations insist on running countless unnecessary E2E tests. I’ve seen hundreds, and in some cases thousands, of E2E tests built for a single application (one that’s nowhere near the size of something like SAP or Salesforce).

In these situations, I look at the test suite and ask, “Why isn’t this a lower-level test?” The response is often, “Well, Product told us that we have to write these as E2E tests.” The engineering leader knows they’re doing the wrong thing, but they’re not allowed to stop doing it. Product should absolutely help guide testing; among departments, it probably has the best insight into which features users care about. But product managers aren’t testing experts, and they simply don’t know which level of testing is appropriate for a given check.

Google has published insightful content on this concept. It suggests a pyramid structure, where you have a limited number of E2E tests, a fair number of integration tests, and a large number of lower-level unit tests. Lower-level tests are cheaper to build, are easier to maintain, and produce higher-quality feedback.

It doesn’t make sense to build an E2E test when a lower-level test will suffice. E2E tests are mission-critical: You simply don’t know how the application or feature set works as a whole until you’ve tested it end to end. But most tests should make sure specific components work, and so they should be lower-level tests. When you operate with a philosophy of “always use lower-level testing when possible,” you will be able to build a lean, efficient, and up-to-date test suite.
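
To make the tradeoff concrete, here’s a minimal sketch in Python of what pushing a check down the pyramid can look like. The discount function, element IDs, and URL are all hypothetical: the component logic gets cheap, fast unit tests, while only the full checkout workflow gets a Selenium E2E test.

```python
# Minimal sketch: prefer lower-level tests, reserve E2E for whole workflows.
# The cart logic, discount rules, URL, and element IDs below are assumptions
# made up for illustration, not a real application.

# --- Unit level: fast, stable, runs on every commit ---
def apply_discount(subtotal: float, code: str) -> float:
    """Hypothetical production function under test."""
    return round(subtotal * 0.9, 2) if code == "SAVE10" else subtotal

def test_discount_applied():
    assert apply_discount(100.00, "SAVE10") == 90.00

def test_unknown_code_ignored():
    assert apply_discount(100.00, "BOGUS") == 100.00

# --- E2E level: one test for the workflow users actually walk through ---
from selenium import webdriver
from selenium.webdriver.common.by import By

def test_checkout_end_to_end():
    driver = webdriver.Chrome()
    try:
        driver.get("https://example.test/checkout")  # placeholder URL
        driver.find_element(By.ID, "discount-code").send_keys("SAVE10")
        driver.find_element(By.ID, "apply").click()
        assert "90.00" in driver.find_element(By.ID, "order-total").text
    finally:
        driver.quit()
```

The unit tests run in milliseconds and break only when the discount logic breaks; the E2E test is the expensive, slow one you keep to a minimum.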

Rethink maintenance as three processes

I’ve also seen test suite maintenance fail because every update request was chucked into a JIRA ticket list without any prioritization or organization. In such a system, engineers typically work on the easiest things first rather than on what’s most important. Over time, the team as a whole falls behind and skips more and more tests, including those awaiting repair.

Besides matching test suite size to your resources, you can break test maintenance into three types: build-to-build, stability, and maintenance over time. Categorizing the work into these buckets lets you streamline each category with its own processes, prioritization rules, and (most importantly) specialized engineers.

Build-to-build

Use build-to-build maintenance when developers need to make changes to the user interface (UI). If anything substantial about the UI changes, or if the workflow being tested changes, the test will fail because it won’t be able to find the element it’s trying to interact with. Luckily, UI changes are rare by nature—if you’re constantly making changes to your UI, your users will keep having to relearn how to use your app, become annoyed, and possibly stop using it.
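
As a small illustration, here’s a Selenium sketch (Python, with hypothetical markup and URL) of why UI changes break tests, and one common way teams soften the blow: locating elements by a dedicated test attribute rather than by page structure.

```python
# Sketch of brittle vs. resilient element location. The URL, page markup,
# and data-testid attribute are assumptions made up for illustration.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.test/login")  # placeholder URL

    # Brittle: tied to the exact DOM layout, which changes with the UI.
    # driver.find_element(By.XPATH, "/html/body/div[2]/form/div[1]/input")

    # More resilient: tied to an attribute the team controls and keeps stable
    # across redesigns, so only genuine workflow changes break the test.
    email = driver.find_element(By.CSS_SELECTOR, "[data-testid='login-email']")
    email.send_keys("user@example.test")
finally:
    driver.quit()
```

Stable locators don’t remove the need for the feedback loop described next; they just shrink the number of tests each UI change touches.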

When UI changes do come down the pipe, you need to have a feedback loop between your developers and your QA team that communicates upcoming UI changes. This feedback should occur after design and before development. Your QA team can now anticipate these changes and respond with updated tests ready to ship before the UI change has been built. Build-to-build maintenance is done well when your QA team is aware of upcoming UI changes and can prepare for them.

Stability

The key to maintaining stability in a test suite is not to wait until a test fails after deployment. Instead, run your tests repeatedly against the same version of the application. Any inconsistency in which tests pass and which fail is instability.

To prevent chaos in your deployment, set up a separate continuous integration (CI) pipeline that runs these tests repeatedly against the test environment. This pipeline isn’t part of deployment; it exists specifically for the QA team to monitor. If your tests routinely pass but some suddenly fail, you’ve found instability and can create a ticket to stabilize those tests.
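
What that monitoring pipeline does can be as simple as the following sketch, which assumes a pytest-based Selenium suite under an e2e/ directory; the run count, paths, and report names are placeholders. The idea is just to run the same suite repeatedly against the same build and flag any test that both passed and failed.

```python
# Flakiness-detection sketch, independent of any particular CI system.
# Assumes a pytest-based E2E suite in e2e/; run count and paths are placeholders.
import subprocess
import xml.etree.ElementTree as ET
from collections import defaultdict

RUNS = 10  # assumption: ten repetitions against the same build
outcomes = defaultdict(set)  # test id -> subset of {"passed", "failed"}

for i in range(RUNS):
    report = f"run_{i}.xml"
    subprocess.run(["pytest", "e2e/", f"--junitxml={report}"], check=False)
    for case in ET.parse(report).getroot().iter("testcase"):
        test_id = f"{case.get('classname')}::{case.get('name')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        outcomes[test_id].add("failed" if failed else "passed")

# A test that both passed and failed against the same build is unstable.
flaky = sorted(t for t, seen in outcomes.items() if len(seen) > 1)
print("Unstable tests:", *flaky, sep="\n  ")
```

Each test that shows up in that list becomes a stabilization ticket, which feeds directly into the prioritization below.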

Fixing these unstable tests must be a priority, or the suite will degrade: the number of tests you skip because of instability will grow until the entire test suite becomes unmanageable. For any test that’s worth keeping, stabilizing it should take precedence over building new tests. Otherwise, a few months down the road you’ll find yourself with a test suite that mostly gets skipped.

Maintenance over time

Maintenance over time is about making sure that your test suite is always aligned with top-priority workflows in your application.

One of the biggest causes of long-term test suite degradation is constantly adding new tests every time development releases new features or UI changes. It’s true that if you’re expanding your feature set, you will need to expand your test suite to some extent. But here’s a universally true principle that will help guide you in reprioritizing testing:

There are only so many things your users are going to be able to do in your app.

Your application is only a tiny part of a user’s life, and they’re only going to learn so much about it. There are only so many features they’re actually going to use, and they’re only going to invest the time to learn to do a certain number of things. So adding more features does not mean that your users are using more features; they’re reprioritizing what features they’re using. Therefore, you need to reprioritize what you’re testing, rather than simply adding more tests.

If you just keep adding tests, your QA team will need to expand even as your development team stays the same size, and each additional test will provide increasingly marginal value. Continuously adding tests as you alter your application increases your maintenance burden without meaningfully improving quality assurance. The only way to keep such a suite from falling apart is to keep growing your QA team over time, which nobody wants to do and few have the budget for.

The solution? Know which tests to change.

Production analytics

If you remain attached to the older tests you wrote, and every test is treated as a “top-priority” test that has to stay in the suite, your maintenance burden will become untenable. Instability and runtimes will increase until the test suite falls apart. When a test suite takes too long to run, people stop running it often enough to assure quality at any meaningful development speed. And if the suite is unstable and fails, any engineering team in a hurry will just ship to production instead of spending eight hours analyzing all of those failed tests.

The best way to avoid this all-too-common scenario is with production analytics. By using live user behavioral data from your application, you can reprioritize and reorient your test suite to what your users truly care about. Production analytics tells you precisely what to test based on how your users actually use your application.
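
What “using live user behavioral data” can look like in practice is sketched below; the CSV export, its column names, and the cutoff of ten workflows are assumptions, since the details depend on your analytics tool. The point is simply to rank workflows by how often real users exercise them and let that ranking drive which E2E tests stay.

```python
# Minimal sketch of using production analytics to reprioritize E2E coverage.
# Assumes a hypothetical CSV export of user events with a "workflow" column;
# the file name, column name, and cutoff of ten are placeholders.
import csv
from collections import Counter

workflow_counts = Counter()
with open("analytics_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        workflow_counts[row["workflow"]] += 1

# The workflows users actually exercise most are the ones worth an E2E test;
# everything below the cutoff is a candidate for a lower-level test or removal.
for workflow, count in workflow_counts.most_common(10):
    print(f"{count:>8}  {workflow}")
```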

Get fierce about enforcing constraints

Ultimately, it takes a fierce leader to set the runtime and test suite size constraints that QA teams need to work within. But doing so forces your teams to use that production data to prioritize testing efficiently rather than just writing new tests. The lazy thing to do, and unfortunately what most organizations do, is to simply add more tests. In the absence of strong, data-driven leadership, they will continue to do so.