‘Why Didn’t This Get Tested!?’ End-to-End Testing with Live User Data


Insufficient or inconsistent quality assurance test coverage leads to bugs in production, which raise costs, hurt revenue and customer satisfaction and, ultimately, erode brand value. Bugs in production can cost up to seven times more to resolve than they would have in the testing stage.

However, because end-to-end (E2E) testing is expensive in terms of both money and runtime, QA teams must prioritize what gets tested. Typically, this means the QA team gets together in a room, uses its knowledge and experience to essentially guess what should be tested, devises the tests, and then uses a toolset to build and maintain the test automation.

Unfortunately, these tests are subjective, based on the QA teams’ best-educated guesses. They’re stuck trying to predict what real-life user behavior will be without actual data, and they hope their tests reflect how users behave. Sometimes this might hit the mark, but often it does not.

This gap between what is predicted and what users actually do means that test suites fail to cover certain workflows that are key to the user experience, and bugs slip past the tests into production. For example: an analytics application stops displaying data when a user modifies the data manually; a sensors application raises false alerts when a user builds the alert before hooking up the device; a CRM won’t let you add a customer after you have already added 14; an applicant tracking system (ATS) won’t let you post a job if it’s associated with a certain team. A bug could also disrupt the checkout process, preventing users from converting, buying a product, or otherwise paying for goods or services.

Failures like these could mean huge losses in revenue or retention for every minute of downtime, so when they do make it into production, the entire organization goes into panic mode. Engineers drop everything they’re doing. They literally go into a “war room” to collectively hunt down the bug and fix it, and then ship a “hotfix” as soon as possible, hoping the fix doesn’t cause something else to malfunction. Development grinds to a halt so that there are as few changing variables as possible.

Subsequently, QA teams must deal with the post-mortem “blame-storm.” Whose fault was it, and why? How many times have you heard, “Why didn’t this get tested?” A cycle of blame-storming can lead to a toxic environment of finger-pointing and passing the buck, and if it happens often enough, the engineering team cannot cultivate a culture of excellence. Highly talented engineers leave for better organizations, while less talented people take their place. Bug frequency increases, and the problem snowballs.

On top of all this, engineering teams often can’t update these large, unwieldy tests at the same pace as features change, leading to flaky, unmaintained test suites. Combined with gaps in coverage from best-guess test building, these test suites fail so often that teams stop paying attention to them, at which point they provide zero value (this “boy-who-cried-wolf” phenomenon probably sounds familiar to many engineering teams).

Stay out of the War Room

At the end of the day, none of this is the fault of your QA team. Just try to imagine a brute-force random walk through your application: you’re looking at hundreds of thousands or even millions of unique pathways. QA teams must prioritize, and they typically rely on input from product and engineering teams to do that. Often, when a costly bug does occur, it’s because users are using the application in a way that engineering and product teams didn’t expect; for example, they took a different path to get to checkout, or they combined product filters in a way that wasn’t tested (because there are 10,000 possible filter combinations). Without usage data, QA teams have nothing concrete to tell them which tests they should be building.

But what if your users are already telling you precisely how to test your application?

Your QA team can give itself a massive advantage by bringing this data to bear, analyzing real traffic patterns and user behavior on your app, then using that analysis to prioritize and build tests. With production analytics data telling you precisely what to test based on actual user behavior, you’ll catch more bugs that actually matter to your users, and you’ll catch them much sooner. Harnessing data to guide your testing priorities allows you to build a test suite that better protects the quality of your application, without creating unnecessary tests that clog up Jira boards and deployment pipelines.
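To make the idea concrete, here is a minimal sketch of how usage data might be turned into a test priority list. It assumes your analytics tooling can export each session as an ordered list of page or step names. The session data, the function names `rank_flows` and `flows_to_cover`, and the 60% target are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def rank_flows(sessions):
    """Return (flow, share_of_sessions) pairs, most common flow first."""
    counts = Counter(tuple(session) for session in sessions)
    total = sum(counts.values())
    return [(flow, n / total) for flow, n in counts.most_common()]


def flows_to_cover(ranked, target=0.9):
    """Take the most common flows until their combined share reaches target."""
    chosen, covered = [], 0.0
    for flow, share in ranked:
        if covered >= target:
            break
        chosen.append(flow)
        covered += share
    return chosen, covered


# Hypothetical export: one list of steps per user session.
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "search", "product", "checkout"],
    ["home", "search", "product", "checkout"],
    ["home", "product", "checkout"],
    ["home", "product", "checkout"],
    ["home", "search", "filter", "product", "checkout"],
    ["home", "account", "orders"],
    ["home", "account", "orders"],
    ["home", "search", "product"],
    ["home", "wishlist", "checkout"],
]

ranked = rank_flows(sessions)
chosen, covered = flows_to_cover(ranked, target=0.6)

for flow in chosen:
    print(" > ".join(flow))
print(f"These {len(chosen)} flows account for {covered:.0%} of sessions")
```

In this toy data set, three flows account for roughly 70% of sessions, so those would be the first candidates for E2E tests. Real traffic is far messier, and in practice you would likely group near-identical paths rather than match them exactly.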

The implications ripple through your entire engineering organization. You’ll build and maintain fewer low-priority tests, freeing up your most valuable talent to focus on product rather than test maintenance. You’ll improve the stability and runtime of your E2E test suite, allowing it to run more frequently and provide higher-fidelity feedback to developers. You’ll prevent the bugs that matter to users from ever making it into production.

Some bugs will indeed reach production, even when using analytics to guide your test strategy. But when these bugs get out, you know the impact is low. You know they affect few users and don’t affect core workflows in your application. You now operate in a world where you can move fast and break only the unimportant stuff.
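One rough way to back up a claim like "the impact is low" is to measure what share of sessions ever touched the broken step. The sketch below assumes the same kind of hypothetical session export (an ordered list of step names per session). The function name `share_affected` and the sample data are illustrative, not taken from any particular analytics tool.

```python
def share_affected(sessions, step):
    """Fraction of sessions that include the given step at least once."""
    if not sessions:
        return 0.0
    hits = sum(1 for session in sessions if step in session)
    return hits / len(sessions)


# Hypothetical sessions, each an ordered list of steps.
sample_sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "checkout"],
    ["home", "account", "orders"],
    ["home", "wishlist", "checkout"],
]

wishlist_share = share_affected(sample_sessions, "wishlist")
checkout_share = share_affected(sample_sessions, "checkout")
print(f"wishlist: {wishlist_share:.0%} of sessions, checkout: {checkout_share:.0%}")
```

A bug in a step that a small fraction of sessions reach is a very different incident from one in a step that most sessions pass through, and a number like this makes that difference easy to show.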

And the next time someone storms in and asks, “Why wasn’t this tested!?” you’ll have an answer: “The data told us it wasn’t a priority.” You’ll be able to demonstrate that the bug that made it to production had minimal impact, and show off the cascading benefits of running a leaner, more data-driven QA process.