“Virtue is the golden mean between two extremes.” -Aristotle
It isn’t practical to develop an individual end-to-end browser test for every use case. “Then how many tests should our team write?” you ask. What’s the “optimal” number? The answer isn’t an easy one, and there are several factors at play when deciding how many E2E tests is just right.
Too Many or Too Few?
We’ve seen teams that have fewer than a dozen browser test cases, and teams that have 1,500. Being at either end of this spectrum brings challenges. Having too few tests can mean missing real bugs and shipping a sub-par product, while having too many strains people and resources, not just in maintaining the tests but in monitoring them as well. A suite that returns too many false positives causes fatigue and erodes its own credibility. We’ve previously discussed optimizing tests, and similar principles apply here.
First off, it’s important to be realistic about the complexity of your application’s UI. It’s extremely rare for an application to require 1,500 end-to-end tests, because it’s unlikely there are 1,500 distinct ways end users interact with it. If you tracked how users actually navigate your application, you’d more likely find less than a tenth of that: 60 or so distinct user stories, roughly half of which are edge cases that occur only rarely. Even for very complex applications, we rarely see more than 100 use cases that more than 0.5% of users traverse, and it’s typically far fewer.
Realistically, a company with 1,500 end-to-end tests for its web application would likely be better off running only a few hundred. Reducing the number of tests not only saves money and staff time, it also speeds up your testing and release cycle. Your teams can work on improving the application’s features rather than chasing down non-issues.
On the other hand, there is such a thing as too few tests: it’s important to ensure that most of your well-traversed user stories are covered by your test suite. Otherwise, app-breaking bugs will make it into production, resulting in grumpy users and an unhappy, fatigued internal team.
Picking a Number That’s Just Right
As we’ve discussed previously, the best way to decide how many end-to-end browser tests to perform is to determine how many different ways users actually interact with your application.
For many of us, our first instinct when approaching any problem is to seek out more data. In testing, acquiring data about how users routinely and realistically interact with an application is the first step to actually choosing the right test cases. After that, it’s up to you to decide how many of the user stories that actually occur provide enough value to your business to routinely test.
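To make that first step concrete, here is a minimal sketch, assuming your analytics tool can export per-session event sequences. The session data, event names, and export shape below are invented for illustration, not taken from any particular product.

```python
from collections import Counter

# Hypothetical analytics export: each session is the ordered list of
# page/action identifiers one real user traversed. In practice this
# would come from your product-analytics tool, not a hardcoded list.
sessions = [
    ["login", "dashboard", "search", "checkout"],
    ["login", "dashboard", "search", "checkout"],
    ["login", "dashboard", "settings"],
    ["login", "dashboard", "search", "checkout"],
    ["signup", "onboarding", "dashboard"],
]

# Treat each distinct path as one candidate user story and count how
# often it actually occurs in the observed traffic.
flow_counts = Counter(tuple(s) for s in sessions)

# Rank flows by frequency: the top of this list is where your
# end-to-end tests should live.
for flow, count in flow_counts.most_common():
    share = count / len(sessions)
    print(f"{share:5.1%}  {' -> '.join(flow)}")
```

Ranked this way, deciding which stories deserve a routine test becomes a business judgment about where the frequency cutoff sits, not a guess.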
If you were to plot the cumulative distribution of observed user behavior against the number of distinct flows, you would see a steep initial climb, then a bend toward a flat asymptote. After roughly 60–70% of total observed behavior is covered, the incremental coverage of each additional test case becomes negligible. From our own research, this long tail doesn’t typically represent uncommon feature usage; most of it is behavior that doesn’t map to any feature at all, and it can safely be ignored.
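As a rough illustration of that curve, the sketch below builds a made-up, Zipf-like frequency distribution over distinct flows and reports how many flows it takes to hit a cumulative-coverage cutoff. The 70% threshold and the frequencies are illustrative assumptions, not measured data.

```python
# Made-up Zipf-like frequencies for 300 distinct observed flows:
# a handful of very common paths and a long tail of rare ones.
flow_counts = [1000 // rank for rank in range(1, 301)]

total = sum(flow_counts)
covered = 0
for n, count in enumerate(flow_counts, start=1):
    covered += count
    # Stop once the cumulative share of observed behavior crosses the
    # cutoff; every flow past this point adds almost no coverage.
    if covered / total >= 0.70:
        print(f"{n} flows cover {covered / total:.0%} of observed behavior")
        break
```

With this toy distribution, a few dozen flows already account for 70% of everything users do, while each of the remaining couple hundred flows contributes only a fraction of a percent: the diminishing-returns bend described above.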
In the end, while your biggest obligation is to deliver a quality product to your customers, your second obligation is to do so quickly and cost-effectively. There exists a “just right” space of testing what matters and not testing what doesn’t. Data is your guide to finding that golden mean. Once you can identify the core use cases in your application, you’re no longer stuck with a false choice between “high coverage” and fast, stable runs: the trade-off disappears and you get the best of both worlds.