What is End-to-End Testing?

“You can test every servo, every wire, every component; the hardware, the firmware, the software. But you have no idea whether your robot is going to work, and whether it’s going to work in the wild, until you let it out into the wild and let it fail. And it will fail, even when it seems like every component works individually. Even well-built things get hairy when you connect them all together.”

We didn’t have a truly applicable analogy for end-to-end testing until we heard this one from a customer who had built robots before moving over to web software. Sure, there must be some theoretical, Platonic ideal of testing in which exhaustive testing of components and subsystems will guarantee that the robot—or your web application—will work without needing to be run “in the wild.” But, we’ll wager, nobody’s found it yet.

The Advantage of End-to-End Testing

This is the essence of end-to-end (E2E) testing, and why it’s so important. Your web application is probably less complex than a Mars rover, but you similarly won’t know whether it’s going to work once you put all the bits together until you’ve done just that. Your unit tests will test individual blocks of code for their core functionality. API/integration tests will make sure that your “subsystems” are working as intended. But, E2E tests are intended to test the entire application as real users would use it, in conditions similar to how real users will use it.

Therefore, an E2E test will actually launch the application in a browser and interact with it in a way that tests every layer of the application: the user interface itself, the browser (and compatibility with it), the network, the server, the APIs, the codebase, any 3rd party integrations, and any hardware—the whole kit. As with the robot, you don’t really know how all of these components and layers will work together until they’re doing just that—working together. You therefore don’t want to ship changes to your application without testing it end-to-end (unless you don’t mind bugs sneaking through).

E2E tests go by many names, depending in part on their level of rigor. They may be called browser tests, smoke tests, user acceptance tests (UATs), or (less accurately) UI tests. Typically these all mean the same thing: you launch a browser to interact with the application and check that specific behaviors still work as intended.

There are two ways to launch this browser and interact with the whole application: the first is with a human who checks for failures by clicking around, referred to as manual testing. The second is by having a machine virtually simulate a human, using predetermined validations to check for failures, referred to as automated testing.
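The automated variant can be surprisingly small. Below is a minimal sketch of one such scripted check, written against a Playwright-style page API (goto/fill/click/inner_text); the URL, selectors, and credentials are all hypothetical.

```python
# A minimal sketch of an automated E2E check. The `page` object is assumed
# to expose a Playwright-style API; the base URL, the CSS selectors, and
# the credentials below are illustrative, not from a real application.

def run_login_check(page, base_url="https://example.test"):
    """Drive a login flow end to end and verify the landing state."""
    page.goto(f"{base_url}/login")
    page.fill("#email", "user@example.test")
    page.fill("#password", "correct-horse")
    page.click("button[type=submit]")
    # The assertion is the test: did the whole stack, UI through database,
    # produce the page a real user would expect to see?
    assert "Dashboard" in page.inner_text("h1"), "login flow broke somewhere"
```

In a real suite, the framework supplies the page object by launching an actual browser, so that single assertion exercises the UI, the browser, the network, the server, and everything in between.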

Data-Driven Testing

And as with our Mars rover, it’s ideal to test the application by simulating real-world usage as precisely as possible: testing the application in the same way that your users are using it, or are going to use it. This requires having data which tells you how your users are in fact using your application. Utilizing real user data is always possible when testing for regressions. But, user behavior needs to be estimated (or, frankly, guessed) when testing brand new features because you don’t have data about real usage quite yet.

Some teams might be tempted to do “kitchen sink” testing and try to test the application in every possible way, rather than in a way that reflects user behavior. We discourage this in more detail elsewhere, but the primary consideration is that E2E tests are the most expensive, least stable, and slowest tests you’ll run. Having too many will incur dramatically increased costs for steeply diminishing returns.

Limitations of E2E Testing

Finally, a word of caution: E2E testing has limitations. It’s great at testing that the application will generally function: that users can always move through workflows without errors or breakages. It’s great at ensuring that all of the gnarly bits of code are working together when a user checks out or logs in or executes an analysis. But E2E testing isn’t great (or efficient) in testing that the right data is going to be displayed for a user or stored in the application—this is a place for unit tests to thrive. E2E testing also isn’t great at showing you where in your codebase your bug is hiding—just where in the user’s journey they’re going to find that the application is broken. Finally E2E testing isn’t great at telling you whether your page is formatting incorrectly or “looks wrong.” It can do a bit of this, but it’s a heck of an expensive way to do so. We recommend using a tool such as Percy.io for testing visual regressions, instead.

In short: ignore E2E testing at your own peril, but over-relying on it won’t do you any favors, either.

How to Design End-to-End Testing for Continuous Development

“Even if you’re on the right track, you’ll get run over if you just sit there.”

– Will Rogers, American humorist

Continuous development (CD) has become the new gold standard in today’s tech marketplace, and with good reason. Continuous development is fast, efficient, and allows for a company to get the latest version of their product to their customers as quickly as possible. It furthermore allows developers to get rapid feedback on small chunks of code through continuously testing that code as it’s developed, as long as continuous testing is also in place.

But with any change come new challenges, especially if a development team is transitioning from a longer development cycle (for instance, monthly versioning) to a continuous one. As part of this shift, a team must rearrange its end-to-end testing cycles accordingly, moving from monthly testing to (ideally) continuous testing on every build. Every release must be adequately tested, and with continuous development multiplying the number of releases, the quantity of required test runs rises in turn. Each new change or release carries the risk of breaking the application, and each must be tested.

Because of this challenge, many testers turn to automated testing solutions for help. With regular continuous updates, manually testing an app prior to each release becomes infeasible (not to mention unbearably expensive). Automation makes testing each deploy possible; it removes the primary testing burden from the development team and lets them focus instead on other priorities, like product improvement. Though automation is relatively simple to implement for unit or API testing, a proper approach to automating end-to-end testing requires further consideration.

Outsourced Testing

If a team’s end-to-end testing is outsourced to a manual testing company (crowd-sourced or otherwise), they most likely pay per test. This cost might be bearable for a team shipping and testing code once per month. However, a team deploying five or 10 times per business day is suddenly looking at roughly 100 to 200 times the cost of the team deploying once per month. We discuss some other pros and cons of outsourced/crowdsourced manual testing in this blog post.

Test Code Automation

Instead of employing outsourced manual testing, a development team can use “lean and mean” testing code to automate their tests. To keep up with a CD release cycle, they need a core regression testing suite that covers important workflows and runs in minutes rather than hours. This means prioritizing tests to focus on the workflows that impact the most users, impact core functionality, and/or impact revenue. We at ProdPerfect have already discussed choosing the correct number of end-to-end tests, and that guidance is just as relevant here.

What About Less Common Flows?

Continuously-testing teams will only be testing core workflows with each build; what about less common flows, edge cases, and the like? These tests should be maintained as a second suite altogether, and run out of band, at timed intervals, on a stable testing/staging environment. This strategy can also be used for cross-browser/device testing, which if extensive will also take more than a few minutes. By splitting your testing suite in two, you get the benefit of continuous testing on continuous delivery, without losing the ability to find edge case bugs.
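One simple way to maintain that split is to tag each test and select by tag at run time. The sketch below uses made-up test names and tags to show the idea:

```python
# Sketch: split one test inventory into a fast per-deploy smoke suite and
# an out-of-band extended suite, keyed by tags. Test names and tags are
# illustrative, not from a real application.

TESTS = [
    {"name": "login",                        "tags": {"core"}},
    {"name": "checkout_happy_path",          "tags": {"core"}},
    {"name": "password_reset_expired_token", "tags": {"edge"}},
    {"name": "checkout_in_ie11",             "tags": {"edge", "cross-browser"}},
]

def select(tests, tag):
    """Return the names of all tests carrying the given tag."""
    return [t["name"] for t in tests if tag in t["tags"]]

smoke_suite = select(TESTS, "core")      # runs on every build
extended_suite = select(TESTS, "edge")   # runs nightly against staging
```

The “core” selection gates every deploy; the “edge” selection (including the slower cross-browser cases) runs on a schedule against a stable staging environment. Most test runners support this natively, for example via test markers or tags.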

It may feel like a daunting task to begin a CD cycle for the first time, but with foresight and intelligent testing plans, any team can take full advantage of its benefits to improve their product and make their customers ever happier.

Picking the Optimal Number of End-to-End Tests

“Virtue is the golden mean between two extremes.” – Aristotle

It’s not practical to develop an individual end-to-end browser test for each use case. “Then how many tests should our team produce,” you ask? The answer isn’t an easy one, and there are several factors at play when deciding what number of tests is just right.

Too Many or Too Few?

We’ve seen teams that have fewer than a dozen browser test cases, and teams that have 1,500. Being at either end of this spectrum brings challenges. While having too few tests might mean missing real bugs and releasing a sub-par product, having too many can strain employees and resources, not just to maintain the tests but to monitor them as well. Returning too many false positives causes fatigue and decreases the credibility of your test suite. We’ve discussed previously the concept of optimizing tests, and similar principles apply here.

First off, it’s important to be realistic about the complexity of your application’s UI—it’s extremely rare that an application would require 1,500 end-to-end tests, because it’s unlikely that there are 1,500 distinct ways end users are interacting with it. If you tracked how users navigate your application, you would more likely find fewer than a tenth of that: 60 or so user stories that actually occur, roughly half of them core flows that occur frequently and the other half edge cases that occur rarely. Even for very complex applications, we very rarely see more than 100 use cases that more than 0.5% of users traverse. It’s typically far fewer.

Realistically, a company that has 1,500 end-to-end tests for its web application would likely be better off only running a few hundred. Not only does reducing your number of tests save money and manpower, but it also speeds up your testing and production cycle. Your teams can work on improving the features of the application rather than chasing down non-issues.

On the other hand, while there is such a thing as too few tests, it’s important to ensure that most of your well-traversed user stories are addressed in your testing suite. Otherwise, app-breaking bugs will make it into production, resulting in grumpy users and an unhappy and fatigued internal team.

Picking a Number That’s Just Right

As we’ve discussed previously, the best way to decide how many end-to-end browser tests to perform is to determine how many different ways users actually interact with your application.

For many of us, our first instinct when approaching any problem is to seek out more data. In testing, acquiring data about how users routinely and realistically interact with an application is the first step to actually choosing the right test cases. After that, it’s up to you to decide how many of the user stories that actually occur provide enough value to your business to routinely test.

If you were to graph the distribution of cumulative observed user behavior with a histogram, it would look much like the graph above: a steep initial climb, then a bend toward a flat asymptote. After about 60-70% of total observed user behavior, the incremental coverage of each additional test case becomes negligible. From our own research, we find that this long tail of behavior doesn’t typically represent uncommon feature usage; most of it is behavior that doesn’t align with features at all. It can be ignored.
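That cut-off is straightforward to compute once you have flow frequencies from analytics. A sketch with made-up counts, assuming a 70% cumulative-coverage target:

```python
# Sketch: given observed flow frequencies, find the smallest set of flows
# whose cumulative share of user behavior reaches a target. The flow names
# and counts below are invented for illustration.

def flows_for_coverage(flow_counts, target=0.7):
    """Return the most frequent flows whose cumulative share reaches target."""
    total = sum(flow_counts.values())
    picked, covered = [], 0.0
    for flow, count in sorted(flow_counts.items(), key=lambda kv: -kv[1]):
        picked.append(flow)
        covered += count / total
        if covered >= target:
            break
    return picked, covered

flows = {"login": 500, "search": 300, "checkout": 120,
         "export_csv": 50, "edit_profile": 20, "delete_account": 10}
chosen, share = flows_for_coverage(flows, target=0.7)
```

With these numbers, just two flows already cover 80% of observed behavior; the remaining four sit in the long tail, where each additional test buys very little coverage.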

In the end, while your biggest obligation is to provide a quality product to your customer, your second obligation should be to do so quickly and cost-effectively. There exists a “just right” space of testing what matters, and not testing what doesn’t. Data is your guide to finding this golden mean. If you can identify the core use cases in your application, you are no longer stuck with a false choice between “high coverage” and good runtime / high stability: the notion of trade-off ends and you get the best of both worlds.

Why Google Demoted End-to-End Testing, and Why It’s Not Dead

The Testing Pyramid

In 2015, Google directly criticized the role of end-to-end (E2E) testing in the QA process. Google correctly noted that E2E tests are comparatively unstable and require reworking whenever there is an update to the application. If that isn’t done, or is done incorrectly, testers and developers will not only be stuck sifting through the weeds to locate and fix bugs, but also wondering whether a test result was even genuine. Instead, Google says, there should be a heavy emphasis on unit and API tests, which are more accurate and minimize testing time. In Google’s view, E2E testing is treated like its counterpart at the top of the now-outdated food pyramid—fats and oils—”use sparingly.”

Some readers take Google’s advice too aggressively, cutting E2E testing from their process entirely. However, it still plays a critical role in quality and must not be ignored.

Why E2E Testing Still Matters

Unit and API tests do indeed run faster and cost less to maintain, and therefore have every right to win the bulk of testing mindshare. However, E2E tests remain the single best way for testing how the application will work in the hands of an actual user. Unit tests only address a single block of code at a time. API tests model the interaction between features. But your app’s true functionality depends on how every feature, every element, and every layer works together, not in isolation.

E2E testing is the only way to incorporate all of the network and server effects that influence your application’s functionality and user experience—a major factor in the overall performance of an application. What’s more, a complex user flow touches a myriad of points across the application, changing the application state and data in the database multiple times along the way. These user behaviors are diverse, unpredictable, and complex—they introduce states and inputs in ways that developers simply can’t preemptively imagine. Different code modules and API calls need to operate together seamlessly for a user to be able to properly navigate the app. 

For these reasons, E2E tests cannot be an afterthought if your aim is to deliver quality code that works for its intended purpose.

How to Make it Happen

E2E testing is hard not only because of its resource intensity, but also because it’s difficult to test accurately. Conventionally, testing teams must imagine what users might do in order to test for their behaviors. They need to decide: should we cover what we think are the core use cases? Should we attempt to imagine and cover every edge case? The first approach can lead to poor coverage; the second can lead to significant test case bloat (and also poor coverage).

There’s a better way. If you want to test your application for how users intend to use it, let your users tell you what to test. 

Find out how users are actually using and navigating your application, and test that. Product analytics tools have existed for years, but have not yet been used to their full potential in quality assurance. They’re used religiously by marketing, sometimes by product, and never by quality. There’s no reason the same product analytics data that drives UX decisions shouldn’t drive testing decisions.

The most common pitfall in testing is that developers and testers test the intended behaviors of their users and miss the data that is actually worth finding out—the unexpected ways in which users actually navigate the application, whether or not those behaviors were originally intended. If you can test actual user behavior, you will have a lean, mean testing machine that ensures your application’s core functionality always works, and that app-breaking bugs are caught before they get out the door.
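Extracting those actual behaviors from analytics data can start as simply as grouping events into sessions and counting distinct paths. A sketch, assuming a hypothetical event format of time-ordered (session_id, page) pairs:

```python
# Sketch: derive candidate E2E flows from raw analytics events by grouping
# events into sessions and counting each distinct navigation path. The
# (session_id, page) event format and page names are assumed for illustration.

from collections import Counter, defaultdict

def top_flows(events, n=3):
    """Return the n most common navigation paths observed across sessions."""
    sessions = defaultdict(list)
    for session_id, page in events:  # events assumed to be in time order
        sessions[session_id].append(page)
    path_counts = Counter(tuple(path) for path in sessions.values())
    return [list(path) for path, _ in path_counts.most_common(n)]

events = [
    ("s1", "/login"), ("s1", "/dashboard"),
    ("s2", "/login"), ("s2", "/dashboard"),
    ("s3", "/login"), ("s3", "/settings"),
]
```

Real analytics exports are messier than this, but the principle holds: the most frequent paths, whatever they turn out to be, are the first candidates for your E2E suite.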

As Google suggests, you should have fewer E2E tests relative to others. We suggest, however, that E2E tests should be the crown of your testing process, not simply an oily “use sparingly” afterthought. 

To use end-to-end testing to build your very own lean, mean testing suite, consider the following tips:

  • Use analytics to discover the few dozen key user flows that truly matter
  • Build E2E tests only around those user flows—don’t be tempted to add new tests for every possible bug
  • Retire tests aggressively—when user flows change and old tests become obsolete, get rid of them (At the very least, comment them out)
  • Make a rigorous (but quick) check of your current testing suite vs. known user flows as part of your regular maintenance cycle
  • Consider whether automating the process might be the best solution for your team