Extremely simply put, end-to-end testing is a type of software testing that determines whether an application behaves as expected. This guide is an overview of how we approach end-to-end (E2E) testing, and why it matters.
“You can test every servo, every wire, every component; the hardware, the firmware, the software. But you have no idea whether your robot is going to work, and whether it’s going to work in the wild, until you let it out into the wild and let it fail. And it will fail, even when it seems like every component works individually. Even well-built things get hairy when you connect them all together.”
We didn’t have a truly apt analogy for end-to-end testing until we heard this one from a customer who had built robots before moving over to web software. Sure, there must be some theoretical, Platonic ideal of testing in which exhaustive testing of components and subsystems would guarantee that the robot (or your web application) works without ever being run “in the wild.” But, we’ll wager, nobody’s found it yet. The perfect software testing method doesn’t exist, but we can get close.
Why Do We Perform E2E Testing?
This is the essence of end-to-end (E2E) testing, and why it’s so important. Your web application is probably less complex than a Mars rover, but you similarly won’t know whether it’s going to work once you put all the bits together until you’ve done just that. Your unit tests check individual blocks of code for their core functionality. API/integration tests make sure that your “subsystems” are working as intended. E2E tests, by contrast, exercise the entire application the way real users will use it, under conditions similar to the ones those users will face.
Why End-to-End Testing is Important
An E2E test will therefore actually launch the application in a browser and interact with it in a way that exercises every layer of the application: the user interface itself, the browser (and compatibility with it), the network, the server, the APIs, the codebase, any 3rd-party integrations, and any hardware. The whole kit. As with the robot, you don’t really know how all of these components and layers will work together until they’re doing just that: working together. You therefore don’t want to ship changes to your application without testing it end-to-end (unless you don’t mind bugs sneaking through).
E2E tests can assume many names, in part depending on their level of rigor. They can be called browser tests, smoke tests, user acceptance tests (UATs), or (less accurately) UI tests. Typically these all mean the same thing: you launch a browser to interact with the application, and check that specific behaviors still work as intended.
There are two ways to launch this browser and interact with the whole application. The first is with a human who clicks around and checks for failures; this is referred to as manual testing. The second is with a machine that virtually simulates a human, using predetermined validations to check for failures; this is referred to as automated testing.
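To make the automated flavor concrete, here’s a minimal sketch of what such a test might look like using Playwright, one common browser-automation tool. The URL, selectors, credentials, and expected text are all hypothetical:

```typescript
// A minimal automated E2E test sketch using Playwright.
// The URL, selectors, credentials, and expected text are all hypothetical.
import { test, expect } from '@playwright/test';

test('user can log in and reach their dashboard', async ({ page }) => {
  // Drive a real browser through the full stack: UI, network, server, APIs.
  await page.goto('https://example.com/login');
  await page.fill('#email', 'user@example.com');
  await page.fill('#password', 'correct-horse-battery-staple');
  await page.click('button[type="submit"]');

  // The "predetermined validation": a real user should land on the dashboard.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.locator('h1')).toHaveText('Welcome back');
});
```

Note that the test never inspects the codebase directly; it only does what a user would do, and checks what a user would see.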
What is the Optimal Number of End-to-End Tests?
Ideally, QA teams should not simply be tasked with guessing at how users are using their applications. They can and should employ advanced product analytics to understand these user journeys and how they are evolving with time. In this way, focused testers are then able to fully understand which E2E test cases are most relevant and write the best corresponding tests to ensure quality without bloating the testing suite.
The best way to decide how many end-to-end browser tests to perform is to determine how many different ways users actually interact with your application. If you were to plot the distribution of cumulative observed user behavior as a histogram, you would see a steep curve of common behaviors that then bends toward an asymptote. After about 60-70% of total observed user behavior, the incremental coverage of each additional test case becomes negligible. From our own research, we find that this long tail of behavior doesn’t typically represent uncommon feature usage; most of it is behavior that doesn’t align with features at all. It can be ignored.
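As a rough illustration of how that cutoff might be derived, here’s a sketch that picks the workflows worth an E2E test from analytics data. The `Journey` shape and the `selectTestCases` helper are invented for illustration:

```typescript
// A hypothetical sketch: pick E2E test candidates from analytics data by
// covering the most common user journeys up to a coverage target.
type Journey = { name: string; sessions: number };

function selectTestCases(journeys: Journey[], coverageTarget = 0.7): Journey[] {
  const total = journeys.reduce((sum, j) => sum + j.sessions, 0);
  const byFrequency = [...journeys].sort((a, b) => b.sessions - a.sessions);

  const selected: Journey[] = [];
  let covered = 0;
  for (const journey of byFrequency) {
    if (covered / total >= coverageTarget) break; // past the bend in the curve
    selected.push(journey);
    covered += journey.sessions;
  }
  return selected;
}
```

Everything past the coverage target is the long tail described above, and it stays untested by design.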
In the end, while your biggest obligation is to provide a quality product to your customer, your second obligation is to do so quickly and cost-effectively. There exists a “just right” space of testing what matters, and not testing what doesn’t. Data is your guide to finding this golden ratio. If you can identify the core use cases in your application, you are no longer stuck with a false choice between “high coverage” and good runtime / high stability: the notion of a trade-off ends and you get the best of both worlds.
When to Apply End-to-End Testing?
End-to-end testing takes place after other stages of testing have been completed. It is applied once all individual components have been tested, to ensure that the system works once each individual piece is connected together. End-to-end testing is intended to exercise the entire application as a real user would, from the very start of a workflow to the very end.
Contemporary applications typically involve many different sub-systems working together. While these sub-systems may have been tested individually, there is no guarantee that components which work in isolation will work as expected when integrated into a single system. In a complex application, a failure in any sub-system may have ramifications that extend across the whole system, and these failure points often do not become apparent until the pieces are combined, which is why end-to-end testing is conducted after the various sub-systems have come together.
E2E Testing Methodologies
There are two types of end-to-end testing methodologies: horizontal and vertical. Horizontal is more widely used, but each has its place within a well-structured end-to-end test suite.
Horizontal
Horizontal end-to-end testing considers the application as a whole, with tests spanning the entire application across multiple systems. A well-defined workflow is required, and test environments for each system must be set up in advance. Horizontal testing verifies that each individual workflow works from start to finish, which means exercising multiple different systems at once.
For example, in an e-commerce application, horizontal testing would involve having a tester sign into a user account, browse products, add some to their cart and then attempt to check out. These tests ensure that each component in a system performs as expected when put together. Horizontal testing is useful for developing tests that focus on the perspective of the end-user and helps prevent issues in workflows from making it to production.
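Sketched as an automated test, that horizontal workflow might look something like this (again with Playwright; the store URLs, selectors, and credentials are hypothetical):

```typescript
// A sketch of a horizontal E2E test: one workflow, many systems at once.
// The URLs, selectors, and credentials are hypothetical.
import { test, expect } from '@playwright/test';

test('shopper can sign in, add to cart, and check out', async ({ page }) => {
  // Sign in
  await page.goto('https://shop.example.com/login');
  await page.fill('#email', 'shopper@example.com');
  await page.fill('#password', 'hunter2');
  await page.click('button[type="submit"]');

  // Browse products and add one to the cart
  await page.goto('https://shop.example.com/products');
  await page.click('.product-card >> nth=0');
  await page.click('text=Add to cart');

  // Check out
  await page.click('text=Checkout');
  await expect(page.locator('.order-confirmation')).toBeVisible();
});
```

A single run touches the UI, the session/auth system, the product catalog, the cart service, and the payment flow, which is exactly the point.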
Vertical
Vertical end-to-end testing breaks an application’s architecture down into layers, each of which is tested individually and then assessed as a whole, in hierarchical order. These layers are best thought of as a stack rather than a workflow: the UI, database calls, API requests, and so on. Each layer is tested thoroughly in isolation, from the bottom up, before moving on to the next. Though less common than horizontal testing, vertical testing is useful in certain scenarios, such as for layers that lack a UI or for safety-critical software.
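For contrast with the horizontal example above, one slice of a vertical pass might exercise a single layer on its own, such as the API layer, with no browser involved. Here’s a sketch using Playwright’s API request context; the endpoint and payload shape are hypothetical:

```typescript
// A sketch of vertical testing: exercising the API layer in isolation,
// before any UI is involved. The endpoint and payload shape are hypothetical.
import { test, expect, request } from '@playwright/test';

test('API layer: cart endpoint accepts a valid item', async () => {
  const api = await request.newContext({ baseURL: 'https://shop.example.com' });

  const response = await api.post('/api/cart', {
    data: { productId: 'sku-123', quantity: 1 },
  });
  expect(response.ok()).toBeTruthy();

  const body = await response.json();
  expect(body.items).toHaveLength(1);

  await api.dispose();
});
```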
Metrics for End-to-End Testing
Measuring your testing efforts gives you the confidence that you’re testing the right things in the right places and producing the results that you want. There are several different metrics that can be used to measure end-to-end testing, which provide information ranging from the value of the tests in your test suite to the number of defects being caught and fixed.
Test Case Preparation Status
This measures how many test cases have been prepared against the number of planned test cases. Tracking this helps you measure your progress when developing a test suite, which gives you an idea of how complete your test suite is compared to your plan.
As we’ve discussed elsewhere, using product analytics to define customer-centric test case plans upgrades this measurement from being fairly arbitrary to being meaningfully objective.
Weekly Test Progress
As the name suggests, this is used to track the progress of your tests on a weekly basis. This lets you know not just how many tests have been developed, but also how many passed/failed, how many were executed/unexecuted and the success/failure rate of those tests. Tracking test stability, both individually and across the test suite, is a critical way of determining whether your testing process is healthy.
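If your CI already emits per-test results, this weekly roll-up is straightforward to compute. Here’s a hypothetical sketch; the `TestRun` shape is invented, standing in for whatever your runner actually reports:

```typescript
// A hypothetical weekly roll-up of test results; the TestRun shape is
// invented, standing in for whatever your CI runner actually reports.
type TestRun = { name: string; executed: boolean; passed: boolean };

function weeklyProgress(runs: TestRun[]) {
  const executed = runs.filter((r) => r.executed);
  const passed = executed.filter((r) => r.passed);
  return {
    total: runs.length,
    executed: executed.length,
    unexecuted: runs.length - executed.length,
    passed: passed.length,
    failed: executed.length - passed.length,
    passRate: executed.length > 0 ? passed.length / executed.length : 0,
  };
}
```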
Defects Status & Details
Keeping track of defects caught by tests, and reviewing them on a regular basis, provides valuable information about the number of open and closed defects in an application. Beyond tracking that these defects exist, you should record additional information such as defect severity, which enables deeper test suite analysis and helps developers prioritize which defects to fix.
Test Environment Availability
Another important metric to track is the availability of your test environment: how much time is allotted for testing in a specific environment, and how much time testing actually takes there. In addition to providing valuable information about test runtime, this also lets you better prioritize your tests given known time constraints.
Data-Driven Testing
As with our Mars rover, the ideal test case simulates real-world usage as precisely as possible: testing the application in the same way that your users are using it, or are going to use it. This requires data that tells you how your users are in fact using your application, and data integrity is essential here. Using real user data is always possible when testing for regressions, but user behavior needs to be estimated (or, frankly, guessed) when testing brand-new features, because you don’t yet have data about real usage.
Some teams might be tempted to do “kitchen sink” testing and try to test the application in every possible way, rather than in a way that reflects user behavior. We discourage this in more detail elsewhere, but the primary consideration is that E2E tests are the most expensive, least stable, and slowest tests you’ll run. Having too many will incur dramatically increased costs for steeply diminishing returns.
E2E Testing Limitations & Challenges
Finally, a word of caution: E2E testing has limitations. It’s great at testing that the application will generally function: that users can always move through workflows without errors or breakages. Early in the software development lifecycle, it can be a lifesaver. It’s great at ensuring that all of the gnarly bits of code are working together when a user checks out or logs in or executes an analysis. But E2E testing isn’t great (or efficient) at verifying that the right data will be displayed to a user or stored in the application; this is a place for unit tests to thrive. E2E testing also isn’t great at showing you where in your codebase a bug is hiding, just where in the user’s journey they’ll find that the application is broken. Finally, E2E testing isn’t great at telling you whether your page is formatted incorrectly or “looks wrong.” It can do a bit of this, but it’s a heck of an expensive way to do so. We recommend using tools like Percy.io for testing visual regressions instead.
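For what it’s worth, delegating those visual checks is cheap. A sketch using Percy’s Playwright integration might look like the following; the page URL and snapshot name are hypothetical:

```typescript
// A sketch of handing visual regressions off to Percy via @percy/playwright.
// The page URL and snapshot name are hypothetical.
import { test } from '@playwright/test';
import percySnapshot from '@percy/playwright';

test('pricing page renders consistently', async ({ page }) => {
  await page.goto('https://example.com/pricing');
  // Percy captures the DOM here; visual diffs are reviewed in Percy's UI
  // rather than asserted inside this test.
  await percySnapshot(page, 'Pricing page');
});
```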
In short: ignore E2E testing at your own peril, but over-relying on it won’t do you any favors, either.