What is Regression Testing and Why is It Important?

“An ounce of prevention is worth a pound of cure.” -Benjamin Franklin

This article is for everyone who asks, “Why would you want to keep testing the features that are already live? Shouldn’t we focus on testing new features?” Those of you who already know the importance of testing both can probably focus on other content instead.

Testing new features that go live is certainly critical to making sure your application works. Brand new features should be tested extensively, not only for functionality, but also for user experience, conversion (as appropriate), performance, etc.

Your current feature set needs to be tested for regressions. A regression is what it sounds like if you took statistics in college: your application regresses from its current state to something worse. It is a deviation from the expected state. This happens in one of two ways:

  1. You’re maintaining, altering, fixing, or improving a feature (rather than introducing a new one) and break it.
  2. You’re changing just about anything and end up breaking something completely different in the application.

The first form of regression is fairly obvious; the second can be a head-scratcher. The short version of why this can happen: almost any application is deeply interconnected. There’s a concept called DRY – “Don’t Repeat Yourself.” Good developers don’t copy code; rather, they make that code accessible to all features that touch it. Any area of an application depends on many others to function properly. If you break something while working on inventory management, you might wreck checkout. Updating infrastructure might break login. It’s not that every change can truly affect every part of the application, but any change might impact multiple parts of the application if a bug is introduced.

Regression testing, therefore, tests to make sure any of these regressions are caught before they make it to production. Generally, you run a suite of regression tests by testing every unit of code, every API, and every core user pathway across the application at the browser level, with every build. Some teams can’t run their regression suite fast enough (or are using manual or crowdsourced testing, which has an incremental cost per test run because you’re throwing extra bodies at the problem), so they run their browser regression testing suite on a less frequent schedule. The obvious downside is that you’re more likely to let regressions into production before they are caught.

Automation vs. Manual Testing

If you’re considering automating your browser regression suite, start with features that are least likely to change rapidly: building automation takes time to pay off, so you want to make sure these tests will be run more than a few times before they need to be rewritten. For brand new or rapidly evolving features, manual testing may be your most efficient approach.

When you do decide to automate: automated browser regression testing suites are built in a number of ways. The most basic is scripting them in Selenium, Cypress, or Capybara, with JavaScript in TestCafe, or using other such frameworks. They can also be built using record-and-play tools such as Selenium IDE or TestCafe Studio. Machine learning is making record-and-play stronger and less time-intensive, and will eventually allow record-and-play to drive itself using web traffic data.
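
To make this concrete, here is a minimal sketch of what a scripted browser regression test might look like in Cypress. The route, selectors, and credentials are hypothetical placeholders, not drawn from any particular application:

```js
// Sketch of a Cypress regression test for a login workflow.
// All routes and data-test selectors below are hypothetical.
describe('login regression', () => {
  it('lets an existing user sign in and reach the dashboard', () => {
    cy.visit('/login');
    cy.get('[data-test="email"]').type('user@example.com');
    cy.get('[data-test="password"]').type('a-valid-password');
    cy.get('[data-test="submit"]').click();

    // Validate that the click actually moved the user forward,
    // not merely that nothing threw an error.
    cy.url().should('include', '/dashboard');
    cy.contains('Welcome back').should('be.visible');
  });
});
```

A test like this runs with every build; if a change anywhere in the application breaks login, the suite catches it before release.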

What is End-to-End Testing?

“You can test every servo, every wire, every component; the hardware, the firmware, the software. But you have no idea whether your robot is going to work, and whether it’s going to work in the wild, until you let it out into the wild and let it fail. And it will fail, even when it seems like every component works individually. Even well-built things get hairy when you connect them all together.”

We didn’t have a truly applicable analogy for end-to-end testing until we heard it from a customer who had built robots before moving over to web software. Sure, there must be some sort of theoretical, Platonic ideal of testing in which exhaustive testing of components and subsystems will guarantee that the robot—or your web application—will work without needing to be run “in the wild.” But, we’ll wager, nobody’s found it yet.

The Advantage of End-to-End Testing

This is the essence of end-to-end (E2E) testing, and why it’s so important. Your web application is probably less complex than a Mars rover, but you similarly won’t know whether it’s going to work once you put all the bits together until you’ve done just that. Your unit tests will test individual blocks of code for their core functionality. API/integration tests will make sure that your “subsystems” are working as intended. But, E2E tests are intended to test the entire application as real users would use it, in conditions similar to how real users will use it.

Therefore, an E2E test will actually launch the application in a browser and interact with it in a way that tests every layer of the application: the user interface itself, the browser (and compatibility with it), the network, the server, the APIs, the codebase, any 3rd-party integrations, and any hardware—the whole kit. As with the robot, you don’t really know how all of these components and layers will work together until they’re doing just that—working together. You therefore don’t want to be shipping changes to your application without testing it end-to-end (unless you don’t mind bugs sneaking through).

E2E tests can assume many names, in part depending on their level of rigor. They can be called browser tests, smoke tests, user acceptance tests (UATs), or (less accurately) UI tests. Typically these all mean the same thing: you launch a browser to interact with the application, and check that specific behaviors still work as intended.

There are two ways to launch this browser and interact with the whole application: the first is with a human who checks for failures by clicking around, referred to as manual testing. The second is by having a machine virtually simulate a human, using predetermined validations to check for failures, referred to as automated testing.

Data-Driven Testing

And as with our Mars rover, it’s ideal to test the application by simulating real-world usage as precisely as possible: testing the application in the same way that your users are using it, or are going to use it. This requires having data that tells you how your users are in fact using your application. Real user data is always available when testing for regressions. But user behavior needs to be estimated (or, frankly, guessed) when testing brand new features, because you don’t have data about real usage quite yet.
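
As a rough illustration of what “data-driven” means in practice, the sketch below (plain JavaScript, with entirely hypothetical analytics events) distills page-view data into a ranked list of user flows; the most common flows are the strongest candidates for E2E coverage:

```js
// Hypothetical page-view events exported from an analytics tool.
const events = [
  { session: 'a1', path: '/login' },
  { session: 'a1', path: '/dashboard' },
  { session: 'a1', path: '/checkout' },
  { session: 'b2', path: '/login' },
  { session: 'b2', path: '/dashboard' },
  { session: 'c3', path: '/login' },
  { session: 'c3', path: '/dashboard' },
];

// Group events into per-session flows.
const bySession = new Map();
for (const { session, path } of events) {
  if (!bySession.has(session)) bySession.set(session, []);
  bySession.get(session).push(path);
}

// Count how often each distinct flow occurs, then rank by frequency.
const flowCounts = new Map();
for (const pages of bySession.values()) {
  const key = pages.join(' > ');
  flowCounts.set(key, (flowCounts.get(key) || 0) + 1);
}
const ranked = [...flowCounts.entries()].sort((a, b) => b[1] - a[1]);
console.log(ranked); // most frequent flows first
```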

Some teams might be tempted to do “kitchen sink” testing and try to test the application in every possible way, rather than in a way that reflects user behavior. We discourage this in more detail elsewhere, but the primary consideration is that E2E tests are the most expensive, least stable, and slowest tests you’ll run. Having too many will incur dramatically increased costs for steeply diminishing returns.

Limitations of E2E Testing

Finally, a word of caution: E2E testing has limitations. It’s great at testing that the application will generally function: that users can always move through workflows without errors or breakages. It’s great at ensuring that all of the gnarly bits of code are working together when a user checks out or logs in or executes an analysis. But E2E testing isn’t great (or efficient) at testing that the right data is going to be displayed for a user or stored in the application—this is a place for unit tests to thrive. E2E testing also isn’t great at showing you where in your codebase your bug is hiding—just where in the user’s journey they’re going to find that the application is broken. Finally, E2E testing isn’t great at telling you whether your page is formatting incorrectly or “looks wrong.” It can do a bit of this, but it’s a heck of an expensive way to do so. We recommend using a tool such as Percy.io for testing visual regressions instead.
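
For illustration, here is a minimal sketch of what visual regression coverage might look like using Percy’s Cypress integration (this assumes the @percy/cypress package; the route and selector are hypothetical):

```js
// In cypress/support/e2e.js — registers cy.percySnapshot() (assumes @percy/cypress is installed).
import '@percy/cypress';

// In a spec file: capture a visual snapshot at a known-good point in the flow.
describe('pricing page visuals', () => {
  it('renders the pricing page consistently', () => {
    cy.visit('/pricing');                               // hypothetical route
    cy.get('[data-test="plans"]').should('be.visible'); // hypothetical selector
    cy.percySnapshot('Pricing page');                   // Percy diffs this against its stored baseline
  });
});
```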

In short: ignore E2E testing at your own peril, but over-relying on it won’t do you any favors, either.

How to Ship Products Fast and Fixed

“Without continual growth and progress, such words as improvement, achievement, and success have no meaning.” ~ Benjamin Franklin

Modern software development has changed dramatically from even 5 years ago: things move fast. New features are developed continuously, and the ease of updating web applications means that software is constantly in motion. This creates fierce competition to constantly improve an application for your customer base.

The Past: Fast and Broken

Facebook famously adopted a “move fast and break things” approach to its development. Many followed. But even Facebook eventually learned that “move fast and break things” doesn’t work. It turns out users don’t like broken applications; they want something that works. This adds even more pressure on dev teams to maximize both speed and quality/functionality when shipping code.

The Future: Fast and Fixed

Rather than a “fast and broken” mindset where a team fixes software in production, after users see bugs, a winning software team will have a new standard: fast and fixed. Get new features out quickly, and make sure they work. Continuous, quality delivery is the name of the game.

However, this standard is extremely difficult to reach. Most teams feel constrained to pick just one focus: speed, quality, or price. But, if you want to win, you need to do it all. And this requires a highly mature organization with mature processes.

If you want to ship fast and fixed products—if you want high quality continuous delivery—you need three process elements to complement your team’s talent and rigor:

  • An efficient continuous integration (CI) pipeline
  • A diligent and thoughtful code review process (with every build)
  • A rigorous testing methodology (with every build)

The Common Testing Predicament

Adopting a CI pipeline is becoming standard, and standardized. Code review is a subject for another time and requires discipline and work, but can be achieved through sufficient effort. But what constitutes sufficient testing, much less rigorous testing, is its own set of challenges, one that remains largely unsolved in the industry: most teams are picking between extensive, slow testing suites and minimal, fast ones. The former hurts deployment speed. The latter hurts quality. Most teams don’t believe they can have both.

Part of the problem is that teams frequently misuse the testing tools they have available. Unit tests are best applied to enable refactoring, validate low-level intent, and manage edge cases. Having a large number of unit tests doesn’t necessarily mean you will catch bugs, because unit tests inherently aren’t testing what users do in the way users do it; that is the trade-off they make to run quickly. To validate that high-level features work the way you intend them to, you need to exercise all the levels of the application the way users would exercise them. This means exercising a full server (or a model thereof) and the part of the application that lives only in the browser. Exercising the browser is particularly important as more teams move more behavior into single-page applications. Browser-level tests are therefore slower, harder to write, and more difficult to maintain because of this complexity.

The Path Forward

The path out of this choice is to let unit tests be unit tests and then shift the focus of browser testing from needing to be “extensive” to being “accurate.”

Fast and accurate browser testing can be described as satisfying four requirements:

  • Using a test runner that runs quickly and stably (we suggest Cypress, TestCafe, or Capybara).
  • Covering the important user stories in your application.
  • Minimizing unnecessary and overlapping tests (and eliminating obsolete ones).
  • Running as many tests as possible in parallel (see the sketch below).
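
On that last point, the sketch below shows one way to parallelize a suite using TestCafe’s programmatic API; the test directory is hypothetical, and the concurrency value should match what your CI workers can sustain:

```js
// Run a browser regression suite with several browser instances in parallel.
const createTestCafe = require('testcafe');

(async () => {
  const testcafe = await createTestCafe('localhost');
  try {
    const failedCount = await testcafe
      .createRunner()
      .src(['tests/regression/'])      // hypothetical test directory
      .browsers(['chrome:headless'])
      .concurrency(4)                  // four browser instances share the test queue
      .run();
    process.exitCode = failedCount ? 1 : 0; // fail the CI step if any test fails
  } finally {
    await testcafe.close();
  }
})();
```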

It goes without saying that browser testing needs to be automated, rather than manual, to ship fast and fixed code. Building it from scratch is an option, but one that often fails: talent is rare, resourcing is expensive, and test suite stability tends to degrade. There are a number of external services that will help a team run browser testing that meets the above criteria, including the following:

Crowdtesting is a process by which the software team provides test cases to the vendor, and the vendor provides a large bank of people to manually perform the scenarios. It has a few advantages: it’s easier to set up than your own automation suite, it requires less ongoing maintenance than a home-built suite, and manual testers can sometimes catch bugs that automated tests would miss. However, this approach has several drawbacks.

Because customers pay for each test run, more software shipped correlates directly to more money spent. While manual testers can sometimes catch bugs that automated tests would miss, they will also frequently report false positives or miss other critical bugs, due to the inexactness of a manual review by an untrained or unfamiliar resource. In addition, while the only real maintenance is updating test instructions, it still means that a resource has to be assigned to the task, continually updating and changing the test cases to prevent the tests from becoming stale and outdated.

With Machine Learning (ML)-Enabled Record-and-Play, a third-party application adds an additional layer on top of your own, allowing you to build tests by recording yourself using your software. These tests are intended to remain functional through many small changes to your software by building “models” of the test event, rather than using conventional testing hooks. This reduces test maintenance costs. And because the tests are truly automated (rather than crowdsourced), you don’t have to pay for each test run.

However, since it is your team developing the tests with the external application, the gap between your team’s understanding of the application and actual user behavior remains. Additionally, the tests have to be redone every time there’s an appreciable change to the product, requiring substantial attention and input from your team. Lastly, since tests all run through the interface, if you decide to leave the service, you take no assets with you—you’re back at square one.

ProdPerfect offers the final type, Autodetection/Autogeneration. Autodetection tooling analyzes user traffic to determine test cases that represent common flows of user behavior, and then Autogeneration automatically produces repeatable test scripts based on those test cases. The process requires no input from you and minimal (for ProdPerfect) human input to finish test validations. Autodetection and Autogeneration work together continually to update and maintain your test suite through each build of your product, allowing for accurate and realistic testing with minimal time and effort. The tests are parallelized, so they run quickly, and the results are automatically and instantly sent to your developers through CI. Also, you get a copy of the test code with each build, allowing you to run it on demand and continue using it if you leave the service.

When using Autodetection/Autogeneration services, your team will still need to test brand new features that do not affect any previous functionality, as they will not yet have been detected from user traffic.

Ensuring all releases in continuous development are fast and fixed isn’t easy, but it’s absolutely necessary. By removing some of the burden of end-to-end browser testing for each release, your team can focus on doing what they do best: developing the best product they can quickly and efficiently.

When Should I Automate Browser Tests for New Features?

“A user interface is like a joke: if you need to explain it, it’s not that good.”

-Zoltan Kollin, UX Designer

Test automation is critical for continuous delivery and provides fast, repeatable, affordable testing; there’s no doubt it’s a must-have when deploying at speed. Customers often ask us about testing for brand new features—when is the right time to introduce automated tests?—so we’ll cover that here.

When testing for functionality at the browser level, we should differentiate between two kinds of testing: new feature testing and regression testing. The former focuses on making sure brand new features are functional and easy to use; the latter focuses on making sure nothing in the current application has broken as teams deploy new builds.

In brief, we recommend manually testing brand new feature sets, and then deploying automated tests to cover these brand new feature sets for regression. Below we expand on why we believe this.

Regression Testing

Regression testing covers the current application functionality when new changes or features are introduced. It is critical because during the deployment of a new feature, all eyes are on that feature; current functionality will have less human attention. Because existing feature sets are fairly stable, there is a clear payoff to investing in automating these tests: they are repeatable and won’t need to change frequently.

But what about for brand new features?

What is a New Feature?

Testing brand new features is a more interesting puzzle. What should be tested? When should that testing be automated?

Before going further, we should make some distinctions in terminology. “New features” come in three flavors:

  1. Changes that do not affect the user interface (e.g.: a backend change)
  2. Changes that affect the user interface for existing workflows (e.g.: a button moves)
  3. Changes that introduce a brand new workflow (e.g.: adding a new product)

For regular maintenance on an application or alterations to functionality that don’t change the workflow for a user, there’s no need to build brand new tests at the browser level: your current browser testing suite already has you covered—that’s what it’s there for.

For changes that impact a current workflow, you will need to update your existing automated tests to reflect these changes. This can be done during feature development or after the feature hits the testing environment and breaks the testing suite.
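
As a trivial, hypothetical illustration: if a workflow change moves the final submit action behind an actions menu, only the affected step of the existing Cypress test needs to change.

```js
// Before the UI change, the workflow ended with a directly visible button:
// cy.get('[data-test="place-order"]').click();

// After the change, the same action lives behind an actions menu,
// so only this step of the existing test is updated (selectors are hypothetical).
cy.get('[data-test="order-actions"]').click();
cy.get('[data-test="place-order"]').click();
cy.contains('Order confirmed').should('be.visible');
```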

For changes that introduce brand new products or workflows, no browser-level automation yet exists to test them. These kinds of changes are what we are calling “brand new features.” This automation will need to be introduced, but should be introduced after the new feature goes to production.

UX Testing and Functionality Testing of New Features

For brand new features or major changes to features, a team will need to develop tests that cover multiple angles. Functionality is key—don’t introduce new bugs—so you’ll need to do functionality testing. But in addition, teams need to test the user interface (UI) for ease of use and customer value before deployment—this is user experience (UX) testing.

This kind of testing can really only be done by humans, and shouldn’t be done exclusively by developers or product teams familiar with the product. Familiarity with the product perverts one’s capacity to judge usability. Users unfamiliar with the new feature need to test it to determine if it’s intuitive and delightful, and strong, quantitative metrics need to be used to understand the big picture and avoid interpretation bias by the product team. Services such as Usertesting.com or Vempathy can provide measurable, quantitative user experience feedback across dozens of different dimensions.

The fact that humans are already repeatedly manually testing a brand new feature for UX means that they are by nature also testing the same new features for functionality: if something breaks, they’ll find it. Building automated tests for brand new features is therefore not yet necessary, but there’s also a good reason to specifically wait.

New Feature Functionality Testing: Timing

For any brand new feature, a team should anticipate that it will be making some major tweaks after releasing to production. A disciplined team will not tolerate releasing major bugs with a new product, but should be ready to improve the product as they get user feedback. You should expect new features released into production to change a few times before they stabilize. For this reason, investing heavily in automated testing for the functionality of those features is a move that should be made late in the game, when the new feature has become more stable: otherwise, you’ll waste your investment in building these automated tests, and will simply need to rebuild them multiple times before they are repeatable.

Automated testing pays off when it’s run many times: it’s expensive and difficult to build, so it doesn’t make sense to build automated tests for workflows that will be tested once or twice before the test needs to be rebuilt. Once the new feature is stabilized, then build your automated tests, fold them into your regression testing suite, and move manual testing efforts towards the next set of new features.

Testing Applications Built on Serverless Architecture: Don’t Fear the Transition

Making a full migration to serverless architecture is daunting enough on its own, but to the uninitiated, the journey can seem terrifying.

However, with the right plan, the migration brings greater development speed, control, and cost management. This, of course, is why people choose to go through the transition.

To a large extent, much of the fear is overblown. Myths abound that one must radically change their development practices when migrating to serverless. While some practices change as you harness greater speed and control, many won’t, and there’s no need to throw the baby out with the bathwater. As you plan your transition to serverless, you’ll benefit greatly by knowing which practices don’t need to be changed as part of your transition plan.

Testing in Serverless

Starting with the good news: your testing practices don’t need to change to handle the transition to serverless. The really good news is they can even be improved.

Serverless architecture allows you to compartmentalize different functions easily, meaning the intent and purpose of each particular function is clearer. This makes unit testing much simpler and clearer, and it’s easier to know when your code is properly covered by unit tests. Internal APIs also make up a greater portion of the codebase. Well-written APIs are probably the easiest part of the codebase to test, because the contracts are so clear in the API code itself. Making the jump to serverless, therefore, doesn’t require changing your testing processes at lower levels.

In fact, it makes the process much easier and clearer.
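
As an illustration, here is a minimal sketch of a compartmentalized serverless function and a unit test for it, written with Jest; the handler, payload shape, and rounding rule are hypothetical:

```js
// discount.js — a hypothetical, single-purpose serverless handler.
// Its narrow responsibility makes its inputs and outputs easy to pin down in a unit test.
exports.handler = async (event) => {
  const { price, percent } = JSON.parse(event.body);
  const discounted = Math.round(price * (1 - percent / 100) * 100) / 100;
  return { statusCode: 200, body: JSON.stringify({ discounted }) };
};

// discount.test.js — a Jest unit test exercising the function in isolation.
const { handler } = require('./discount');

test('applies a 20% discount', async () => {
  const res = await handler({ body: JSON.stringify({ price: 50, percent: 20 }) });
  expect(res.statusCode).toBe(200);
  expect(JSON.parse(res.body).discounted).toBe(40);
});
```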

These components all bubble up into a unified whole. If you’ve taken advantage of your serverless architecture to compartmentalize different parts, then modifying an individual module is less likely to break everything else. You’re less likely to create a regression based on sloppy integration, which is great. But you’re also going to be shipping new changes continuously, and likely not always through a central unified pipeline. This will slightly change how you test the application end to end.

The best way to make sure you haven’t broken your application is still to test the whole thing at once, end to end (this is sometimes called browser testing, UI testing, or user acceptance testing). To support serverless development, you will need to automate your end-to-end testing, and keep the testing suite lean so it runs quickly and can support frequent deployments.

Best Practices for Automated End-to-End Testing

Most principles of good end-to-end testing practice are consistent from bare-metal to serverless.

As described earlier, E2E tests exercise the entire application the way real users will use it: an E2E test launches the application in a browser and interacts with it in a way that touches every layer of the application: the user interface itself, the browser (and compatibility with it), the network, the server, the APIs, the codebase, any 3rd-party integrations, and any hardware — the whole kit.

  1. Set up a testing environment that closely reflects the live application being tested. This QA, Testing, or Staging (whatever you want to call it) environment will be updated with the most recent build before that build goes to production, and the tests will be run there. This environment will be integrated into your manual or automated deployment pipeline.
  2. Implement a process to react to the feedback your tests give you. If the tests fail, your build should not deploy to production. That feedback should immediately alert the developer who committed the build, and the developer should be responsible for diagnosing the failure, determining if there is a bug, and if so, fixing it. If not, the developer provides feedback to the QA automation engineering team about why the test failed and how to modify tests to pass on the next bug-free build.
  3. Decide what to test. Test case management is quietly the most difficult part of E2E testing, and is critical to get right. In short, your E2E tests should reflect how users are actually using your application. Focus on actual use cases that are intended to end in a satisfying conclusion for the user; that is, they have accomplished something they set out to do.
  4. Design validations. Validations in your automated test code should confirm that each interaction point successfully vaults the user through their workflow; each interaction should successfully get the user to the next one. Tests should also validate the data that is transformed directly by the interactions through the test workflow. This ensures that the user has succeeded in their intent: the password is updated; or the product is ordered, the credit card is charged, and the order will ship to the right place. (A short sketch follows this list.)
  5. Finally, you’re going to write the test code, which will direct a driver to launch the browser, initiate each interaction, and validate every step along the way. Selenium is the most common toolset, but recent innovations have brought about other competitive frameworks such as TestCafe and Cypress; new ML-driven tooling also shows promise as a highly stable execution framework.
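
As a sketch of points 4 and 5 (using Cypress, with hypothetical routes, selectors, and copy), each step validates that the interaction moved the user forward, and the final assertions validate the data the workflow was meant to transform:

```js
describe('checkout', () => {
  it('orders a product and confirms the details', () => {
    cy.visit('/products/notebook');
    cy.get('[data-test="add-to-cart"]').click();
    cy.get('[data-test="cart-count"]').should('have.text', '1'); // step validation: item is in the cart

    cy.get('[data-test="checkout"]').click();
    cy.url().should('include', '/checkout');                     // step validation: user reached checkout

    cy.get('[data-test="shipping-zip"]').type('02110');
    cy.get('[data-test="place-order"]').click();

    // Data validations: the order exists and will ship to the right place.
    cy.contains('Order confirmed').should('be.visible');
    cy.get('[data-test="shipping-address"]').should('contain', '02110');
  });
});
```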

Your end-to-end tests should be run every time you deploy code, whether it is a front-end change or a back-end change, big or incremental, to catch regressions before they go live. You’ll need unified tooling (including Continuous Integration) to ensure the testing suite is kicked off automatically regardless of where the deployment occurs. If the tests are kicked off immediately and run quickly, they provide the same level of instantaneous feedback that unit and API tests provide, dramatically increasing their value over tests that are run much later in the process.

In addition to end-to-end testing, developers building on serverless need to set up continuous monitoring to look for unexpected performance or functional changes in production. Besides using an end-to-end testing suite as a synthetic functional monitor, you’ll want a serverless-specific application performance monitoring tool like IOpipe to trace and profile which serverless functions are potentially impacting users.

This article was first published on IOpipe’s blog.