The seismic shift in how we test software

As it’s been since ARPANET, functional web software today is mostly shipped by luck, duct tape, or sheer will. Ask any engineer with tenure at an ecommerce company and they can tell you about the last time they broke checkout, the defining feature of doing commerce online.

Every year we have groundbreaking technologies that change the entire game of what kinds of software we develop: virtualization, the cloud, mobile, SOA, REST, NoSQL, ML/AI, microservices, SPAs, serverless, the list goes on. But except for the very bleeding edge of talent and investment, software has been tested pretty much the same way over the last 20 years: a mix of human manual testing and a shallow layer of test automation.

As late as 2018, even sophisticated businesses struggle with testing: 42% of software companies are still testing entirely manually, and only 32% have become mostly automated, according to recent research sponsored by the testing companies Sauce Labs and SmartBear. 75% of testing teams aren’t keeping up with software development: code is being deployed only partially tested. The majority of testing teams don’t even have a firm practice for test management—they aren’t even certain what they’re testing. The bugs that testing fails to catch cost the global economy trillions of dollars per year. This is true despite the fact that 25% of software budgets are allocated to QA testing.

We continually hope to be building better software year over year, but the bugs have been inescapable. Humans are imperfect and inconsistent; while that is beautiful in its own way, it’s also why you’d never build a human pyramid more than a couple of layers high, lest you invite catastrophe. We’re slower at the task than machines, we behave inconsistently, and communication breaks down at sufficient scale or across multiple layers of an organization. Because any manually tested system ships with bugs, project failures and high development costs are the expectation and the norm.

However, there’s a light at the end of this tunnel. The last two years have seen a new breed of tools appear that have a chance to change the game. We ourselves roll our eyes when vendors throw the words “machine learning” or “artificial intelligence” at their own products. But suffice it to say, the tools are getting smarter in a non-trivial way.

The Rise and Fall and Rise Again of Browser Automation
As an industry, we’ve tried and failed to get away from using a browser to test web applications. We’ve tried unit testing, API testing, behavior-driven testing, test-driven development, and just monitoring. None of these are necessarily bad, but it’s pretty rare that they provide a truly conclusive test that the product you are shipping will work for its users. Every one of those approaches tests a limited understanding of the application, and there will always be gaps in the testing that let significant bugs get through. Good teams have as many nets as possible so that each testing system’s blind spots are hopefully covered by something else, but it’s impossible to have certainty. And the worst blind spots are usually where the integrations and units come together. At the end of the day, a web application is exercised by users through a browser: the only way to really know whether it will work is to test as much as possible through a browser, preferably as close as possible to how real users exercise the system.

For all the advancements in other tooling and successor frameworks over the last 15 years, the standard in Software Quality Assurance Engineering is still Selenium. First developed in 2004, its premise was simple: rather than ask a human to manually click through a browser to test features of a web application, you could write code to do that work. Originally a framework for remotely controlling a browser, it has evolved into an abstraction for controlling, or deeply integrating with, a wide variety of browser-like systems, as well as large-scale collections of remote machines that can run those browsers.
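
To make that premise concrete, here is a minimal sketch of a scripted browser check written in TypeScript against the selenium-webdriver package. The URL, selectors, and credentials are hypothetical placeholders, not taken from any real application.

```typescript
import { Builder, By, until } from 'selenium-webdriver';

// A minimal sketch of an automated browser check: open a page, log in,
// and verify that a dashboard element appears. All URLs and selectors
// below are hypothetical placeholders.
async function smokeTestLogin(): Promise<void> {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example.com/login');
    await driver.findElement(By.name('email')).sendKeys('user@example.com');
    await driver.findElement(By.name('password')).sendKeys('not-a-real-password');
    await driver.findElement(By.css('button[type="submit"]')).click();
    // Wait for evidence that the login flow actually worked.
    await driver.wait(until.elementLocated(By.css('.dashboard')), 10_000);
  } finally {
    await driver.quit();
  }
}

smokeTestLogin().catch((err) => {
  console.error('Smoke test failed:', err);
  process.exit(1);
});
```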

While the initial principle is simple—automate a browser—the results can be complex. Selenium’s evolution has spawned layer upon layer of abstraction to handle every variety of browser behavior and usage one could dream of, as well as abstractions for where the browser can run and for how developers want to write their tests. The dirty secret is that even though a test drives a single browser, it is never really testing one unified component. Automated browser tests almost always exercise a free-standing environment: a network setup, a server setup, a database, and external services for each test system involved. Since each of those components has both a constantly evolving state and its own update process, there are many independent variables to control. The most frustrating impact is on time-based behavior. Loading a web page kicks off a variety of synchronous and asynchronous processes, which makes deciding when a page is fully loaded and ready to test tricky, particularly with the rise of single-page applications. Mash all these frustrations together and you get modern Test Engineering.
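
One common way teams handle that timing problem is with explicit waits rather than fixed sleeps. Below is a hedged sketch of the pattern with selenium-webdriver in TypeScript; the data-test selector and timeouts are illustrative assumptions, not a universal recipe.

```typescript
import { By, until, WebDriver } from 'selenium-webdriver';

// A sketch of one common answer to "when is the page ready?": instead of
// sleeping for a fixed interval, wait explicitly for the element the test
// actually needs. The selector and timeouts below are hypothetical.
async function clickCheckoutWhenReady(driver: WebDriver): Promise<void> {
  const checkout = await driver.wait(
    until.elementLocated(By.css('[data-test="checkout-button"]')),
    15_000, // allow slow asynchronous rendering up to 15 seconds
  );
  // In a single-page application, an element can exist in the DOM long
  // before it is usable, so also wait for visibility and enablement.
  await driver.wait(until.elementIsVisible(checkout), 5_000);
  await driver.wait(until.elementIsEnabled(checkout), 5_000);
  await checkout.click();
}
```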

As such, maintenance burden and stability issues have continued to plague browser testing. Test suites have continued to be maintained manually, by humans updating code, across changes to the underlying application. Stability issues and complexity have resulted in tests that are flaky: they sometimes pass and sometimes fail against the same version of the application. These problems have meant that browser testing has remained expensive, frustrating, slow, and ultimately of limited effectiveness. In 2015, Google was officially advocating severely limiting the number of browser tests a software team should maintain. The whole process left much to be desired.

Various other tools have emerged to improve the efficacy of browser testing: record-and-play tools and other “scriptless testing” products were designed to let less technical, less expensive resources drive automated testing, though these tend to be particularly unstable. Products such as BrowserStack and Sauce Labs made cross-browser testing much more accessible: instead of maintaining a series of remote machines hosting various versions of various browsers, you can pay a service to do that for you.

Browser testing has had a less-acknowledged but still serious problem throughout its history: ultimately, software teams have to guess what’s important to test. No data drives the decision of what to test. Software teams get together, decide what they believe are the most important use cases in their application that require testing, and test those. A gap always exists between what developers predict their users will do and what users actually do. This means that critical bugs affecting core user flows can be missed even when a test suite is well-maintained. It also means that many tests cover irrelevant or rare use cases, so browser suites become ever bigger, and thus slower, harder, and more expensive to maintain.

Recent Innovations in the Space
Browser testing will achieve its full potential when it runs faster, costs less to build and maintain, and (most importantly) runs consistently, throwing alerts only when it is catching bugs. A number of recent innovations in the space have brought browser testing a long way.

Crowdtesting is a process by which the software team provides test cases to the vendor, and the vendor provides a large bank of people to manually perform the scenarios. It has a few advantages: it’s easier to set up than your own automation suite, it requires less ongoing maintenance than a home-built suite, and manual testers can sometimes catch bugs that automated tests would miss. However, this approach has several drawbacks. There are a few major players in this space.

Because customers pay for each test run, more software shipped correlates directly to more money spent. While manual testers can sometimes catch bugs that automated tests would miss, they will also frequently report false positives or miss other critical bugs, due to the inexactness of a manual review by an untrained/unfamiliar resource. In addition, while the only real maintenance is updating test instructions, it still means that a resource has to be assigned to the task, continually updating and changing the test cases to prevent the test from becoming stale and outdated.

Crowdtesting is much like the American military: it was an innovation to win the last war, not the next one. The machines will provide far more resource-efficient, consistent, and easy-to-use testing products that will leave Crowdtesting as a footnote that ultimately served as a stopgap between the old way and the new way of testing.

With Machine Learning (ML)-Enabled Record-and-Play, a third-party application adds an additional layer on top of your own application, allowing you to build tests by recording yourself using your software. These tests are intended to remain functional through many small changes to your software, by building “models” of the test event rather than relying on conventional testing hooks. This reduces test maintenance costs and significantly reduces instability. Because the tests are truly automated (rather than crowdsourced), you don’t have to pay for the cost of running each test. There are a few big players in this space and perhaps a few dozen others.

However, since it is your team developing the tests with the external application, the gap between your team’s understanding of the application and actual user behavior remains. Additionally, the tests need to be rebuilt every time there’s an appreciable change to the product, requiring attention and input from a software team. Lastly, since tests all run through the interface, if you decide to leave the service, you take no assets with you; you’re back at square one.

Ultimately, we believe the core problems of browser testing won’t be solved until machines can help decide what to test, in addition to helping with how to test an application. Good browser testing exists to make sure users can do what they want to do on an application without encountering bugs. If you test what users are actually doing, you’ll make sure they don’t see bugs.

Autodetection/Autogeneration is where machines begin to help to decide what to test. Autodetection tooling analyzes user traffic to determine test cases that represent common flows of user behavior, and then Autogeneration automatically produces repeatable test scripts based on those test cases. This process can be repeated continuously to update a testing suite. The players in this space have emerged more recently and are fewer in number.
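
As a rough, hedged illustration of the autodetection idea (not a description of any particular vendor’s implementation), the sketch below groups recorded analytics events into per-session sequences and surfaces the most frequent sequences as candidate test cases. The event shape and names are hypothetical.

```typescript
// A sketch of autodetection, assuming analytics events of the form
// { sessionId, action } have already been collected. All names are hypothetical.
interface UserEvent {
  sessionId: string;
  action: string; // e.g. "visit:/cart" or "click:checkout"
}

// Return the most common per-session action sequences; in a real system these
// would become the candidate test cases handed off to autogeneration.
function topUserFlows(events: UserEvent[], limit = 10): Array<[string, number]> {
  // Group events into an ordered sequence of actions per session.
  const sessions = new Map<string, string[]>();
  for (const e of events) {
    const seq = sessions.get(e.sessionId) ?? [];
    seq.push(e.action);
    sessions.set(e.sessionId, seq);
  }

  // Count identical sequences across sessions.
  const counts = new Map<string, number>();
  for (const seq of sessions.values()) {
    const key = seq.join(' -> ');
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // The most frequent flows are the ones most worth testing first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}
```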

The last challenge for Autodetection-driven technologies is anticipating changes to User Interfaces (UIs). Ideally, browser tests would only break when a true bug has emerged. When UIs change dramatically, even machine-driven tests will fail when the change is deployed to a testing environment. In a short time, it’s likely that these technologies will be capable of detecting and prioritizing developer behavior in pre-production environments to automatically update tests to anticipate these UI changes.

What it Means for Software Testing Today
Improvements in data science and data engineering (behind true Machine Learning and the many “machine learning” tools masquerading as it) have unlocked a great deal of potential for reducing the cost and instability of browser tests. Machines are able to do more of the work. If we think of the history of browser testing over the past 25 years, the path has been something like this:

  1. Humans decided what to test, and humans tested it manually
  2. Humans decided what to test, and humans wrote and maintained code to have machines do the testing
  3. Humans decided what to test, and machines helped low-cost humans more quickly do the testing
  4. Humans decided what to test, and humans told smarter machines how to test with less babysitting and maintenance
  5. Machines decide what to test, and machines do the testing

Steps 4 and 5 represent the recent seismic shift in testing. We’re now seeing the emergence of a few technologies that will cut the human out of the loop completely, both in deciding what to test and in translating those tests for machines. The result will be test suites that catch more bugs, update and run faster, and require no human effort to build or maintain. We suspect that in 5 years, machines will own the entire browser testing process.

The Next 5-10 Years
The processes we’re seeing today are just the beginning. What’s happening is that we’re unleashing data to improve the way we test the web. First, we collected data between test runs to refine our expectations. Now we’re updating our expectations of what should be tested based on the behavior of real users. These expectations and test behaviors will only get more intelligent over time, because what we’re ultimately building are simulations of users. Those user simulations will eventually become accurate and intelligent enough that we can rely on them as the primary form of web software testing. The goal is for the tools to recommend potential solutions to the problems they have seen in the past.

This article was first published on SDTimes.com.

What Can and Should be Automated in Software Development?

Software development is a highly creative human endeavor that requires a concentrated mix of talents: efficient design, organization, architectural strategy, coordination with key business needs, and a deep attention to detail. It’s hard. Almost anyone can sling code if they take a few online classes. But few people can develop stable, extensible software.

Many of the human skills in software development are difficult to acquire and to quantify. Some come with experience, but still only when complemented by skill. There seems to be a natural talent that cannot be taught: many recruiters still look for the mythical “10x” developer who is ten times as productive or effective as others. From these sorts of mysteries and unquantifiable factors come a great deal of “philosophy” and legend about what goes into making great software.

It’s no surprise, therefore, that there is skepticism and even apprehension around the idea of automating any part of a developer’s job. Can machines really replicate any of what great developers do? And should we even try?

What should we automate?

What’s the difference between sculpting marble and breaking rocks?

The difference is in the engagement of the mind. The tools are the same. The medium is the same. How the mind engages with the task is what matters. While both use steel to change the shape of stone, one is drudgery and the other is creative and delightful. Breaking rocks all day burns people out. Sculpting sets the soul on fire.

David Liedle and I were discussing automation the other day and I realized that this is a great analogy for what we should automate. Robots can break rocks. We don’t need humans to break rocks. We do need humans to sculpt.

Are there parts of the software development process that are more like breaking rocks than sculpting? Of course. Would we ask a sculptor to chisel their own rock out of the earth and carry it to their workshop? No: it is a terrible use of their time and does not take advantage of their unique talents.

For software development, we should automate the parts of the process that do not engage the creativity, the strategy, the cleverness, and the organizational strength of a great developer. We should automate the drudging parts that burn people out.

What can we automate?

Perhaps not surprisingly, the tasks we should automate and the tasks we can automate have significant overlap by their very nature. The kinds of tasks that lack the special, human parts that are so hard to quantify are the very ones that are easiest to break into parts and automate in turn.

Right now, and for the foreseeable future, we automate tasks that can be defined and repeated, either deterministically or probabilistically (the latter being what we think of as “AI”). In human history, the tasks which have been automated have been those wherein the human mind is no longer creatively engaged. We have automated picking crops, forming boxes, stacking shelves. We are beginning to automate repetitive tasks on applications using Robotic Process Automation. QA engineers automate the task of manually clicking through an application repeatedly. All of these free up the human mind from drudgery so it can turn its focus towards more beautiful work.

We have seen it in other parts of the software development process: performance analysts used to probe applications repeatedly for performance issues; now Application Performance Management tooling runs on its own once set up. Software deployments used to be heavily managed events; now they can be done with the click of a button. None of these tasks is what makes software engineering interesting or valuable to the human mind.

This holds true for the current wave of automation: the jobs being automated are those which have been so proceduralized by management process already that they no longer set the human soul alight. And there’s much more of the software development process that can yet be automated away from human burden.

At ProdPerfect, we seek to combat the drudgery of sitting in a room guessing what’s important to test, and repeatedly re-writing and re-tooling the same end-to-end automation tests. We’re here to fight burnout, to help software teams deal with less BS from broken code and from having to test it, so they can go build the things that help other people avoid burnout, and thrive.

As with every wave of automation, there’s some discomfort and incredulity that anything but an experienced, well-trained human can do the trick. In ten years, we won’t be able to imagine doing it any other way.

End-to-End or Unit Testing: Which Tests for Which Bugs?

“Not even Ares battles against necessity.”

-Sophocles

When designing a holistic testing strategy for any application, the QA strategist has to first answer, “Which testing methods should I utilize for which types of bugs?” Some bugs are rendering errors, some involve the application returning the wrong data, others are functionality issues (the user simply can’t do what they intend to), and others are application-level/browser errors, each requiring some specificity in approach.

Often, businesses rely too exclusively on either end-to-end (browser-level) testing, or too exclusively on unit testing, without properly accounting for different kinds of bugs. Some businesses try to bake various data validation checks into their end-to-end testing. At ProdPerfect, we’ve heard business leaders suggest, “if all of your unit tests are well-written, there’s no need for end-to-end testing.” Both of these approaches, though, are flawed. QA teams need both end-to-end and unit testing, and they should be applied differently.

Unit Testing vs. End-to-End Testing

Unit testing checks code blocks (typically with a black-box mindset): variable X is the input; variable Y should be the output. Unit testing efficiently checks the functions or calculations that produce resulting data—a numerical value, a text string, etc. End-to-end testing exercises all layers of the application at once; it’s best suited to making sure buttons, forms, changes, links, and entire workflows function without problems.

Here’s an example to illustrate the proper approach to testing decision-making:

“Llamas R Us” is an E-Commerce company, selling llamas online by subscription. Their software includes a sales tax calculator. When a customer selects their ship-to location during the checkout process, the sales tax calculator automatically calculates the tax and applies it to the total cost of the customer’s monthly llama purchase. What needs to be tested here is that the right sales tax is applied to the llamas being purchased, depending on which state the would-be llama farmer is living in.

Llamas R Us may be inclined to assign testers to manually complete the checkout process, select different locations, and calculate whether the correct sales tax is being applied. To perform these tests, they may consider using automated end-to-end testing, writing unique code for each different state’s sales tax to ensure full test coverage of the calculator feature.

However, though it seems comprehensive, this approach is not in fact optimal. For one, it’s inefficient: it’s difficult for end-to-end tests, even automated ones, to read the resulting data off such a web page and verify it—humans are imprecise, and machines aren’t great at scraping raw data off of a web page. It’s also simply inefficient to run the checkout process 50 times for 50 different states. Doing so would cause testing time to balloon and testing efficiency to collapse.

Data Validation vs. Application Functionality

At its heart, testing the sales tax calculator is a data validation test. The hypothetical calculator is reliant on a particular set of inputs (states, countries, etc.) to generate a particular set of outputs (the sales tax multipliers). Thus, it is an ideal candidate for unit testing instead of end-to-end testing. In this case, Llamas R Us should create individual unit tests to verify the functionality of their sales tax calculator. These tests run much faster, require less work to set up, and don’t need to be changed each time the user interface is tweaked—they should always work until the sales tax calculator code is itself changed.
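
As a concrete (and hypothetical) sketch, unit tests for such a calculator might look like the following in TypeScript with a Jest-style runner; calculateSalesTax and the rates shown are illustrative stand-ins for Llamas R Us’s real code and real tax tables.

```typescript
// Hypothetical function under test: maps a US state code and a subtotal
// to the sales tax owed. The rates here are illustrative, not real.
function calculateSalesTax(state: string, subtotal: number): number {
  const rates: Record<string, number> = { CA: 0.0725, TX: 0.0625, OR: 0 };
  const rate = rates[state];
  if (rate === undefined) {
    throw new Error(`Unknown state: ${state}`);
  }
  return Math.round(subtotal * rate * 100) / 100;
}

// Jest-style unit tests: known input in, expected output out.
describe('calculateSalesTax', () => {
  it('applies the California rate', () => {
    expect(calculateSalesTax('CA', 100)).toBeCloseTo(7.25);
  });

  it('charges no tax in Oregon', () => {
    expect(calculateSalesTax('OR', 100)).toBe(0);
  });

  it('rejects unknown states', () => {
    expect(() => calculateSalesTax('ZZ', 100)).toThrow('Unknown state');
  });
});
```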

End-to-end testing, automated or otherwise, is ideal for testing the functionality of an application rather than the data being sent to the user. For Llamas R Us and their E-Commerce application, end-to-end testing is ideal for ensuring that a buyer can add products to their cart, navigate shopping categories, and access product details, images, and reviews. In this case, Llamas R Us isn’t testing to make sure the right data emerges, but that the workflow can be completed, consistently, as the application changes. Because of the various application layers and the many interacting blocks of code being exercised simultaneously, end-to-end testing is an efficient and invaluable tool for testing features like these, which exist to make a web application usable. Testing at the unit level alone will never provide the full picture of whether the whole application works together. The reality is, you never really know it works until you see it working.
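
A hedged sketch of that kind of workflow test in Cypress follows; the routes and data-test selectors are hypothetical placeholders for whatever Llamas R Us’s storefront actually uses, and the point is that the test asserts the workflow completes rather than checking exact figures.

```typescript
// A sketch of an end-to-end workflow test: browse, add to cart, reach checkout.
// All routes and data-test selectors below are hypothetical placeholders.
describe('Llamas R Us checkout workflow', () => {
  it('lets a buyer add a llama to the cart and reach checkout', () => {
    cy.visit('/llamas');

    // Open a product page from the category listing.
    cy.get('[data-test="product-card"]').first().click();
    cy.get('[data-test="product-details"]').should('be.visible');

    // Add the product to the cart and confirm the cart updated.
    cy.get('[data-test="add-to-cart"]').click();
    cy.get('[data-test="cart-count"]').should('contain', '1');

    // Proceed to checkout; we only assert that the workflow completes,
    // not the exact tax figures (that is the unit tests' job).
    cy.get('[data-test="go-to-checkout"]').click();
    cy.url().should('include', '/checkout');
  });
});
```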

Data validation questions and raw functionality questions are very different issues which must be approached with different kinds of tests. The simplest answer to the question, “Which testing methods should I utilize for which types of bugs?” is that if you’re testing something which produces a given (and by its nature predictable) set of output data based on a given set of input data, unit testing is likely the most efficient. For testing the stability and functionality of a feature or workflow, end-to-end testing is likely best.

How to Ship Products Fast and Fixed

“Without continual growth and progress, such words as improvement, achievement, and success have no meaning.” ~ Benjamin Franklin

Modern software development has changed dramatically from even 5 years ago: things move fast. New features are developed continuously, and the ease of updating web applications means that software is constantly in motion. This creates fierce competition to constantly improve an application for your customer base.

The Past: Fast and Broken

Facebook famously embraced a “move fast and break things” approach to its development. Many followed. But even Facebook eventually learned that “move fast and break things” doesn’t work. It turns out users don’t like broken applications; they want something that works. This adds even more pressure on dev teams to maximize both speed and quality/functionality when shipping code.

The Future: Fast and Fixed

Rather than a “fast and broken” mindset where a team fixes software in production, after users see bugs, a winning software team will have a new standard: fast and fixed. Get new features out quickly, and make sure they work. Continuous, quality delivery is the name of the game.

However, this standard is extremely difficult to reach. Most teams feel constrained to pick just one focus: speed, quality, or price. But, if you want to win, you need to do it all. And this requires a highly mature organization with mature processes.

If you want to ship fast and fixed products—if you want high quality continuous delivery—you need three process elements to complement your team’s talent and rigor:

  • An efficient continuous integration (CI) pipeline
  • A diligent and thoughtful code review process (with every build)
  • A rigorous testing methodology (with every build)

The Common Testing Predicament

Adopting a CI pipeline is becoming standard, and standardized. Code review is a subject for another time and requires discipline and work, but can be achieved through sufficient effort. But what constitutes sufficient testing, much less rigorous testing, is its own set of challenges, largely unsolved in the industry: most teams are picking between extensive but slow testing suites and minimal but fast ones. The former hurts deployment speed. The latter hurts quality. Most teams don’t believe they can have both.

Part of the problem is that teams frequently misuse the testing tools they have available. Unit tests are best applied to enable refactoring, validate low-level intent, and manage edge cases. Having a large number of unit tests doesn’t necessarily mean you will catch bugs, because unit tests inherently don’t exercise what users are doing in the way users are doing it; that is the trade-off that lets them run quickly. To validate that high-level features work the way you intend them to, you need to exercise all the levels of the application the way users would. This means exercising a full server (or a model thereof) and the part of the application that lives only in the browser. Exercising the browser is particularly important as more teams move more behavior into single-page applications. Browser-level tests are therefore slower, harder to write, and more difficult to maintain because of this complexity.

The Path Forward

The path out of this choice is to let unit tests be unit tests and then shift the focus of browser testing from needing to be “extensive” to being “accurate.”

Fast and accurate browser testing can be described as satisfying four requirements:

  • Using a test runner that runs quickly and stably (we suggest Cypress, TestCafe, or Capybara).
  • Covering the important user stories in your application.
  • Minimizing unnecessary and overlapping tests (and eliminating obsolete ones).
  • Running as many tests as possible in parallel.
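
As one hedged illustration of the last two points, the sketch below uses TestCafe’s programmatic API to run a directory of tests across several headless browser instances in parallel; the test path and concurrency factor are illustrative assumptions, not recommendations.

```typescript
import createTestCafe from 'testcafe';

// A sketch of running a browser test suite in parallel with TestCafe.
// The test path and concurrency factor below are illustrative assumptions.
async function runSuite(): Promise<void> {
  const testcafe = await createTestCafe('localhost');
  try {
    const failedCount = await testcafe
      .createRunner()
      .src(['tests/e2e/']) // hypothetical test directory
      .browsers(['chrome:headless'])
      .concurrency(4) // spread tests across 4 browser instances
      .run();
    if (failedCount > 0) {
      process.exitCode = 1; // fail the CI step if any test failed
    }
  } finally {
    await testcafe.close();
  }
}

runSuite();
```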

It goes without saying that browser testing needs to be automated, rather than manual, to ship fast and fixed code. Building it from scratch is an option, but one that often fails: talent is rare, resourcing is expensive, and test suite stability tends to degrade. There are a number of external services that will help a team run browser testing that meets the above criteria, including the following:

Crowdtesting is a process by which the software team provides test cases to the vendor, and the vendor provides a large bank of people to manually perform the scenarios. It has a few advantages: it’s easier to set up than your own automation suite, it requires less ongoing maintenance than a home-built suite, and manual testers can sometimes catch bugs that automated tests would miss. However, this approach has several drawbacks.

Because customers pay for each test run, more software shipped correlates directly to more money spent. While manual testers can sometimes catch bugs that automated tests would miss, they will also frequently report false positives or miss other critical bugs, due to the inexactness of a manual review by an untrained/unfamiliar resource. In addition, while the only real maintenance is updating test instructions, it still means that a resource has to be assigned to the task, continually updating and changing the test cases to prevent the test from becoming stale and outdated.

With Machine Learning (ML)-Enabled Record-and-Play, a third-party application adds an additional layer on top of your own application, allowing you to build tests by recording yourself using your software. These tests are intended to remain functional through many small changes to your software, by building “models” of the test event rather than relying on conventional testing hooks. This reduces test maintenance costs. Because the tests are truly automated (rather than crowdsourced), you don’t have to pay for the cost of running each test.

However, since it is your team developing the tests with the external application, the gap between your team’s understanding of the application and actual user behavior remains. Additionally, the tests have to be redone every time there’s an appreciable change to the product, requiring substantial attention and input from your team. Lastly, since tests all run through the interface, if you decide to leave the service, you take no assets with you—you’re back at square one.

ProdPerfect offers the final type, Autodetection/Autogeneration. Autodetection tooling analyzes user traffic to determine test cases that represent common flows of user behavior, and then Autogeneration automatically produces repeatable test scripts based on those test cases. The process requires no input from you and only minimal human input on ProdPerfect’s side to finish test validations. Autodetection and Autogeneration work together continually to update and maintain your test suite through each build of your product, allowing for accurate and realistic testing with minimal time and effort. The tests are parallelized, so they run quickly, and the results are automatically and instantly sent to your developers through CI. Also, you get a copy of the test code with each build, allowing you to run it on demand and continue using it if you leave the service.

When using Autodetection/Autogeneration services, your team will still need to test brand new features that do not affect any previous functionality, as they will not have yet been detected from user traffic.

Ensuring all releases in continuous development are fast and fixed isn’t easy, but it’s absolutely necessary. By removing some of the burden of end-to-end browser testing for each release, your team can focus on doing what they do best: developing the best product they can quickly and efficiently.

When Should I Automate Browser Tests for New Features?

“A user interface is like a joke: if you need to explain it, it’s not that good.”

-Zoltan Kollin, UX Designer

Test automation is critical for continuous delivery and provides fast, repeatable, affordable testing; there’s no doubt it’s a must-have when deploying at speed. Customers often ask us about testing for brand new features—when is the right time to introduce automated tests?—so we’ll cover that here.

When testing for functionality at the browser level, we should differentiate between two kinds of testing: new feature testing and regression testing. The former focuses on making sure brand new features are functional and easy to use; the latter focuses on making sure nothing in the current application has broken as teams deploy new builds.

In brief, we recommend manually testing brand new feature sets, and then deploying automated tests to cover these brand new feature sets for regression. Below we expand on why we believe this.

Regression Testing

Regression testing covers the current application functionality when new changes or features are introduced. It is critical because during the deployment of a new feature, all eyes are on that feature; current functionality will have less human attention. Because existing feature sets are fairly stable, there is a clear payoff to investing in automating these tests: they are repeatable and won’t need to change frequently.

But what about for brand new features?

What is a New Feature?

Testing brand new features is a more interesting puzzle. What should be tested? When should that testing be automated?

Before going further, we should make some distinctions in terminology. “New features” come in three flavors:

  1. Changes that do not affect the user interface (e.g.: a backend change)
  2. Changes that affect the user interface for existing workflows (e.g.: a button moves)
  3. Changes that introduce a brand new workflow (e.g.: adding a new product)

For regular maintenance on an application or alterations to functionality that don’t change the workflow for a user, there’s no need to build brand new tests at the browser level: your current browser testing suite already has you covered—that’s what it’s there for.

For changes that impact a current workflow, you will need to update your existing automated tests to reflect these changes. This can be done during feature development or after the feature hits the testing environment and breaks the testing suite.

For changes that introduce brand new products or workflows, no browser-level automation yet exists to test them. These kinds of changes are what we are calling “brand new features.” This automation will need to be introduced, but should be introduced after the new feature goes to production.

UX Testing and Functionality Testing of New Features

For brand new features or major changes to features, a team will need to develop tests that cover multiple angles. Functionality is key—don’t introduce new bugs—so you’ll need to do functionality testing. But in addition, teams need to test the user interface (UI) for ease of use and customer value before deployment—this is user experience (UX) testing.

This kind of testing can really only be done by humans, and shouldn’t be done exclusively by developers or product teams familiar with the product. Familiarity with the product perverts one’s capacity to determine usability. Users unfamiliar with the new feature need to test it to determine whether it’s intuitive and delightful, and strong, quantitative metrics need to be used to understand the big picture and avoid interpretation bias by the product team. Services such as Usertesting.com or Vempathy can provide measurable, quantitative user experience feedback across dozens of different dimensions.

The fact that humans are already repeatedly manually testing a brand new feature for UX means that they are by nature also testing the same new features for functionality: if something breaks, they’ll find it. Building automated tests for brand new features is therefore not yet necessary, but there’s also a good reason to specifically wait.

New Feature Functionality Testing: Timing

For any brand new feature, a team should anticipate that it will be making some major tweaks after releasing to production. A disciplined team will not tolerate releasing major bugs with a new product, but should be ready to improve the product as they get user feedback. You should expect new features released into production to change a few times before they stabilize. For this reason, investing heavily in automated testing for the functionality of those features is a move that should be made late in the game, when the new feature has become more stable: otherwise, you’ll waste your investment in building these automated tests, and will simply need to rebuild them multiple times before they are repeatable.

Automated testing pays off when it’s run many times: it’s expensive and difficult to build, so it doesn’t make sense to build automated tests for workflows that will be tested once or twice before the test needs to be rebuilt. Once the new feature is stabilized, then build your automated tests, fold them into your regression testing suite, and move manual testing efforts towards the next set of new features.