The State of Test Automation Maintenance: What QA Engineers Actually Say

Ameer Hamza

Test automation was supposed to make QA teams faster.

In practice, many QA engineers describe a messier tradeoff: automation gives teams coverage, repeatability, and release confidence, but it also creates another system that has to be maintained.

Scripts need updates. Locators drift. Framework versions change. CI failures need triage. Test data expires. A flaky test gets rerun until it becomes background noise.

That is the real cost of test automation maintenance.

This report is based on qualitative voice-of-customer research across five organic public QA community discussions. The goal is not to present a statistically representative survey. The goal is to capture the language, complaints, objections, and priorities QA engineers keep repeating when they talk about test maintenance, flaky tests, AI testing tools, and mobile test automation.

The clearest finding is simple:

QA engineers do not hate automation. They hate maintaining automation that breaks for reasons users never care about.


TL;DR

  • Test automation maintenance is the real ceiling. Teams do not struggle only with writing tests. They struggle with keeping tests useful after the app changes.

  • Locator brittleness is the sharpest mobile pain. IDs, XPath, accessibility labels, waits, and device differences create constant upkeep.

  • Flaky tests destroy trust. Once teams stop believing red builds, automation loses authority.

  • AI is welcomed as assistance, not replacement. QA engineers describe useful AI as a draft, a helper, or a junior QA that still needs review.

  • ROI is confidence, not just hours saved. The best automation gives teams confidence to release, not just a bigger test count.

[Figure: chart of what QA engineers talk about]


Methodology

We reviewed five organic public QA community discussions focused on test automation, AI-assisted QA, locator brittleness, flaky tests, and the day-to-day reality of maintaining automated tests.

The reviewed discussions covered:

  • QA tools that are useful day to day

  • AI for test case creation

  • AI tools that help write and run automated UI tests

  • Automation feeling like another full-time job

  • Building AI-assisted testing workflows with coding agents

Each discussion was coded for:

  • Repeated pain points

  • Specific practitioner language

  • Objections to automation tooling

  • Objections to AI testing tools

  • Mobile test automation maintenance signals

  • Themes around QA ownership, judgment, and trust

A note on quotes: the quote wall below uses short public-community quotes or lightly cleaned fragments from the reviewed discussions. Usernames are intentionally omitted. The quotes are not private interviews, and they should not be treated as survey responses.

This is best read as a qualitative field report: what QA engineers say when they are talking to each other, not filling out a vendor form.


Finding 1: Maintenance Is the Killer

The strongest recurring signal was not “we need more automation.”

It was: maintaining automation becomes the problem.

One thread captured the wound directly:

“Maintenance is the number one killer of automation.”

Another described the day-to-day pain:

“One locator change and you're fixing tests for hours.”

And another variation hit the framework side:

“New framework version? Half your pipeline breaks.”

That is the core tension in test automation maintenance. The first version of a test suite is not the real test. The real test comes after the product changes.

A new onboarding screen appears. A button label changes. A permission dialog behaves differently on Android. A payment flow gets a new loading state. The test fails, but the product still works.

Now QA has to answer the question every automation team eventually faces:

Did the product break, or did the test break?

That investigation is the hidden labor of automation.

The Maintenance Loop

[Figure: the test automation maintenance loop]

A brittle suite creates a loop:

  1. App changes

  2. Locator, data, or environment drifts

  3. Test fails

  4. QA investigates

  5. Test is updated, quarantined, or ignored

  6. Trust either recovers or erodes

That loop is normal in small amounts. But when the loop becomes constant, automation stops feeling like leverage. It starts feeling like another product the QA team has to maintain.

This is why test maintenance should not be treated as an afterthought. It is the real long-term cost of automation.

For teams already dealing with inconsistent failures, the issue is often bigger than one flaky test. It is the full maintenance surface around the suite: selectors, waits, device state, test data, backend dependencies, CI stability, ownership, and triage discipline.
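
The investigation step in that loop can get a rough first-pass sort before a human looks at it. The sketch below is a minimal illustration, not a diagnosis tool: the function name and the returned labels are hypothetical, and the list of markers is an assumption based on common Selenium-style exception names. A match only suggests where to look first.

```python
# Exception names that often point at the test side (locators, waits)
# rather than the product. This list is an illustrative assumption.
LIKELY_TEST_SIDE = (
    "NoSuchElementException",
    "StaleElementReferenceException",
    "TimeoutException",
)


def first_pass_triage(error_message: str) -> str:
    """Suggest where to look first for a failed test (hypothetical helper)."""
    if any(marker in error_message for marker in LIKELY_TEST_SIDE):
        return "check locators and waits first"
    if "AssertionError" in error_message:
        return "possible product regression"
    return "needs manual triage"
```

A sort like this does not answer whether the product or the test broke. It only shortens the path to the person who can.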


Finding 2: Locator Brittleness Is the Acute Mobile Pain

Locator brittleness was the most specific technical pain in the research.

Traditional UI automation depends on implementation-level references: resource IDs, accessibility labels, XPath, text selectors, class names, or UI hierarchy. This works when the UI is stable and the engineering team consistently maintains test-friendly identifiers.

Mobile apps rarely stay that clean.

Buttons move. Labels change. Components get refactored. Native dialogs interrupt flows. Android and iOS expose different automation surfaces. Device sizes vary. Loading states appear at slightly different times. Accessibility identifiers are missing, inconsistent, or not treated as product-critical.

One recurring complaint was simple:

“developers are not maintaining ids leading to breaking tests.”

Another practitioner pushed back with an experienced counterpoint:

“why is a locator change taking hours? skill issue.”

That pushback matters. Skilled QA engineers are right that locator brittleness can be reduced with better discipline: stable test IDs, Page Object Models, reusable selectors, explicit waits, and developer-QA coordination.
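
To make the Page Object idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `FakeDriver` stands in for a real driver so the example runs without a device, and `LoginPage`, `Locator`, and the `login_submit` identifier are invented for illustration. The point is only that the locator lives in one place, so a renamed ID means one edit instead of a search through every test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    """A (strategy, value) pair, e.g. ("accessibility id", "login_submit")."""
    strategy: str
    value: str


class FakeDriver:
    """Stand-in for a real driver so the sketch runs without a device."""

    def __init__(self, visible_elements):
        self.visible_elements = set(visible_elements)
        self.taps = []

    def tap(self, locator: Locator) -> None:
        if locator not in self.visible_elements:
            raise LookupError(f"element not found: {locator}")
        self.taps.append(locator)


class LoginPage:
    """Page object: the only place that knows how login elements are located."""

    SUBMIT = Locator("accessibility id", "login_submit")  # hypothetical ID

    def __init__(self, driver: FakeDriver):
        self.driver = driver

    def submit(self) -> None:
        self.driver.tap(self.SUBMIT)


def check_user_can_submit_login(driver: FakeDriver) -> None:
    # The check speaks in user intent and never mentions a locator.
    LoginPage(driver).submit()
```

If developers rename the identifier, only `LoginPage.SUBMIT` changes. That is the discipline the pushback is pointing at, and it is also the ongoing cost: someone has to keep that mapping current.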

So the honest conclusion is not: “locators are impossible.”

The honest conclusion is:

Locator-based automation demands permanent discipline.

For well-staffed teams with mature automation engineers, that may be acceptable. For small QA teams shipping mobile changes every sprint, it becomes a serious maintenance tax.

Why Mobile Makes Locator Maintenance Worse

[Figure: locators become an acute maintenance issue]

Mobile test automation adds extra instability because tests must deal with:

  • Native permission dialogs

  • Keyboards

  • Gestures

  • Device state

  • Different screen sizes

  • Android and iOS behavior differences

  • OS version differences

  • Slow or inconsistent network conditions

  • Real-device quirks

  • App backgrounding and foregrounding

That is why locator maintenance is not just a test-code issue. It becomes a release-confidence issue.

If you are using Appium, locator strategy deserves deliberate planning. An upcoming guide on Appium iOS vs Android locators will cover this in more depth. Until then, see the Appium mobile testing guide or the Appium alternatives guide.


Finding 3: Flaky Tests Break Trust Before They Break Pipelines

A flaky test is usually defined as a test that passes and fails inconsistently without a relevant product change.

That definition is technically correct, but it undersells the real damage.

A flaky test does not only waste time. It trains the team to distrust automation.

The first time a test fails randomly, someone reruns it. The fifth time, people start ignoring it. Eventually, a red build no longer means “something broke.” It means “probably CI again.”

That is the moment automation loses authority.

A good automation suite should create a trusted signal. When it fails, the team should care. When it passes, the team should have more confidence in the release.

Flaky tests destroy both sides of that equation.

How Flakiness Erodes Automation Value

| Stage | What happens | Team behavior |
| --- | --- | --- |
| First flaky failure | A test fails inconsistently | Someone reruns it |
| Repeated flakiness | The same test keeps failing randomly | Team starts discounting the signal |
| Suite-level flakiness | Multiple tests fail for unclear reasons | CI loses authority |
| Hidden regression | A real issue appears among noisy failures | The team responds late |
| Lost confidence | Automation becomes unreliable | Manual QA pressure returns |

This is why flaky test work is not cleanup. It is trust repair.

A flaky suite says: “We have automation, but we do not fully believe it.”

That is a dangerous place to be. The team still pays the cost of maintaining automation, but the business no longer gets dependable release confidence from it.
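
One way to make flakiness visible instead of tolerated is to apply the definition above to CI history directly. The sketch below is an assumption-laden illustration: the function name and the `(test_name, commit_sha, passed)` record shape are hypothetical, and "same commit" is used as a stand-in for "no relevant product change." Real pipelines may need to account for environment or data changes too.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (test_name, commit_sha, passed) tuples,
    a hypothetical record shape chosen for this sketch.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    # A test is flagged only if one commit saw both outcomes.
    return sorted(
        {name for (name, _), seen in outcomes.items() if seen == {True, False}}
    )


history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),    # same commit, different result
    ("test_checkout", "abc123", False),
    ("test_checkout", "def456", True),  # different commit, may be a real fix
]
flaky = find_flaky_tests(history)  # ["test_login"]
```

Here `test_checkout` is not flagged, because its results changed across commits and the product itself may have changed. Only `test_login` disagreed with itself on identical code.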

For a deeper diagnosis workflow, use the flaky tests guide.


Finding 4: AI Is Treated Like a Junior QA, Not a Replacement

QA engineers are not universally anti-AI.

The research showed a more practical view: QA teams are open to AI when it helps with drafts, repetitive execution, summarization, debugging, or regression support.

But they reject the idea that AI can fully replace QA judgment.

The cleanest practitioner framing was:

“treat AI output like junior QA output, review it.”

Another repeated idea was:

“AI is a draft, not a replacement for thinking.”

That is the right model.

A junior QA can be useful. They can draft test cases. They can execute flows. They can notice issues. But they need context, review, and guidance from someone more experienced.

AI testing tools should be framed the same way.

The AI Model QA Engineers Actually Accept

[Figure: the AI model QA engineers actually adopt]

QA engineers are more likely to accept AI for:

  • Drafting test ideas

  • Generating first-pass test cases

  • Summarizing failures

  • Running repetitive regression

  • Identifying changed screens

  • Helping with test maintenance triage

They are less likely to accept AI for:

  • Final release judgment

  • Exploratory testing strategy

  • Business-risk prioritization

  • Context-heavy edge cases

  • Replacing QA headcount

  • Fully autonomous PRD-to-production testing

The distinction is not small. “AI replaces QA” is a radioactive message because the buyer and internal champion is often the very QA engineer it insults.

The better message is:

AI should remove drudgery so QA can spend more time on judgment.

That is the lane practitioners are actually open to.


Finding 5: ROI Is Confidence, Not Just Hours Saved

A lot of test automation ROI content starts with hours saved.

That is not wrong, but it is incomplete.

Yes, automation can reduce repeated manual regression work. Yes, it can speed up feedback. Yes, it can reduce repetitive testing effort.

But the strongest practitioner framing was about confidence:

“Automation ROI isn't about hours saved, it's about confidence gained.”

Another phrasing made the same point:

“the benefit isn't stopwatch savings, it's system trust.”

This is the better frame for test automation maintenance.

The real value of automation is not that it lets you say “we automated 500 tests.” The value is that your team can release knowing the highest-risk flows still work.

The Better ROI Frame

| Weak ROI frame | Stronger ROI frame |
| --- | --- |
| “We saved tester hours” | “We know critical flows still work” |
| “We automated 500 cases” | “We covered the highest-risk regression paths” |
| “We reduced manual effort” | “We reduced release uncertainty” |
| “We run tests faster” | “We can trust failures when they happen” |
| “We increased coverage” | “We increased confidence in the release” |

This matters even more in mobile.

A mobile app can fail in ways that are highly visible and expensive: broken login, failed OTP, checkout bugs, payment failures, location issues, notification failures, permission dead ends, or device-specific layout problems.

Nobody cares that QA saved time if the wrong bug escapes.

That is why mature teams should measure automation ROI through confidence signals:

  • Critical flows covered

  • Flakiness rate reduced

  • False failures reduced

  • Failure diagnosis time reduced

  • Escaped defects reduced

  • Release blockers caught earlier

  • Regression cycles completed reliably

  • Failures backed by clear evidence
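
Two of those signals, false failures and failure diagnosis time, are straightforward to compute once triage outcomes are recorded. This is a minimal sketch under stated assumptions: the `Failure` record and its fields are hypothetical, and it presumes someone marks each failure as a real defect or not after investigating it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Failure:
    """One triaged test failure (hypothetical record shape)."""
    test_name: str
    real_defect: bool           # True if triage confirmed a product bug
    minutes_to_diagnose: float  # time from red build to a known cause


def summarize_failures(failures):
    """Return the false-failure rate and mean diagnosis time for a period."""
    if not failures:
        return {"false_failure_rate": 0.0, "mean_minutes_to_diagnose": 0.0}
    false_failures = sum(1 for f in failures if not f.real_defect)
    return {
        "false_failure_rate": false_failures / len(failures),
        "mean_minutes_to_diagnose": mean(f.minutes_to_diagnose for f in failures),
    }
```

Tracked sprint over sprint, a falling false-failure rate is a more direct sign that red builds can be trusted than a rising test count.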

For the business case side, see the test automation ROI calculator.


What Engineering Leaders Should Take From This

The mistake is buying automation as if test creation is the whole problem.

It is not.

Test creation is the easy demo. Test maintenance is the long-term proof.

Before scaling any automation platform, ask:

  1. What happens when the UI changes?

  2. Who owns test maintenance?

  3. How often do tests fail for non-product reasons?

  4. Can failures be diagnosed quickly?

  5. Does the suite depend on fragile locators?

  6. Are we automating high-risk flows or chasing vanity coverage?

  7. Does the tool reduce maintenance, or create a new kind of maintenance?

  8. Will QA engineers still trust this system three months from now?

That last question is the real one.

A tool that looks impressive in week one but creates maintenance drag by month three is not solving the real problem. It is moving the problem.


What QA Engineers Should Take From This

The practitioner consensus is not “automation is bad.”

It is sharper than that:

  • Automate repetitive regression where it clearly reduces pain.

  • Do not automate low-value flows just to increase test count.

  • Track flaky tests as trust risks, not minor annoyances.

  • Push for stable testability hooks if using locator-based frameworks.

  • Treat AI output as a draft.

  • Keep humans responsible for risk, judgment, and exploratory coverage.

  • Measure automation by release confidence, not just execution speed.

The best QA teams are not anti-automation. They are anti-waste.

They want automation that removes boring work, catches real regressions, and makes releases safer.

They do not want another brittle system that breaks every sprint and then gets blamed on QA.


Where Mobile Test Automation Needs to Go Next

Mobile testing is where these maintenance problems become most visible.

Mobile apps have more moving parts: devices, emulators, OS versions, gestures, keyboards, permissions, network states, app backgrounding, backend dependencies, and fast-changing UI.

That makes mobile test automation valuable, but also fragile when it is built on brittle layers.

The next generation of mobile automation needs to reduce the maintenance burden, not just create tests faster.

That means:

  • Less dependence on implementation-level selectors

  • Better handling of dynamic screens and app states

  • Clear failure evidence with screenshots, logs, and step context

  • Tests grounded in actual app behavior

  • AI assistance that stays reviewable

  • Workflows QA teams can understand and control

  • Regression automation that improves confidence without bloating the suite

This is also why tooling language needs to change.

QA engineers are tired of vague claims like “AI-powered,” “self-healing,” and “fully autonomous.” They want to know what actually happens when the app changes.

Does the test survive?

Does the tool explain what failed?

Does QA stay in control?

Does it reduce maintenance, or does it create another maintenance layer?

Those are the questions that matter.


A Transparent Note From Quash

Quash is our product, so we have a point of view here.

We built Quash around many of the same patterns this research surfaced: mobile tests should be easier to create, less tied to brittle locators, grounded in real app behavior, and useful to QA teams rather than positioned as a replacement for them.

That does not mean every team should drop its current framework. Appium, native frameworks, and scripted automation still make sense for teams with strong automation engineering support.

But if your mobile automation suite keeps breaking because the UI changes, locator maintenance is eating QA time, or flaky tests have made CI hard to trust, it may be time to rethink the maintenance model instead of adding more scripts.

You can explore Quash’s mobile test execution workflow if that problem sounds familiar.


Conclusion: Maintenance Is the Real Automation Test

The first test is not the real test of an automation strategy.

The real test comes later.

It comes after the UI changes. After the locator disappears. After the framework updates. After CI fails randomly. After the team has ignored enough flaky tests that a red build no longer creates urgency.

That is when you find out whether automation is giving the team confidence or quietly creating more work.

QA engineers are not asking for magic. They are asking for automation that respects reality:

  • Apps change.

  • Mobile is messy.

  • Locators drift.

  • AI needs review.

  • Flaky tests destroy trust.

  • Humans still make the hard quality calls.

  • ROI means confidence, not just speed.

The teams that understand this will build smaller, stronger, more trusted automation suites.

The teams that ignore it will keep adding tests to a system nobody fully believes.

And that is the real state of test automation maintenance.