The State of Test Automation Maintenance: What QA Engineers Actually Say

- TL;DR
- Methodology
- Finding 1: Maintenance Is the Killer
- Finding 2: Locator Brittleness Is the Acute Mobile Pain
- Finding 3: Flaky Tests Break Trust Before They Break Pipelines
- Finding 4: AI Is Treated Like a Junior QA, Not a Replacement
- Finding 5: ROI Is Confidence, Not Just Hours Saved
- What Engineering Leaders Should Take From This
- What QA Engineers Should Take From This
- Where Mobile Test Automation Needs to Go Next
- A Transparent Note From Quash
- Conclusion: Maintenance Is the Real Automation Test
Test automation was supposed to make QA teams faster.
In practice, many QA engineers describe a messier tradeoff: automation gives teams coverage, repeatability, and release confidence, but it also creates another system that has to be maintained.
Scripts need updates. Locators drift. Framework versions change. CI failures need triage. Test data expires. A flaky test gets rerun until it becomes background noise.
That is the real cost of test automation maintenance.
This report is based on qualitative voice-of-customer research across five organic public QA community discussions. The goal is not to present a statistically representative survey. The goal is to capture the language, complaints, objections, and priorities QA engineers keep repeating when they talk about test maintenance, flaky tests, AI testing tools, and mobile test automation.
The clearest finding is simple:
QA engineers do not hate automation. They hate maintaining automation that breaks for reasons users never care about.
TL;DR
- Test automation maintenance is the real ceiling. Teams do not struggle only with writing tests. They struggle with keeping tests useful after the app changes.
- Locator brittleness is the sharpest mobile pain. IDs, XPath, accessibility labels, waits, and device differences create constant upkeep.
- Flaky tests destroy trust. Once teams stop believing red builds, automation loses authority.
- AI is welcomed as assistance, not replacement. QA engineers describe useful AI as a draft, a helper, or a junior QA that still needs review.
- ROI is confidence, not just hours saved. The best automation gives teams confidence to release, not just a bigger test count.


Methodology
We reviewed five organic public QA community discussions focused on test automation, AI-assisted QA, locator brittleness, flaky tests, and the day-to-day reality of maintaining automated tests.
The reviewed discussions covered:
- QA tools that are useful day to day
- AI for test case creation
- AI tools that help write and run automated UI tests
- Automation feeling like another full-time job
- Building AI-assisted testing workflows with coding agents
Each discussion was coded for:
- Repeated pain points
- Specific practitioner language
- Objections to automation tooling
- Objections to AI testing tools
- Mobile test automation maintenance signals
- Themes around QA ownership, judgment, and trust
A note on quotes: the quotes in this report are short public-community quotes or lightly cleaned fragments from the reviewed discussions. Usernames are intentionally omitted. The quotes are not private interviews, and they should not be treated as survey responses.
This is best read as a qualitative field report: what QA engineers say when they are talking to each other, not filling out a vendor form.
Finding 1: Maintenance Is the Killer
The strongest recurring signal was not “we need more automation.”
It was: maintaining automation becomes the problem.
One thread captured the wound directly:
“Maintenance is the number one killer of automation.”
Another described the day-to-day pain:
“One locator change and you're fixing tests for hours.”
And another variation hit the framework side:
“New framework version? Half your pipeline breaks.”
That is the core tension in test automation maintenance. The first version of a test suite is not the real test. The real test comes after the product changes.
A new onboarding screen appears. A button label changes. A permission dialog behaves differently on Android. A payment flow gets a new loading state. The test fails, but the product still works.
Now QA has to answer the question every automation team eventually faces:
Did the product break, or did the test break?
That investigation is the hidden labor of automation.
The Maintenance Loop

A brittle suite creates a loop:
- App changes
- Locator, data, or environment drifts
- Test fails
- QA investigates
- Test is updated, quarantined, or ignored
- Trust either recovers or erodes
That loop is normal in small amounts. But when the loop becomes constant, automation stops feeling like leverage. It starts feeling like another product the QA team has to maintain.
This is why test maintenance should not be treated as an afterthought. It is the real long-term cost of automation.
For teams already dealing with inconsistent failures, the issue is often bigger than one flaky test. It is the full maintenance surface around the suite: selectors, waits, device state, test data, backend dependencies, CI stability, ownership, and triage discipline.
Finding 2: Locator Brittleness Is the Acute Mobile Pain
Locator brittleness was the most specific technical pain in the research.
Traditional UI automation depends on implementation-level references: resource IDs, accessibility labels, XPath, text selectors, class names, or UI hierarchy. This works when the UI is stable and the engineering team consistently maintains test-friendly identifiers.
Mobile apps rarely stay that clean.
Buttons move. Labels change. Components get refactored. Native dialogs interrupt flows. Android and iOS expose different automation surfaces. Device sizes vary. Loading states appear at slightly different times. Accessibility identifiers are missing, inconsistent, or not treated as product-critical.
One recurring complaint was simple:
“developers are not maintaining ids leading to breaking tests.”
Another practitioner pushed back with an experienced counterpoint:
“why is a locator change taking hours? skill issue.”
That pushback matters. Skilled QA engineers are right that locator brittleness can be reduced with better discipline: stable test IDs, Page Object Models, reusable selectors, strong waits, and developer-QA coordination.
So the honest conclusion is not: “locators are impossible.”
The honest conclusion is:
Locator-based automation demands permanent discipline.
For well-staffed teams with mature automation engineers, that may be acceptable. For small QA teams shipping mobile changes every sprint, it becomes a serious maintenance tax.
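One way to picture the "stable test IDs plus Page Object Model" discipline is the minimal sketch below. It is an illustration under stated assumptions, not a real Appium API: the `Driver` and `Element` interfaces, the `RecordingDriver` fake, and identifiers such as `login_email` are all hypothetical stand-ins. The point it shows is that each locator lives in exactly one place, so a renamed ID means one edit instead of a hunt through every test.

```python
from dataclasses import dataclass
from typing import Protocol


class Element(Protocol):
    """Hypothetical element interface; a real driver's element would differ."""

    def tap(self) -> None: ...
    def type_text(self, text: str) -> None: ...


class Driver(Protocol):
    """Hypothetical minimal driver interface, not a real Appium signature."""

    def find(self, strategy: str, value: str) -> Element: ...


@dataclass(frozen=True)
class Locator:
    strategy: str  # e.g. "accessibility_id" or "xpath"
    value: str


class LoginScreen:
    """Page object: the only place that knows how login elements are located."""

    # Assumed identifiers. If a developer renames one, only this line changes.
    EMAIL = Locator("accessibility_id", "login_email")
    PASSWORD = Locator("accessibility_id", "login_password")
    SUBMIT = Locator("accessibility_id", "login_submit")

    def __init__(self, driver: Driver) -> None:
        self._driver = driver

    def _el(self, locator: Locator) -> Element:
        return self._driver.find(locator.strategy, locator.value)

    def log_in(self, email: str, password: str) -> None:
        self._el(self.EMAIL).type_text(email)
        self._el(self.PASSWORD).type_text(password)
        self._el(self.SUBMIT).tap()


class RecordingElement:
    """Fake element for the demo: records actions instead of touching a device."""

    def __init__(self, log: list, name: str) -> None:
        self._log, self._name = log, name

    def tap(self) -> None:
        self._log.append(("tap", self._name, ""))

    def type_text(self, text: str) -> None:
        self._log.append(("type", self._name, text))


class RecordingDriver:
    """Fake driver for the demo; stands in for a real device session."""

    def __init__(self) -> None:
        self.actions: list = []

    def find(self, strategy: str, value: str) -> RecordingElement:
        return RecordingElement(self.actions, value)


driver = RecordingDriver()
LoginScreen(driver).log_in("qa@example.com", "secret")
```

Tests call `log_in` and never mention a locator, which is the property that keeps a locator change from turning into hours of fixes. It does not remove the need for developers to keep the identifiers stable.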
Why Mobile Makes Locator Maintenance Worse

Mobile test automation adds extra instability because tests must deal with:
- Native permission dialogs
- Keyboards
- Gestures
- Device state
- Different screen sizes
- Android and iOS behavior differences
- OS version differences
- Slow or inconsistent network conditions
- Real-device quirks
- App backgrounding and foregrounding
That is why locator maintenance is not just a test-code issue. It becomes a release-confidence issue.
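Several items in the list above (loading states, slow networks, dialogs that appear late) are timing problems, and the usual defence is an explicit wait rather than a fixed sleep. The sketch below is a generic polling helper written for illustration. The name `wait_until` and its parameters are assumptions, not a function from Appium or any other framework, although most frameworks ship a comparable utility.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `condition` until it returns a truthy value or the timeout expires.

    `clock` and `sleep` are injectable so the helper can be tested without
    real delays.
    """
    deadline = clock() + timeout_s
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        sleep(poll_interval_s)


# Usage: a hypothetical screen that becomes ready on the third poll.
polls = iter([None, None, "home_screen"])
screen = wait_until(lambda: next(polls), sleep=lambda _seconds: None)
```

A wait like this returns as soon as the screen is ready and fails with a clear timeout when it is not, which is easier to triage than a test that sleeps for a fixed time and then fails on a missing element.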
If you are using Appium, locator strategy deserves deliberate planning, because Android and iOS expose different automation surfaces. The Appium mobile testing guide and the Appium alternatives overview cover this in more depth.
Finding 3: Flaky Tests Break Trust Before They Break Pipelines
A flaky test is usually defined as a test that passes and fails inconsistently without a relevant product change.
That definition is technically correct, but it undersells the real damage.
A flaky test does not only waste time. It trains the team to distrust automation.
The first time a test fails randomly, someone reruns it. The fifth time, people start ignoring it. Eventually, a red build no longer means “something broke.” It means “probably CI again.”
That is the moment automation loses authority.
A good automation suite should create a trusted signal. When it fails, the team should care. When it passes, the team should have more confidence in the release.
Flaky tests destroy both sides of that equation.
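The definition above can be turned into a simple check: if the same test both passed and failed against the same commit, the product did not change between those runs, so the test is a flakiness suspect. The sketch below assumes CI history is available as (test name, commit, passed) records. The `Run` type and `find_flaky_tests` function are hypothetical names for illustration, and a real pipeline would also need to account for environment and test-data changes.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    test_name: str
    commit: str
    passed: bool


def find_flaky_tests(runs: Iterable[Run]) -> set[str]:
    """Return tests that both passed and failed on at least one single commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit)].add(run.passed)
    return {
        test_name
        for (test_name, _commit), seen in outcomes.items()
        if seen == {True, False}
    }


# Illustrative history, not real data.
history = [
    Run("test_login", "abc123", True),
    Run("test_login", "abc123", True),
    Run("test_checkout", "abc123", False),
    Run("test_checkout", "abc123", True),   # rerun passed on the same commit
    Run("test_profile", "abc123", False),
    Run("test_profile", "def456", True),    # different commit, not counted as flaky
]
flaky = find_flaky_tests(history)
```

Here `flaky` contains only `test_checkout`. `test_profile` failed and then passed, but on different commits, so this check treats it as a possible real fix rather than noise.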
How Flakiness Erodes Automation Value
| Stage | What happens | Team behavior |
| --- | --- | --- |
| First flaky failure | A test fails inconsistently | Someone reruns it |
| Repeated flakiness | The same test keeps failing randomly | Team starts discounting the signal |
| Suite-level flakiness | Multiple tests fail for unclear reasons | CI loses authority |
| Hidden regression | A real issue appears among noisy failures | The team responds late |
| Lost confidence | Automation becomes unreliable | Manual QA pressure returns |
This is why flaky test work is not cleanup. It is trust repair.
A flaky suite says: “We have automation, but we do not fully believe it.”
That is a dangerous place to be. The team still pays the cost of maintaining automation, but the business no longer gets dependable release confidence from it.
For a deeper diagnosis workflow, use the flaky tests guide.
Finding 4: AI Is Treated Like a Junior QA, Not a Replacement
QA engineers are not universally anti-AI.
The research showed a more practical view: QA teams are open to AI when it helps with drafts, repetitive execution, summarization, debugging, or regression support.
But they reject the idea that AI can fully replace QA judgment.
The cleanest practitioner framing was:
“treat AI output like junior QA output, review it.”
Another repeated idea was:
“AI is a draft, not a replacement for thinking.”
That is the right model.
A junior QA can be useful. They can draft test cases. They can execute flows. They can notice issues. But they need context, review, and guidance from someone more experienced.
AI testing tools should be framed the same way.
The AI Model QA Engineers Actually Accept

QA engineers are more likely to accept AI for:
- Drafting test ideas
- Generating first-pass test cases
- Summarizing failures
- Running repetitive regression
- Identifying changed screens
- Helping with test maintenance triage
They are less likely to accept AI for:
- Final release judgment
- Exploratory testing strategy
- Business-risk prioritization
- Context-heavy edge cases
- Replacing QA headcount
- Fully autonomous PRD-to-production testing
The distinction matters. “AI replaces QA” is a radioactive message because the person who evaluates and champions the tool is often the very QA engineer it insults.
The better message is:
AI should remove drudgery so QA can spend more time on judgment.
That is the lane practitioners are actually open to.
Finding 5: ROI Is Confidence, Not Just Hours Saved
A lot of test automation ROI content starts with hours saved.
That is not wrong, but it is incomplete.
Yes, automation can reduce repeated manual regression work. Yes, it can speed up feedback. Yes, it can reduce repetitive testing effort.
But the strongest practitioner framing was about confidence:
“Automation ROI isn't about hours saved, it's about confidence gained.”
Another phrasing made the same point:
“the benefit isn't stopwatch savings, it's system trust.”
This is the better frame for test automation maintenance.
The real value of automation is not that it lets you say “we automated 500 tests.” The value is that your team can release knowing the highest-risk flows still work.
The Better ROI Frame

| Weak ROI frame | Stronger ROI frame |
| --- | --- |
| “We saved tester hours” | “We know critical flows still work” |
| “We automated 500 cases” | “We covered the highest-risk regression paths” |
| “We reduced manual effort” | “We reduced release uncertainty” |
| “We run tests faster” | “We can trust failures when they happen” |
| “We increased coverage” | “We increased confidence in the release” |
This matters even more in mobile.
A mobile app can fail in ways that are highly visible and expensive: broken login, failed OTP, checkout bugs, payment failures, location issues, notification failures, permission dead ends, or device-specific layout problems.
Nobody cares that QA saved time if the wrong bug escapes.
That is why mature teams should measure automation ROI through confidence signals:
- Critical flows covered
- Flakiness rate reduced
- False failures reduced
- Failure diagnosis time reduced
- Escaped defects reduced
- Release blockers caught earlier
- Regression cycles completed reliably
- Failures backed by clear evidence
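A few of the signals above can be computed from triage records a team may already keep. The sketch below is one possible shape, assuming each failure has been labeled with a root cause and a diagnosis time. The `Failure` type, the cause labels ("product", "test", "environment"), and the metric names are hypothetical, and the numbers are illustrative rather than research results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Failure:
    test_name: str
    cause: str  # assumed triage label: "product", "test", or "environment"
    minutes_to_diagnose: float


def confidence_signals(
    total_runs: int, failures: Sequence[Failure]
) -> dict[str, Optional[float]]:
    """Summarize how often the suite fails and how often those failures are noise."""
    false_failures = [f for f in failures if f.cause != "product"]
    return {
        # Share of all test runs that failed for any reason.
        "failure_rate": len(failures) / total_runs if total_runs else None,
        # Share of failures that were not caused by a product defect.
        "false_failure_share": (
            len(false_failures) / len(failures) if failures else None
        ),
        # Average time spent answering "did the product or the test break?"
        "mean_minutes_to_diagnose": (
            mean(f.minutes_to_diagnose for f in failures) if failures else None
        ),
    }


# Illustrative sprint: 200 runs, 4 failures, only 1 caused by the product.
signals = confidence_signals(
    total_runs=200,
    failures=[
        Failure("test_checkout", "product", 10.0),
        Failure("test_login", "test", 20.0),
        Failure("test_login", "environment", 30.0),
        Failure("test_otp", "test", 60.0),
    ],
)
```

Tracked sprint over sprint, a falling false-failure share and diagnosis time say more about whether the team can trust a red build than a rising test count does.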
For the business case side, see the test automation ROI calculator.
What Engineering Leaders Should Take From This
The mistake is buying automation as if test creation is the whole problem.
It is not.
Test creation is the easy demo. Test maintenance is the long-term proof.
Before scaling any automation platform, ask:
- What happens when the UI changes?
- Who owns test maintenance?
- How often do tests fail for non-product reasons?
- Can failures be diagnosed quickly?
- Does the suite depend on fragile locators?
- Are we automating high-risk flows or chasing vanity coverage?
- Does the tool reduce maintenance, or create a new kind of maintenance?
- Will QA engineers still trust this system three months from now?
That last question is the real one.
A tool that looks impressive in week one but creates maintenance drag by month three is not solving the real problem. It is moving the problem.
What QA Engineers Should Take From This
The practitioner consensus is not “automation is bad.”
It is sharper than that:
- Automate repetitive regression where it clearly reduces pain.
- Do not automate low-value flows just to increase test count.
- Track flaky tests as trust risks, not minor annoyances.
- Push for stable testability hooks if using locator-based frameworks.
- Treat AI output as a draft.
- Keep humans responsible for risk, judgment, and exploratory coverage.
- Measure automation by release confidence, not just execution speed.
The best QA teams are not anti-automation. They are anti-waste.
They want automation that removes boring work, catches real regressions, and makes releases safer.
They do not want another brittle system that breaks every sprint and then gets blamed on QA.
Where Mobile Test Automation Needs to Go Next
Mobile testing is where these maintenance problems become most visible.
Mobile apps have more moving parts: devices, emulators, OS versions, gestures, keyboards, permissions, network states, app backgrounding, backend dependencies, and fast-changing UI.
That makes mobile test automation valuable, but also fragile when it is built on brittle layers.
The next generation of mobile automation needs to reduce the maintenance burden, not just create tests faster.
That means:
- Less dependence on implementation-level selectors
- Better handling of dynamic screens and app states
- Clear failure evidence with screenshots, logs, and step context
- Tests grounded in actual app behavior
- AI assistance that stays reviewable
- Workflows QA teams can understand and control
- Regression automation that improves confidence without bloating the suite
This is also why tooling language needs to change.
QA engineers are tired of vague claims like “AI-powered,” “self-healing,” and “fully autonomous.” They want to know what actually happens when the app changes.
Does the test survive?
Does the tool explain what failed?
Does QA stay in control?
Does it reduce maintenance, or does it create another maintenance layer?
Those are the questions that matter.
A Transparent Note From Quash
Quash is our product, so we have a point of view here.
We built Quash around many of the same patterns this research surfaced: mobile tests should be easier to create, less tied to brittle locators, grounded in real app behavior, and useful to QA teams rather than positioned as a replacement for them.
That does not mean every team should drop its current framework. Appium, native frameworks, and scripted automation still make sense for teams with strong automation engineering support.
But if your mobile automation suite keeps breaking because the UI changes, locator maintenance is eating QA time, or flaky tests have made CI hard to trust, it may be time to rethink the maintenance model instead of adding more scripts.
You can explore Quash’s mobile test execution workflow if that problem sounds familiar.
Conclusion: Maintenance Is the Real Automation Test
The first test is not the real test of an automation strategy.
The real test comes later.
It comes after the UI changes. After the locator disappears. After the framework updates. After CI fails randomly. After the team has ignored enough flaky tests that a red build no longer creates urgency.
That is when you find out whether automation is giving the team confidence or quietly creating more work.
QA engineers are not asking for magic. They are asking for automation that respects reality:
- Apps change.
- Mobile is messy.
- Locators drift.
- AI needs review.
- Flaky tests destroy trust.
- Humans still make the hard quality calls.
- ROI means confidence, not just speed.
The teams that understand this will build smaller, stronger, more trusted automation suites.
The teams that ignore it will keep adding tests to a system nobody fully believes.
And that is the real state of test automation maintenance.







