From Replay to Memory: A Smarter Way to Rerun Mobile Tests
There is a specific kind of waste that lives inside almost every QA workflow, and nobody talks about it directly. It's not the time spent writing tests or debugging failures. It's the time spent running tests you've already run — the same login flow, the same checkout, the same fund transfer — executed again and again, from scratch, as if the agent learned nothing the last time.
Reruns make up the majority of QA work. Regression suites, nightly smoke tests, pre-release sanity checks, post-hotfix verification. These aren't exploratory exercises — they're repetitions. And yet most mobile test automation runs them as if every execution is the first one.
That's the problem the Quash rerun engine is built to fix.
Why Mobile Test Automation Gets Worse Over Time
Here's something counterintuitive about most test automation: the more you use it, the more expensive it is to maintain.
The reason is structural. Traditional mobile test automation tools such as Appium, Espresso, and Selenium work by coupling test execution to implementation details. Your test knows that the "Pay Now" button has the accessibility ID com.app.payment.btn_confirm. The moment someone renames it, adjusts its position, or wraps it in a new component, the test breaks — not because the feature broke, but because the map no longer matches the territory. This is one of the biggest causes of flaky tests in mobile apps: selectors that pointed to something real last sprint and point to nothing today.
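The coupling can be illustrated with a small, self-contained sketch. The screen trees and lookup function here are hypothetical stand-ins, not Quash or Appium APIs; the point is only that a hard-coded ID breaks the test when the ID changes, even though the button still works:

```python
# Minimal simulation of selector-coupled testing: the test hard-codes an
# accessibility ID, so a rename breaks it even though the feature still works.

def find_by_accessibility_id(screen, target_id):
    """Return the element whose ID matches exactly, or None."""
    return next((e for e in screen if e["id"] == target_id), None)

# Last sprint's payment screen
old_screen = [{"id": "com.app.payment.btn_confirm", "label": "Pay Now"}]
# This sprint: same button, same label, but the ID changed in a refactor
new_screen = [{"id": "com.app.payment.btn_pay", "label": "Pay Now"}]

TEST_SELECTOR = "com.app.payment.btn_confirm"  # baked into the test script

print(find_by_accessibility_id(old_screen, TEST_SELECTOR) is not None)  # True
print(find_by_accessibility_id(new_screen, TEST_SELECTOR) is not None)  # False: the test fails, the feature didn't
```

Nothing about the app's behaviour changed between the two screens; only the implementation detail the test was anchored to did.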
This is the maintenance tax. You invest in building coverage, then pay ongoing interest in constant upkeep. Triage which failures are real regressions versus selector breakage. Fix the broken selectors. Repeat after the next sprint. Over a year, a meaningful fraction of your automation team's time isn't building new coverage — it's keeping existing tests alive. Reducing flaky tests caused by UI drift is often more valuable than simply adding more test coverage.
The industry's first answer to this was record-and-replay. The promise was appealing: watch a human interact with the app, record every tap and scroll, replay on demand. No selectors to write. No scripts to maintain.
But record-and-replay captures the how, not the what. It remembers that you tapped a button at specific coordinates with a specific accessibility ID. It doesn't understand that what you were trying to do was submit a payment. When the button moves — even due to a minor layout reflow — the replay fails because it's searching for something at the exact location it memorised, not for the thing you were actually trying to accomplish. You've replaced manually written selectors with automatically captured ones. The brittleness doesn't go away; the recording just obscures it. Flaky tests don't disappear — they just become harder to trace back to a root cause.

Learning Instead of Recording
The Quash execution agent, Mahoraga, takes a different approach from the start. Tests are written in plain English. There are no selectors, no recorded sequences, no implementation coupling. When Quash runs a test for the first time, it reads the live screen, reasons about what elements are present, decides on actions, manages interruptions, and builds toward the goal through genuine exploration.
But during that first run, it's also doing something replay never does: building a structured understanding of the flow that generalises rather than memorises. Not coordinates. Not selector paths. An understanding of what sequence of screens constitutes the route, what each screen transition looks like, what the critical validation points are, and what the expected end state should be.
After a successful first run, that model is stored — the successful path through the flow, the meaningful screen transitions, the key decision points, the expected element contexts, the outcome criteria. The understanding of the flow, not a brittle recording of it.
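As a rough mental model, the stored artefact looks less like a tap log and more like a structured description of the route. This schema is purely illustrative — it is not Quash's actual storage format — but it captures the kinds of things the paragraph above lists:

```python
from dataclasses import dataclass, field

@dataclass
class StepModel:
    """One learned decision point: the screen we expect, the intent the step
    serves, the element contexts that matter, and what success transitions to."""
    expected_screen: str          # e.g. "payment_review"
    intent: str                   # e.g. "confirm the payment"
    expected_elements: list[str]  # semantic element contexts, not selectors
    next_screen: str              # screen that indicates the step succeeded

@dataclass
class FlowModel:
    """The learned understanding of a whole flow after a successful first run."""
    goal: str
    steps: list[StepModel] = field(default_factory=list)
    outcome_criteria: list[str] = field(default_factory=list)

checkout = FlowModel(
    goal="complete a card payment",
    steps=[
        StepModel("cart", "start checkout", ["checkout button"], "payment_review"),
        StepModel("payment_review", "confirm the payment", ["pay button", "amount"], "receipt"),
    ],
    outcome_criteria=["receipt screen shows a confirmation number"],
)
print(len(checkout.steps))  # 2
```

Note what is absent: no coordinates, no selector paths, no recorded gestures — only intent, transitions, and outcome criteria.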
On the second run, and every run after that, Quash doesn't reason from scratch. It knows this flow. A flow that required five minutes of full AI-heavy reasoning on its first run can complete significantly faster on repeat runs. Not because steps are skipped, but because the decision overhead is gone. The agent isn't deliberating over what to do next. It already knows.
AI Test Automation That Knows When to Stop Thinking
This is the part that matters most, and it's worth being direct about it.
Most AI test automation treats every run the same. Every step on every execution goes through the full LLM inference stack. This is expensive, it's slow, and for stable flows it's completely unnecessary. Forcing the agent to rethink a checkout flow it has successfully navigated twenty times is not intelligence — it's waste.
Quash approaches AI test automation differently: the first run is exploratory and AI-heavy because it has to be. Later runs are memory-driven and lightweight because they can be. The agent only switches back into reasoning mode when something actually warrants it. This makes Quash a form of adaptive test automation — its execution behaviour changes based on what the current screen state actually requires, not on a fixed strategy applied uniformly to every run.
On each step of a rerun, Quash checks the current screen state against what it expects. As long as what it sees is consistent with the learned model, it executes from memory — fast, lightweight, no LLM overhead per micro-decision. When what it sees diverges — a new element, an unexpected screen, an interrupting modal — it drops back into reasoning mode, handles the situation, continues, and updates its learned model.
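That check-then-fallback behaviour can be sketched as a short loop. The function names and data shapes here are hypothetical — the real engine's internals are not public in this form — but the control flow is the idea the paragraph describes:

```python
# Sketch of a memory-first rerun loop: execute from the learned model while the
# live screen matches expectations, and fall back to full reasoning on divergence.

def rerun(flow_steps, observe, execute_cached, reason_and_act):
    """Run each learned step, counting how often full reasoning was needed."""
    reasoning_calls = 0
    for step in flow_steps:
        screen = observe()
        if screen == step["expected_screen"]:
            execute_cached(step)            # fast path: no LLM inference
        else:
            reason_and_act(step, screen)    # slow path: reason, recover, update model
            reasoning_calls += 1
    return reasoning_calls

# Toy run: one promotional modal interrupts an otherwise stable three-step flow.
learned = [{"expected_screen": s, "action": f"tap_{s}"} for s in ("login", "cart", "pay")]
live_screens = iter(["login", "promo_modal", "pay"])

calls = rerun(
    learned,
    observe=lambda: next(live_screens),
    execute_cached=lambda step: None,       # stand-in for replaying the cached action
    reason_and_act=lambda step, screen: None,  # stand-in for LLM-driven recovery
)
print(calls)  # 1: only the unexpected modal triggered full reasoning
```

In this toy run, two of the three steps execute from memory; only the screen that diverged from the learned model pays the inference cost.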
The real innovation isn't AI everywhere. It's knowing when AI is unnecessary. Reducing inference calls on stable repeat flows brings down execution latency, cuts cost, and makes high-frequency runs like nightly builds and PR-triggered CI pipelines practically viable in a way that full-reasoning-every-time execution never is.
Handling Interruptions Without Breaking
One of the most common sources of flaky tests in mobile is the small UI change that breaks an otherwise valid test. A button label changes from "Proceed" to "Continue." A confirmation dialog gains a third option. A promotional modal appears mid-flow that wasn't there yesterday.
Traditional automation fails these cases because it was written against a specific state of the UI. Quash handles them through adaptive execution. When the agent encounters a change — a renamed element, a new interruption, a shifted layout — it doesn't immediately fail. It reasons through what's in front of it, identifies the element or path that satisfies the intent of the current step, and continues. It can dismiss unexpected notifications, handle permission requests, work around promotional modals, and recover from minor layout changes without human intervention.
This is adaptive test automation in practice — resilient because it understands intent, not because it was pre-programmed with a list of known failure modes. For mobile apps specifically, this matters more than anywhere else. Notification banners appear mid-flow. System permission dialogs interrupt unpredictably. OS updates change the appearance of native components. Different device sizes produce layout reflows. A test that runs cleanly on one device may hit a slightly different modal on another with manufacturer UI customisations. Rigid automation breaks on this variance. Quash is built for it.
The Human Analogy Is Actually the Point
An experienced QA engineer who has tested the same payment flow a hundred times doesn't approach run 101 as if they've never seen the app. They know the route. When a new rating prompt appears mid-flow, they dismiss it without breaking stride. They only stop to think when something genuinely requires thinking.
Traditional automation has no equivalent of this. Every execution is run 1. Record-and-replay remembers the exact path but has no judgement about when to deviate from it. The Quash rerun engine gives Mahoraga something closer to the knowledge accumulation of a human tester — a growing understanding of the application that deepens with every run, paired with the ability to reason carefully when something actually changes. Not AI everywhere all the time. AI when it's needed, memory when it's enough.
The economics compound over time. A regression suite in mobile test automation that gets faster with each run, costs less to execute, and requires less maintenance to stay reliable changes what's practical to automate. Stable flows become cheaper, not more expensive. Tests get smarter with every execution. The team spends its time on coverage, not upkeep.
That's the direction QA economics should move. And it's what the Quash rerun engine is building toward.
FAQ
Is this just another record-and-replay system? No. Record-and-replay tools memorise exact interactions — coordinates, selectors, screen positions. Quash stores the intent of the flow, the screen transitions, expected states, and the successful path. That means it can adapt when the UI changes instead of failing immediately.
Does the agent still use AI during reruns? Yes, but only when necessary. Once a flow has been learned, reruns are mostly memory-driven and much faster. If Quash detects a UI change, interruption, modal, or unexpected state, it switches back into reasoning mode.
Why are reruns faster than the first run? The first run is exploratory. The agent is learning the app, understanding the screens, identifying the path, and building execution memory. On later runs, it already knows the route, so it does not need to reason through every step again.
What kinds of UI changes can the rerun engine handle? The Quash rerun engine can handle small layout shifts, renamed buttons, new popups, notification interruptions, permission dialogs, loading differences, and minor screen structure changes.
Does this reduce the cost of running AI-powered tests? Yes. Since Quash is not invoking full LLM reasoning at every step during reruns, execution becomes faster and inference costs become lower over time. This is what makes high-frequency mobile test automation sustainable at scale.
Why is this especially useful for mobile apps? Mobile apps are full of interruptions and inconsistencies — notifications, permission prompts, OS-level dialogs, different screen sizes, and device-specific layouts. A memory-driven rerun engine is far more resilient in these environments than rigid automation.
How does this help reduce flaky tests? Most flaky tests happen because the UI changes slightly, a modal interrupts the flow, a notification appears, or a selector no longer matches exactly. Since Quash understands the intent of the flow rather than exact coordinates or selectors, it can recover from many of these situations instead of failing immediately.