
Published by mahima | 5 min read

From Scripts to Intent: What Changed in Mahoraga V2

"Scriptless testing" has been a promise in the mobile QA space for years. It's also been, for most of that time, a slightly misleading one.

The tools that called themselves scriptless usually still required someone to configure element selectors, map accessibility IDs, or write rule-based flows that were, in practice, a different syntax for the same underlying problem. The tests still broke when the UI changed. The maintenance burden didn't go away — it just moved somewhere less visible.

Mahoraga V2 is different in a specific, technical sense that's worth explaining clearly. This post is that explanation.

What "Scriptless Testing" Usually Means — And Why It Often Doesn't Deliver

The appeal of scriptless test automation is obvious: remove the engineering overhead from QA, let non-engineers write tests, eliminate the selector maintenance cycle that consumes automation teams. The idea is right. Most implementations fall short because they solve the surface problem without addressing the underlying one.

The surface problem is the scripting syntax. The underlying problem is coupling. Script-based test automation is brittle because tests are tightly coupled to implementation details — the element ID of a button, the XPath of a form field, the class hierarchy of a navigation component. When the implementation changes, the coupling breaks. Rename a button, update a layout, refactor a navigation stack, and tests that were passing stop passing — not because the feature broke, but because the test was written against an attribute of the implementation that no longer exists.

Most "scriptless" tools replace Appium scripts with a visual recorder or a drag-and-drop flow builder. The underlying coupling is still there. The recorded interactions are still mapped to element identifiers. The maintenance problem is still there. It's just wearing different clothes.

Why Do Most No-Code Testing Tools Still Require Maintenance?

Because recording a user action still captures the how — which element was tapped, which selector was matched — rather than the what. The tool remembers that you tapped a button at coordinates (X, Y) with accessibility ID com.app.login.btn_submit. When the button moves or the ID changes, the recording breaks. True scriptless automation requires capturing intent — what you wanted to accomplish — and letting the execution engine figure out the how at runtime.
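The how-versus-what distinction can be made concrete with a toy sketch. The data structures and IDs below are hypothetical, not Quash's actual API: a recorded step stores the selector it captured, while an intent step resolves its target against the live screen at run time.

```python
# Toy illustration: a recorded step stores the HOW (a hard-coded selector),
# while an intent step stores the WHAT and resolves it against the screen.

def run_recorded_step(screen, step):
    """Replay a recorded tap by its captured accessibility ID."""
    for el in screen:
        if el["id"] == step["selector"]:
            return f"tapped {el['id']}"
    raise LookupError(f"selector {step['selector']!r} not found")

def run_intent_step(screen, intent):
    """Resolve the target at runtime from visible text instead of an ID."""
    for el in screen:
        if intent["target_text"].lower() in el["text"].lower():
            return f"tapped {el['id']}"
    raise LookupError(f"no element matching {intent['target_text']!r}")

# The screen before a refactor...
v1_screen = [{"id": "com.app.login.btn_submit", "text": "Log in"}]
# ...and after a rename that changes the ID but not the visible label.
v2_screen = [{"id": "com.app.auth.submit_button", "text": "Log in"}]

recorded = {"selector": "com.app.login.btn_submit"}
intent = {"target_text": "log in"}

print(run_recorded_step(v1_screen, recorded))  # works against the old screen
print(run_intent_step(v2_screen, intent))      # still works after the rename
try:
    run_recorded_step(v2_screen, recorded)     # the recording breaks
except LookupError as e:
    print("recorded step failed:", e)
```

The recording and the intent describe the same tap; only the intent survives the rename, because nothing in it referred to the attribute that changed.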


What Scriptless Actually Means in Quash: Intent-Driven Execution

In Quash, a test case is a plain English instruction. "Open the app, log in with a valid email and password, navigate to the cart, add the first available item, proceed to checkout, verify the order summary shows the correct item and price."

That's the test. There is no element selector. There is no XPath. There is no step-level implementation detail. The instruction expresses intent: what should happen and what should be verified. Mahoraga, running on the connected device, figures out how to execute it.

This is not a UI recorder that happens to generate English-language labels. Mahoraga reads the live screen — using Android's accessibility APIs and a vision-based interpretation layer — identifies what's there, infers what the next action should be to satisfy the stated intent, and executes it. When the button moves, when the label changes, when a new screen is added to the flow, the plain English instruction stays valid because it was never coupled to the attribute that changed.

The departure from Appium and Espresso is architectural, not cosmetic. Quash tests are written in the language of intent. Appium and Espresso tests are written in the language of implementation. That's the difference that matters.

What Changed Between V1 and Mahoraga V2

The V1 execution engine demonstrated that intent-based execution was possible. V2 makes it reliable enough to be the foundation of a QA process.

How Fast Is Mahoraga V2 Compared to V1?

Execution speed is the single most impactful change. V1 processed individual steps in 40–50 seconds each, so a 10-step test took roughly seven to eight minutes. V2 brings that down to approximately 7 seconds per step: a 10-step test now runs in about a minute, and a regression suite of 50 test cases, run in parallel across a device fleet, completes in under 10 minutes. The speed difference is the difference between a tool you use in CI on every PR and one you run nightly if someone remembers.
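A quick back-of-the-envelope check of those numbers, using the midpoint of the quoted V1 range rather than measured benchmarks:

```python
# Rough arithmetic on the per-step timings quoted above (assumed midpoints,
# not benchmark data).

V1_STEP_S = 45      # midpoint of the quoted 40-50 s per step
V2_STEP_S = 7       # approximate V2 per-step time
STEPS = 10

v1_total = V1_STEP_S * STEPS          # seconds for a 10-step test on V1
v2_total = V2_STEP_S * STEPS          # seconds for the same test on V2
print(f"V1: {v1_total / 60:.1f} min, V2: {v2_total / 60:.1f} min, "
      f"speedup: {v1_total / v2_total:.1f}x")
```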

How Does Mahoraga Maintain Context Across a Multi-Step Flow?

Unlike stateless automation runners that treat each step in isolation, Mahoraga carries full session context throughout execution. Every prior action, every prior screen state, and the original instruction are all available at each decision point — this is the episodic memory layer that makes long, branching flows work reliably.

In practice, this means Mahoraga doesn't lose track of where it is in a flow if a loading screen appears, a redirect happens, or a permission dialog interrupts mid-journey. It knows what it was trying to accomplish and resumes from the right point. For apps with deep navigation trees, conditional logic, or flows that span 10+ screens, this persistent context is what separates a reliable run from a hallucinated one.
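As a rough sketch of what persistent session context looks like, the structure below is illustrative only, not Mahoraga's internals: each decision point can see the original instruction plus the full history of actions and screens, so an interrupting dialog doesn't erase the run's place in the flow.

```python
# A minimal, hypothetical session-context record: the full history travels
# with the run, so resuming after an interruption is a lookup, not a guess.

from dataclasses import dataclass, field

@dataclass
class SessionContext:
    instruction: str
    actions: list = field(default_factory=list)
    screens: list = field(default_factory=list)

    def record(self, screen, action):
        self.screens.append(screen)
        self.actions.append(action)

    def resume_point(self):
        """After an interruption (dialog, redirect), walk back past any
        interrupting steps to the last action that advanced the flow."""
        for screen, action in zip(reversed(self.screens), reversed(self.actions)):
            if action != "dismiss_dialog":
                return screen, action
        return None

ctx = SessionContext("log in, add first item to cart, verify checkout total")
ctx.record("login_screen", "type_credentials")
ctx.record("home_screen", "tap_cart")
ctx.record("permission_dialog", "dismiss_dialog")   # interruption mid-flow

print(ctx.resume_point())   # the run knows it was mid-cart, not lost
```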

Megumi — the test generation agent — benefits from this same always-on context at the generation stage. When you're building out a test suite conversationally, Megumi remembers the full session: what you've already generated, what feedback you've given, what your app does. Follow-up prompts refine without restarting. The context compounds rather than resetting.

Improved planning logic. V2's core planning layer is meaningfully better at handling ambiguous screen states: moments when the app is mid-animation, a loading indicator is present, or a previous action hasn't fully resolved. V1 would sometimes proceed before the screen had settled into its next state. V2 is better at reading whether the screen is ready before taking the next action.

Reduced hallucination rate. In V1, Mahoraga would occasionally decide, with apparent confidence, to take an action that bore no relationship to the test instruction. V2's improved model pipeline significantly reduces these cases. They're not eliminated — re-run determinism is still evolving — but they're rare enough that flakiness detection and retry logic handles most remaining variance.

How Mahoraga Executes a Test: What's Actually Happening

Without exposing the full technical architecture, the execution loop works like this.

A plain English instruction enters the system. Mahoraga interprets it to extract the sequence of intents — what needs to happen, what constitutes completion, what should be verified. On the device, Mahoraga reads the current screen state using Android's accessibility APIs: what elements are present, what their visible text is, where they are on screen. It maps the current state against the current intent and decides on an action — tap, scroll, swipe, type, navigate. It executes the action and captures a screenshot. It reads the resulting screen state and decides on the next action.

This loop repeats for each step in the instruction, with the full execution context available to inform each decision. Mahoraga isn't making stateless decisions at each step; it's carrying everything that has happened so far, which is what the episodic memory layer provides.
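The loop above can be sketched in miniature. A dictionary of screen transitions stands in for a real device and its accessibility APIs, and the intent steps are pre-extracted; everything here is a toy model, not the actual engine.

```python
# A simplified perceive-decide-act loop: read the screen, pick the next
# action that serves the current intent, execute it, repeat.

def read_screen(app):
    return app["current"]                 # perceive: current screen state

def decide(intent_steps, done, screen):
    # In the real system this maps screen state against intent; the toy
    # version just takes the next unsatisfied intent (screen is unused).
    if done >= len(intent_steps):
        return None                       # all intents satisfied
    return intent_steps[done]

def execute(app, action):
    # Act: the fake app transitions screens when the right action runs.
    app["current"] = app["transitions"][(app["current"], action)]

def run_test(app, intent_steps):
    history = []                          # full context carried across steps
    done = 0
    while True:
        screen = read_screen(app)
        action = decide(intent_steps, done, screen)
        if action is None:
            return history
        execute(app, action)
        history.append((screen, action))
        done += 1

app = {
    "current": "login",
    "transitions": {
        ("login", "log_in"): "home",
        ("home", "open_cart"): "cart",
        ("cart", "checkout"): "order_summary",
    },
}
history = run_test(app, ["log_in", "open_cart", "checkout"])
print(app["current"])   # the flow ends on the order summary
```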

When a step fails, Mahoraga distinguishes between transient failures (network timeout, animation still in progress, slow server response) and genuine failures (the expected screen state never appeared, an error message was shown instead). Transient failures trigger retry logic. Genuine failures are logged in the execution report with a screenshot of the actual state versus the expected state.
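The transient-versus-genuine split might look like this in outline. The failure categories and retry policy below are assumptions for illustration, not Quash's actual heuristics.

```python
# Sketch of failure classification: transient failures get retried with a
# short wait; genuine failures are reported immediately with what was seen.

import time

TRANSIENT = {"network_timeout", "animation_in_progress", "slow_response"}

def run_step(step, attempt_results, max_retries=3, wait_s=0.0):
    """attempt_results yields one outcome per attempt: 'ok' or a failure kind."""
    attempt = 0
    for attempt, outcome in enumerate(attempt_results, start=1):
        if outcome == "ok":
            return {"status": "passed", "attempts": attempt}
        if outcome in TRANSIENT and attempt <= max_retries:
            time.sleep(wait_s)            # back off, then retry
            continue
        return {"status": "failed", "reason": outcome, "attempts": attempt}
    return {"status": "failed", "reason": "exhausted", "attempts": attempt}

# A flaky step that recovers after one transient failure...
print(run_step("tap checkout", iter(["network_timeout", "ok"])))
# ...and a genuine failure that is reported, not retried.
print(run_step("verify total", iter(["error_screen_shown"])))
```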

What This Means for QA Teams: Lower Maintenance, Higher Coverage Ceiling

The operational implication of intent-based execution at V2 reliability is that the test maintenance cycle that consumes a significant portion of most automation teams' time largely disappears.

With Appium or Espresso, a sprint that includes a UI redesign means a test maintenance sprint. Element identifiers change, selectors break, the suite goes red on failures that aren't regressions. Someone has to triage which failures are real and which are selector breakage. Then someone has to fix the broken selectors before the suite is useful again. That cycle — ship redesign, maintenance sprint, restore suite — repeats with every significant UI change.

With Mahoraga V2, a UI redesign means Mahoraga adapts. If a button moves, Mahoraga finds it by its visible text and context. If a label changes, the self-healing layer identifies the most likely match. If a screen is added to a flow, "navigate to checkout" still succeeds because Mahoraga is solving for the intent, not following a scripted path.
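A toy version of that self-healing lookup, with Python's difflib standing in for whatever matching the real layer uses, and the labels and threshold chosen purely for illustration:

```python
# When the exact label is gone, pick the closest visible text on screen,
# but only above a similarity threshold, so unrelated elements don't match.

from difflib import SequenceMatcher

def heal(expected_text, screen_texts, threshold=0.6):
    best, score = None, 0.0
    for text in screen_texts:
        s = SequenceMatcher(None, expected_text.lower(), text.lower()).ratio()
        if s > score:
            best, score = text, s
    return best if score >= threshold else None

# The test expects "Checkout", but a redesign renamed the button.
screen = ["Back", "Apply Coupon", "Check Out Now"]
print(heal("Checkout", screen))   # the renamed button is still found
```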

The maintenance work that remains is at the level of intent: if the feature itself changes — not just how it looks, but what it does — the test instruction should be updated. That's appropriate maintenance. Updating a test because a button moved three pixels is not.

For QA leads: automation coverage can expand without proportional maintenance overhead. Adding 50 new test cases doesn't add 50 potential maintenance failures when the next sprint ships UI changes.

For engineering managers: failures in CI are regressions, not selector breakage. The signal-to-noise ratio of the test pipeline goes up.

For CTOs evaluating mobile QA infrastructure: the total cost of ownership for a Quash-based suite is materially lower than a comparable Appium suite, primarily because the maintenance cost that compounds over 12–18 months doesn't exist in the same form.

Where Mahoraga V2 Still Has Work to Do

Re-run determinism is still evolving. Running the same test case twice on the same device and the same app build should produce the same result every time. V2 is significantly more consistent than V1, but full determinism — particularly on flows with external dependencies like network calls or third-party auth — is ongoing work. The retry logic and flakiness detection in the Analytics dashboard exist partly to manage the cases where V2 isn't yet fully deterministic. We track this, and it's a priority.

Physical iOS device support is also not yet in V2 — iOS execution currently covers simulators on Mac. That's the next surface on the roadmap.

The Shift From Script Maintenance to Test Coverage

The real promise of intent-based execution isn't just that it's easier to write tests. It's that the testing ceiling — the amount of coverage a team can realistically maintain — goes up. When every new feature doesn't also mean a test maintenance burden, teams can invest the saved time in coverage instead of upkeep. That compounding effect, over a release cycle, is what separates a QA function that's always catching up from one that's ahead.

Mahoraga V2 is the execution engine that makes that possible. Not perfectly — it's still evolving. But reliably enough to build a QA process on.

Frequently Asked Questions

What is scriptless test automation? Scriptless test automation lets QA teams write and run automated tests without writing code. Instead of Appium scripts or Espresso selectors, testers write plain English instructions. The execution engine interprets those instructions and executes them on a real device. Tests survive UI changes because they're written against behaviour, not implementation details.

What is the difference between Mahoraga V1 and V2? V2 reduces per-step execution time from 40–50 seconds to approximately 7 seconds, improves planning logic for handling ambiguous screen states, and significantly reduces the hallucination rate from V1. The episodic memory layer — which keeps full session context across every step in a flow — was also meaningfully strengthened in V2, making long and conditional flows far more reliable.

How does Mahoraga keep track of context across a long test flow? Mahoraga carries full session context throughout execution — every prior action, every prior screen state, and the original instruction are all available at each decision point. This episodic memory means it doesn't lose its place when a loading screen appears, a redirect happens, or a permission dialog interrupts mid-flow. It knows what it was trying to accomplish and resumes accordingly.

How does intent-based testing differ from Appium? Appium tests are written against element selectors — XPath, accessibility IDs, class hierarchy. They break when UI elements change their attributes. Intent-based tests describe what should happen without specifying selectors. The execution engine finds the right elements based on screen state. When the UI changes, the intent stays valid.

Is Mahoraga fully deterministic across test runs? V2 is significantly more consistent than V1, but full re-run determinism — particularly on flows with network dependencies or third-party auth — is still evolving. Retry logic and flakiness detection handle most remaining variance. This is a stated priority for the next development cycle.