
January 27, 2025

Stateful Action Replay: Building Robust User Workflow Recording at Anon

At Anon, we enable developers to create authenticated integrations for websites without APIs by automating user actions. One of our core technical challenges has been building a reliable system for recording and replaying complex multi-step user workflows while maintaining consistency across browser sessions. This post details our approach to solving this through what we call Stateful Action Replay.

The Challenge

When automating user workflows on third-party websites, we face several key challenges:

  1. State Consistency: User actions often depend on previous state (e.g., clicking a button that only appears after a form submission)
  2. Session Management: Browser sessions can expire or become invalid between recording and replay
  3. Dynamic Content: Modern web apps render content dynamically, making traditional DOM-based replay unreliable
  4. Error Recovery: Network issues or timing problems can cause actions to fail during replay

Our Approach: Event-Sourced Action Replay

Rather than treating user actions as a simple sequence of DOM events, we model them as an event-sourced stream of state transitions. Each action is recorded with its complete state context and preconditions.

Here's a simplified example of our action recording format:

interface ActionEvent {
  type: 'click' | 'input' | 'submit' | 'navigation';
  target: {
    selector: string;
    attributes: Record<string, string>;
    stateHash: string; // Hash of relevant DOM state
  };
  preconditions: {
    visible: boolean;
    enabled: boolean;
    stateMatchers: Array<StateMatcher>;
  };
  timestamp: number;
  sessionContext: SessionContext;
}

interface StateMatcher {
  selector: string;
  condition: 'exists' | 'contains' | 'matches';
  value: string;
}
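For concreteness, here is what one recorded event in this format might look like. The selector, hash, timestamp, and session values below are illustrative, not real recorded data, and the session context is simplified to a single ID:

```typescript
// Hypothetical recorded click on a "Submit order" button.
// All concrete values here are made up for illustration.
const recordedClick = {
  type: "click",
  target: {
    selector: "button[data-testid='submit-order']",
    attributes: { "data-testid": "submit-order", class: "btn-primary" },
    stateHash: "a3f1c9d0", // hash of the DOM state the recorder observed
  },
  preconditions: {
    visible: true,
    enabled: true,
    stateMatchers: [
      // The order summary from the previous step must exist before clicking
      { selector: ".order-summary", condition: "exists", value: "" },
    ],
  },
  timestamp: 1706313600000,
  sessionContext: { sessionId: "sess_123" }, // shape simplified for this sketch
};
```

The state matcher encodes the dependency on previous state directly: replay will not attempt the click until the order summary from the earlier form submission is present.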

The Replay Engine

Our replay engine uses a state machine approach to handle action replay. Rather than blindly executing actions in sequence, it:

  1. Validates preconditions before each action
  2. Maintains session state and handles re-authentication
  3. Implements exponential backoff and retry logic
  4. Records detailed telemetry for debugging

Here's a simplified version of our replay logic:

class ActionReplayEngine {
  async replayAction(action: ActionEvent): Promise<boolean> {
    // Verify session is still valid
    if (!await this.validateSession(action.sessionContext)) {
      await this.refreshSession();
    }

    // Wait for preconditions with exponential backoff
    await this.waitForPreconditions(action.preconditions, {
      maxAttempts: 3,
      baseDelay: 1000
    });

    // Verify state hash matches recording
    const currentHash = await this.computeStateHash(action.target.selector);
    if (currentHash !== action.target.stateHash) {
      throw new StateHashMismatchError();
    }

    // Execute the action
    await this.executeAction(action);

    return true;
  }
}
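The precondition wait with exponential backoff can be sketched as a standalone helper. This is a minimal illustration, not Anon's actual implementation: it assumes the caller supplies an async check that evaluates the preconditions, and it doubles the delay after each failed attempt:

```typescript
// Illustrative sketch of waiting on preconditions with exponential backoff.
// The check function and option names are assumptions for this example.
interface BackoffOptions {
  maxAttempts: number;
  baseDelay: number; // initial delay in ms, doubled after each failed attempt
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForPreconditions(
  check: () => Promise<boolean>,
  { maxAttempts, baseDelay }: BackoffOptions
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await check()) return true; // preconditions met
    // Back off exponentially: baseDelay, 2x baseDelay, 4x baseDelay, ...
    await sleep(baseDelay * 2 ** attempt);
  }
  return false; // caller decides whether to fail the replay or retry differently
}
```

Returning a boolean rather than throwing leaves the failure policy to the replay engine, which may choose between aborting, re-authenticating, or escalating to telemetry.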

State Synchronization

One key insight was that we needed to synchronize state at multiple levels:

  1. DOM State: The visible page structure and content
  2. JavaScript State: The application's internal state
  3. Network State: Active XHR requests and WebSocket connections
  4. Storage State: Cookies, localStorage, and sessionStorage

We developed a novel approach using what we call "state checkpoints" - snapshots of all relevant state that must be synchronized before an action can proceed:

interface StateCheckpoint {
  dom: {
    snapshot: string;
    criticalSelectors: string[];
  };
  storage: {
    cookies: Record<string, string>;
    localStorage: Record<string, string>;
  };
  network: {
    activeRequests: string[];
    wsConnections: WebSocketState[];
  };
}
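To decide whether an action can proceed, a checkpoint recorded at capture time has to be compared against the live state. A minimal sketch of that comparison for the storage portion (the function name and signature are illustrative; real checkpoints also cover DOM and network state) might look like:

```typescript
// Illustrative diff of one storage map (e.g. cookies or localStorage)
// between a recorded checkpoint and the current session state.
function diffStorage(
  recorded: Record<string, string>,
  current: Record<string, string>
): string[] {
  // Consider keys present in either snapshot
  const keys = new Set([...Object.keys(recorded), ...Object.keys(current)]);
  const changed: string[] = [];
  for (const key of keys) {
    // A key counts as changed if it was added, removed, or has a new value
    if (recorded[key] !== current[key]) changed.push(key);
  }
  return changed.sort();
}
```

A non-empty diff on a state-critical key (a session cookie, for instance) is a signal to resynchronize before replaying the next action rather than to fail outright.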

Error Recovery and Debugging

To make debugging easier when replays fail, we built comprehensive telemetry into our system. Each replay attempt generates a trace that includes:

  • Timing data for each action and wait period
  • Screenshots at key points
  • Network request logs
  • Console output
  • State checkpoint diffs

This data is stored in a structured format that makes it easy to identify exactly where and why a replay failed.

Results and Lessons Learned

This approach has proven robust in production, with several key benefits:

  1. Reliability: Our action replay success rate improved from ~80% to >98%
  2. Debuggability: Mean time to resolve replay failures decreased by 65%
  3. Maintainability: The event-sourced model makes it easier to extend the system

Some key lessons learned:


  1. State synchronization is more important than perfect action replay
  2. Exponential backoff and retry logic are essential for reliability
  3. Comprehensive telemetry is worth the overhead

Future Work

We're currently working on several improvements:


  • Machine learning for automatic retry strategies
  • Parallel action replay for independent state changes
  • Predictive state preloading to improve performance


The challenges of reliable action replay in modern web applications are complex, but our event-sourced approach with careful state management has proven effective at scale.

Note: This post was written by the Anon Engineering team. To learn more about building on Anon's platform, visit our Developer Docs.
