The Authentication Challenge
When integrating with websites without APIs, we encounter a wide spectrum of authentication mechanisms:
- Traditional cookie-based sessions with CSRF tokens
- JWT-based authentication
- Multi-step authentication flows with intermediate states
- Custom security headers and device fingerprinting
- Dynamic challenge-response systems
- Multi-factor authentication flows
Rather than exposing this complexity to developers, we needed to build an abstraction layer that could:
- Automatically detect the authentication scheme in use
- Handle the necessary protocol-specific handshakes and token management
- Present a unified interface for maintaining authenticated sessions
Protocol Detection and Adaptation
The core of our solution is a protocol detection engine that analyzes authentication flows in real time. Here's a simplified version of how we represent different authentication schemes:
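    interface AuthProtocol {
      // Resolves to true if the response looks like this protocol's scheme.
      detect(response: Response): Promise<boolean>;
      handleAuth(credentials: Credentials): Promise<Session>;
      refreshSession(session: Session): Promise<Session>;
    }

    class CookieAuthProtocol implements AuthProtocol {
      async detect(response: Response): Promise<boolean> {
        const headers = response.headers;
        return headers.has('set-cookie') && !headers.has('authorization');
      }

      async handleAuth(credentials: Credentials): Promise<Session> {
        // Handle cookie-based auth flow (omitted in this simplified version)
        throw new Error('Not implemented');
      }

      async refreshSession(session: Session): Promise<Session> {
        // Cookie session refresh (omitted in this simplified version)
        throw new Error('Not implemented');
      }
    }

    class JWTAuthProtocol implements AuthProtocol {
      async detect(response: Response): Promise<boolean> {
        // headers.get() returns null when the header is absent
        const auth = response.headers.get('authorization');
        return auth !== null && auth.startsWith('Bearer');
      }

      async handleAuth(credentials: Credentials): Promise<Session> {
        // Handle JWT-based auth flow (omitted in this simplified version)
        throw new Error('Not implemented');
      }

      async refreshSession(session: Session): Promise<Session> {
        // Token refresh (omitted in this simplified version)
        throw new Error('Not implemented');
      }
    }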
The protocol detection engine maintains a registry of protocol handlers and dynamically matches responses against them to determine the appropriate authentication flow.
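As a rough illustration of the registry idea, the sketch below keeps an ordered list of detectors and returns the name of the first one that matches. To stay self-contained it is synchronous and works on a plain header map rather than a real response object. `HeaderMap`, `Detector`, `ProtocolRegistry`, and the protocol names are hypothetical and not taken from our production code.

```typescript
// Hypothetical, simplified input: header names are assumed to be lowercased.
type HeaderMap = Record<string, string>;

interface Detector {
  name: string;
  matches(headers: HeaderMap): boolean;
}

class ProtocolRegistry {
  private detectors: Detector[] = [];

  // Detectors are tried in registration order, so register specific ones first.
  register(detector: Detector): void {
    this.detectors.push(detector);
  }

  // Returns the name of the first matching detector, or undefined if none match.
  detect(headers: HeaderMap): string | undefined {
    for (const detector of this.detectors) {
      if (detector.matches(headers)) {
        return detector.name;
      }
    }
    return undefined;
  }
}

const registry = new ProtocolRegistry();

registry.register({
  name: 'jwt',
  matches: (h) => 'authorization' in h && h['authorization'].startsWith('Bearer'),
});

registry.register({
  name: 'cookie',
  matches: (h) => 'set-cookie' in h && !('authorization' in h),
});

const detected = registry.detect({ 'set-cookie': 'sid=abc123; HttpOnly' });
console.log(detected); // "cookie"
```

First-match ordering keeps the sketch easy to reason about, but it makes registration order significant, so narrower detectors should be registered before broader ones.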
Session Management and Token Refresh
One key innovation in our approach is how we handle session management across different protocols. Rather than exposing raw tokens or cookies to developers, we abstract them behind an opaque session interface:
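    interface Session {
      id: string;
      protocol: string;
      expires?: Date;
      metadata: Record<string, unknown>;
    }

    class SessionManager {
      // The registry is assumed to be a Map keyed by protocol name.
      constructor(private protocolRegistry: Map<string, AuthProtocol>) {}

      // Simplified check: refresh once the known expiry time has passed.
      private needsRefresh(session: Session): boolean {
        return session.expires !== undefined &&
          session.expires.getTime() <= Date.now();
      }

      private async refreshIfNeeded(session: Session): Promise<Session> {
        if (!this.needsRefresh(session)) {
          return session;
        }

        const protocol = this.protocolRegistry.get(session.protocol);
        if (!protocol) {
          throw new Error(`Unknown auth protocol: ${session.protocol}`);
        }
        return protocol.refreshSession(session);
      }
    }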
This allows us to handle protocol-specific session refresh logic transparently, while presenting developers with a consistent interface for managing authentication state.
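One detail worth illustrating is the refresh decision itself. The sketch below is a minimal version that refreshes slightly before expiry, so a request does not go out with a session that lapses in flight. The `SessionLike` type, the `needsRefresh` helper as written here, and the 60-second margin are assumptions for illustration, not our actual values.

```typescript
// Hypothetical minimal session shape with only the fields this check needs.
interface SessionLike {
  id: string;
  protocol: string;
  expires?: Date;
}

// Assumed safety margin: refresh sessions that expire within the next 60 seconds.
const REFRESH_SKEW_MS = 60 * 1000;

// Pure helper: `now` is passed in so the decision is deterministic and testable.
function needsRefresh(
  session: SessionLike,
  now: Date,
  skewMs: number = REFRESH_SKEW_MS
): boolean {
  if (session.expires === undefined) {
    // No known expiry, so there is nothing to refresh proactively.
    return false;
  }
  return session.expires.getTime() - now.getTime() <= skewMs;
}

const now = new Date('2024-01-01T00:00:00Z');
const expiringSoon: SessionLike = {
  id: 's1',
  protocol: 'jwt',
  expires: new Date('2024-01-01T00:00:30Z'),
};
const longLived: SessionLike = {
  id: 's2',
  protocol: 'cookie',
  expires: new Date('2024-01-01T01:00:00Z'),
};

console.log(needsRefresh(expiringSoon, now)); // true
console.log(needsRefresh(longLived, now)); // false
```

Sessions without a known expiry are never refreshed proactively in this sketch; a real implementation would likely also react to rejected requests.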
Handling Complex Authentication Flows
Many modern websites implement multi-step authentication flows, including challenges like CAPTCHA or 2FA. We model these flows as a state machine:
    interface AuthState {
      type: 'INITIAL' | 'PENDING_2FA' | 'PENDING_CAPTCHA' | 'AUTHENTICATED';
      session?: Session;
      nextAction?: AuthAction;
    }

    class AuthenticationFlow {
      // handle2FAChallenge and handleCaptchaChallenge are not shown here.
      private async transition(
        state: AuthState,
        action: AuthAction
      ): Promise<AuthState> {
        switch (state.type) {
          case 'PENDING_2FA':
            return this.handle2FAChallenge(state, action);
          case 'PENDING_CAPTCHA':
            return this.handleCaptchaChallenge(state, action);
          // Handle other states
          default:
            // Fail loudly instead of silently resolving to undefined.
            throw new Error(`Unhandled auth state: ${state.type}`);
        }
      }
    }
This state machine approach allows us to represent complex authentication flows while still presenting a simple interface to developers.
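To make the transitions concrete, here is a minimal sketch of the state machine as a pure function over the four states listed above. The event names (`CREDENTIALS_ACCEPTED`, `CHALLENGE_REQUIRED`, `CHALLENGE_PASSED`) and the `FlowState`, `FlowEvent`, and `nextState` identifiers are hypothetical, chosen for illustration rather than taken from our implementation.

```typescript
// Hypothetical state and event types for illustration only.
type FlowState = 'INITIAL' | 'PENDING_2FA' | 'PENDING_CAPTCHA' | 'AUTHENTICATED';

type FlowEvent =
  | { kind: 'CREDENTIALS_ACCEPTED' }
  | { kind: 'CHALLENGE_REQUIRED'; challenge: '2FA' | 'CAPTCHA' }
  | { kind: 'CHALLENGE_PASSED' };

// Pure transition function: returns the next state or throws on an invalid move.
function nextState(state: FlowState, event: FlowEvent): FlowState {
  switch (state) {
    case 'INITIAL':
      if (event.kind === 'CREDENTIALS_ACCEPTED') return 'AUTHENTICATED';
      if (event.kind === 'CHALLENGE_REQUIRED') {
        return event.challenge === '2FA' ? 'PENDING_2FA' : 'PENDING_CAPTCHA';
      }
      break;
    case 'PENDING_2FA':
    case 'PENDING_CAPTCHA':
      if (event.kind === 'CHALLENGE_PASSED') return 'AUTHENTICATED';
      // A site may chain challenges, for example CAPTCHA followed by 2FA.
      if (event.kind === 'CHALLENGE_REQUIRED') {
        return event.challenge === '2FA' ? 'PENDING_2FA' : 'PENDING_CAPTCHA';
      }
      break;
    case 'AUTHENTICATED':
      // Terminal state in this sketch: no further events are accepted.
      break;
  }
  throw new Error(`Invalid transition from ${state} on ${event.kind}`);
}

// Replay a login that hits a CAPTCHA, then 2FA, then succeeds.
const events: FlowEvent[] = [
  { kind: 'CHALLENGE_REQUIRED', challenge: 'CAPTCHA' },
  { kind: 'CHALLENGE_REQUIRED', challenge: '2FA' },
  { kind: 'CHALLENGE_PASSED' },
];

const finalState = events.reduce<FlowState>(nextState, 'INITIAL');
console.log(finalState); // "AUTHENTICATED"
```

Keeping the transition function pure means every flow can be replayed from a recorded list of events, which helps when debugging a site whose login sequence changes.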
Results and Lessons Learned
This abstraction layer has allowed us to support authentication across thousands of websites while maintaining a simple developer interface. Some key lessons:
- Protocol detection needs to be fuzzy rather than exact, because websites often implement standard protocols with slight variations
- Session refresh logic is often more complex than the initial authentication
- State machine-based handling of multi-step flows provides the flexibility needed for complex authentication scenarios
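The first lesson, fuzzy detection, can be sketched as weighted scoring: each protocol contributes a set of signals, and the highest-scoring protocol above a threshold wins. The `Evidence` fields, the weights, the 0.5 threshold, and the names `WeightedSignal`, `signalsByProtocol`, and `detectFuzzy` are all assumptions made for this example; real values would need tuning against observed sites.

```typescript
// Hypothetical evidence extracted from a login response; field names are illustrative.
interface Evidence {
  hasSetCookie: boolean;
  hasCsrfField: boolean;
  hasBearerToken: boolean;
  bodyLooksLikeJwt: boolean;
}

interface WeightedSignal {
  weight: number;
  present(e: Evidence): boolean;
}

// Assumed weights, not measured values.
const signalsByProtocol: Record<string, WeightedSignal[]> = {
  cookie: [
    { weight: 0.6, present: (e) => e.hasSetCookie },
    { weight: 0.4, present: (e) => e.hasCsrfField },
  ],
  jwt: [
    { weight: 0.5, present: (e) => e.hasBearerToken },
    { weight: 0.5, present: (e) => e.bodyLooksLikeJwt },
  ],
};

// Sum the weights of all signals that are present in the evidence.
function score(signals: WeightedSignal[], e: Evidence): number {
  return signals.reduce((sum, s) => sum + (s.present(e) ? s.weight : 0), 0);
}

// Returns the best-scoring protocol, or undefined if nothing reaches the threshold.
function detectFuzzy(e: Evidence, threshold: number = 0.5): string | undefined {
  let best: { name: string; score: number } | undefined = undefined;
  for (const name of Object.keys(signalsByProtocol)) {
    const s = score(signalsByProtocol[name], e);
    if (s >= threshold && (best === undefined || s > best.score)) {
      best = { name, score: s };
    }
  }
  return best?.name;
}

// A site that returns a JWT but also sets an unrelated cookie.
const mixed: Evidence = {
  hasSetCookie: true,
  hasCsrfField: false,
  hasBearerToken: true,
  bodyLooksLikeJwt: true,
};
console.log(detectFuzzy(mixed)); // "jwt"
```

An exact header check would have to pick one answer for the mixed case up front; scoring lets the stronger combination of signals win.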
The end result is that developers can integrate with any supported website using a consistent interface:
    const session = await anon.authenticate({
      site: 'example.com',
      credentials: {
        username: 'user',
        password: 'pass'
      }
    });

    // Use session for subsequent requests
    const data = await anon.request({
      session,
      url: 'https://example.com/api/data'
    });
There is still work to be done on edge cases and new authentication schemes, but this architecture has proven robust across a wide range of websites without complicating the developer-facing interface.