Workflow evaluation framework runs AI-built automations with mock HTTP responses

A new testing framework lets developers validate n8n workflows built by Instance AI without needing real API credentials — all HTTP calls are intercepted and answered by an LLM using Context7 API documentation.
Testing whether an AI-built workflow actually works has traditionally required real credentials and live API connections, a friction-heavy process that slows iteration. A new evaluation framework removes that dependency entirely by intercepting HTTP requests and generating realistic API responses on the fly with an LLM.
The system works in three phases. First, the workflow is analyzed and consistent mock data hints are generated in a single LLM call, ensuring data flows logically through the entire workflow. Second, the workflow executes normally while every HTTP request is captured before it leaves the process — an LLM generates a contextually appropriate response using the node's configuration and real API documentation fetched from Context7. Third, an LLM verifier evaluates whether success criteria were met and categorizes failures as builder issues, mock issues, legitimate failures, or verification gaps.
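The Phase 2 interception step can be sketched roughly as follows. Everything here is illustrative: the type and function names (`InterceptedRequest`, `generateMockResponse`, `interceptedFetch`) are assumptions, and the LLM call is stubbed with a deterministic placeholder.

```typescript
// Hypothetical sketch of Phase 2: a captured request is answered in-process.
type InterceptedRequest = {
  method: string;
  url: string;
  nodeType: string; // e.g. "n8n-nodes-base.httpRequest"
  body?: unknown;
};

type MockResponse = { status: number; body: unknown };

// Stand-in for the LLM call that receives the node's configuration plus
// API documentation fetched from Context7; returns a placeholder here.
async function generateMockResponse(
  req: InterceptedRequest,
  apiDocs: string,
): Promise<MockResponse> {
  return {
    status: 200,
    body: { mocked: true, url: req.url, docsUsed: apiDocs.length > 0 },
  };
}

// The transport wrapper: the request never leaves the process.
async function interceptedFetch(req: InterceptedRequest): Promise<MockResponse> {
  const docs = `Context7 documentation for ${req.url}`; // fetched per scenario in practice
  return generateMockResponse(req, docs);
}
```

The essential design point is that the interception happens below the node's request helper, so nodes run unmodified and see what looks like a normal HTTP response.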
Developers can run test cases against workflows and receive HTML reports showing execution traces, mock responses, and diagnostic conclusions. The framework handles six HTTP interception points covering axios, legacy requests, and OAuth flows. AI root nodes and protocol-based nodes that bypass the HTTP layer receive pin data instead, generated consistently with the mock data plan.
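The pin-data path for nodes that bypass HTTP could look roughly like this. The names (`PinDataPlan`, `buildPinData`) and the node-type list are assumptions for illustration, not the framework's actual API; the one real constraint from the description is that pinned data must come from the same Phase 1 mock data plan.

```typescript
// Illustrative sketch: pin data for nodes whose traffic bypasses the HTTP layer.
type PinDataPlan = Record<string, Array<{ json: Record<string, unknown> }>>;

interface WorkflowNode {
  name: string;
  type: string;
}

// Assumed examples of protocol-based node types that never hit the HTTP helpers.
const PROTOCOL_NODE_TYPES = new Set([
  "n8n-nodes-base.webhook",
  "@n8n/n8n-nodes-langchain.agent",
]);

function buildPinData(nodes: WorkflowNode[], plan: PinDataPlan): PinDataPlan {
  const pinned: PinDataPlan = {};
  for (const node of nodes) {
    // Only pin nodes that bypass interception; HTTP-based nodes get mocks instead.
    if (PROTOCOL_NODE_TYPES.has(node.type) && plan[node.name]) {
      pinned[node.name] = plan[node.name]; // drawn from the Phase 1 plan for consistency
    }
  }
  return pinned;
}
```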
This framework supports 8 test cases with 27 scenarios spanning webhook routing, contact forms, Linear-Slack reporting, weather monitoring, and more. The contact-form-automation test case passes reliably at 5 out of 5 runs.
Original GitHub Description
Summary
Adds a complete evaluation framework for testing workflows built by Instance AI. Workflows are executed with LLM-generated mock HTTP responses — no real credentials or API connections needed.
- Phase 1: Analyzes the workflow and generates consistent mock data hints (1 Sonnet call per scenario)
- Phase 2: Executes the workflow with all HTTP requests intercepted. Each request goes to an LLM that generates a realistic API response using node configuration and API docs from Context7
- Phase 3: An LLM verifier evaluates whether success criteria were met and categorizes failures as `builder_issue`, `mock_issue`, `legitimate_failure`, or `verification_gap`
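The four failure categories can be modeled as a TypeScript union with a runtime guard. The category names come from the description above; the `isFailureCategory` helper and `Verdict` shape are illustrative assumptions.

```typescript
// Category names are from the PR; everything else is a hypothetical sketch.
const FAILURE_CATEGORIES = [
  "builder_issue",      // the AI built a structurally wrong workflow
  "mock_issue",         // the generated mock response was implausible
  "legitimate_failure", // the workflow correctly surfaced a real error
  "verification_gap",   // success criteria could not be checked
] as const;

type FailureCategory = (typeof FAILURE_CATEGORIES)[number];

interface Verdict {
  passed: boolean;
  category?: FailureCategory; // set only when passed === false
  reasoning: string;
}

// Runtime guard for validating the category string an LLM verifier emits.
function isFailureCategory(s: string): s is FailureCategory {
  return (FAILURE_CATEGORIES as readonly string[]).includes(s);
}
```

A guard like this matters because the verifier's category arrives as free-form LLM output and should be validated before it drives report diagnostics.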
Key components
- 6 HTTP interception points in `request-helper-functions.ts` covering all n8n request helpers (axios, legacy, OAuth1, OAuth2)
- Mock credential generation for OAuth flows (`eval-mock-helpers.ts`)
- 8 workflow test cases with 27 scenarios across different node types
- HTML report with execution traces, mock responses, connections JSON, and failure diagnosis
- Verification prompt with rules for accurate failure attribution (structure vs execution, chronological ordering, connection verification)
- Context7 integration for API-accurate mock response shapes
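Mock credential generation for OAuth flows could look roughly like the sketch below. The exact field names n8n's OAuth2 helpers expect are an assumption here, as is the `makeMockOAuth2Credential` name; the point is only that a plausible token object lets OAuth-authenticated nodes run without a real authorization flow.

```typescript
// Illustrative mock OAuth2 credential; field names are assumptions, not
// necessarily what eval-mock-helpers.ts actually emits.
function makeMockOAuth2Credential(provider: string) {
  return {
    oauthTokenData: {
      access_token: `mock-${provider}-access-token`,
      token_type: "Bearer",
      expires_in: 3600,
      refresh_token: `mock-${provider}-refresh-token`,
    },
  };
}
```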
Current results
- contact-form-automation: 5/5 stable across runs
- notification-router: 3/3 when agent uses Switch node (IF node `conditions.options` bug causes failures in ~40% of runs)
- Other test cases: vary based on builder non-determinism
Related
- https://linear.app/n8n/issue/AI-2298
- Depends on Instance AI module (target: `feature/instance-ai`)
Test plan
- `pnpm build` passes
- `pnpm typecheck` passes in `@n8n/instance-ai`, `packages/cli`, `packages/core`
- Run `dotenvx run -f .env.local -- pnpm eval:instance-ai workflows --verbose` from `packages/@n8n/instance-ai/`
- Verify contact-form-automation passes 5/5
- Verify HTML report generates at `.data/workflow-eval-*.html`
🤖 Generated with Claude Code