Merged · Size: S · Change breakdown: Feature 55%, Bug Fix 35%, Maintenance 10%
#28129 feat(ai-builder): Add --keep-workflows flag and fix eval execution errors (no-changelog)

AI builder eval gains workflow debugging support

A new --keep-workflows flag preserves built workflows after evaluation runs, making it easier to debug AI builder behavior. The change also fixes n8n API compliance and improves error handling.

When evaluating AI-built workflows, the old system deleted everything after each run, so developers had no way to inspect what the AI actually generated. With --keep-workflows, the built workflows are left in place for post-mortem debugging, with no need to save them manually.
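A minimal sketch of how such a flag might gate cleanup. Only the flag name --keep-workflows comes from the PR. The names EvalCliOptions, parseEvalArgs, and workflowsToCleanUp are hypothetical and do not reflect the actual argument parser or harness.

```typescript
// Hypothetical sketch: only the flag name comes from the PR description.
interface EvalCliOptions {
  keepWorkflows: boolean;
}

// Detects the flag in a raw argument list (the real parser may differ).
function parseEvalArgs(argv: string[]): EvalCliOptions {
  return { keepWorkflows: argv.includes('--keep-workflows') };
}

// Returns the workflow IDs that should be cleaned up after an eval run.
// With the flag set, nothing is scheduled for cleanup, so the built
// workflows stay on the instance for inspection.
function workflowsToCleanUp(builtIds: string[], options: EvalCliOptions): string[] {
  return options.keepWorkflows ? [] : builtIds;
}
```

Keeping the decision in one small function means the runner only has to iterate over whatever list it gets back, regardless of whether the flag was passed.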

The changes also address an API compliance issue: n8n now requires workflows to be archived before deletion. The client library now calls the archive endpoint before hard-deleting, preventing errors in cleanup routines.
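The archive-before-delete order can be sketched as a pure function that plans the two requests. The endpoint paths below (/rest/workflows/:id/archive and /rest/workflows/:id) and the names PlannedRequest and archiveThenDeleteRequests are assumptions for illustration, not taken from the PR or confirmed against the n8n client.

```typescript
// Hypothetical sketch: paths and names are assumptions, not the PR's code.
interface PlannedRequest {
  method: 'POST' | 'DELETE';
  url: string;
}

// Plans the requests needed to remove a workflow: archive first, then delete.
function archiveThenDeleteRequests(baseUrl: string, workflowId: string): PlannedRequest[] {
  const workflowUrl = `${baseUrl}/rest/workflows/${encodeURIComponent(workflowId)}`;
  return [
    // Step 1: archive, since deletion of a non-archived workflow is rejected.
    { method: 'POST', url: `${workflowUrl}/archive` },
    // Step 2: hard-delete the now-archived workflow.
    { method: 'DELETE', url: workflowUrl },
  ];
}
```

A caller would send these requests in order and skip the delete if the archive step fails.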

Error handling in EvalExecutionService was also improved: workflow execution errors are now caught and returned as proper evaluation results instead of surfacing as 500 errors.
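The general pattern, catching an execution failure and reporting it as data, might look like the sketch below. EvalResult and runScenario are hypothetical names, and the real EvalExecutionService is likely asynchronous and returns a richer result shape. The sketch is synchronous only to stay short.

```typescript
// Hypothetical sketch: the result shape and function name are assumptions.
interface EvalResult {
  success: boolean;
  output?: unknown;
  error?: string;
}

// Runs one scenario and converts a thrown error into a failed result,
// so the caller can report it rather than crash the request.
function runScenario(execute: () => unknown): EvalResult {
  try {
    return { success: true, output: execute() };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { success: false, error: message };
  }
}
```

Returning the failure as a result lets one broken scenario be recorded alongside the others instead of aborting the whole evaluation.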

In the AI builder package (packages/@n8n/instance-ai), the CLI flag flows through the argument parser into the test harness runner. The n8n-client handles the archive-before-delete sequence, and scenario hints now pass through to pin data generation so mock nodes reflect the actual evaluation context.

Original GitHub Description

Summary

  • Add --keep-workflows CLI flag to preserve built workflows after evaluation for debugging
  • Fix workflow cleanup: n8n now requires archiving before deletion — deleteWorkflow archives first
  • Catch workflow execution errors in EvalExecutionService and return proper eval results instead of 500s
  • Pass scenario hints to bypass pin data generation so AI/LangChain node mocks reflect the scenario
  • Add Slack channel IDs to daily-slack-summary test case prompt for better builder node configuration

Related Linear ticket

https://linear.app/n8n/issue/TRUST-32

Test plan

  • pnpm typecheck — both cli and instance-ai clean
  • pnpm lint — both packages clean
  • CLI tests: 261 passed (13 suites)
  • eval-mock-helpers tests: 20 passed
  • Manual workflow eval run: build + scenarios + archive + delete working end-to-end

🤖 Generated with Claude Code
