Merged · Size: XL
Change Breakdown: Feature 55%, Security 35%, Dependencies 5%, Refactor 5%
#28251 feat(core): Add parse-file tool for structured attachments (no-changelog)

Structured file attachments now parseable via secure API

Instance AI agents can now inspect CSV, TSV, and JSON files attached to messages. Column metadata, type inference, and paginated row access are built in, with guardrails against oversized input and prompt injection.

When users attach structured data files to messages, those files previously ended up as raw bytes injected into the model prompt. That approach gave the agent no view of the data's structure and left prompt size unbounded. A new parse-file tool now exposes attached CSV, TSV, and JSON files through a dedicated, paginated API that agents can call directly.

The tool returns column metadata including normalized names and inferred types, along with paginated row data. Agents follow a defined workflow: preview the first 20 rows, create a data table with the sanitized schema, then page through remaining rows in batches of up to 100. A hard limit of 10 parse calls per file prevents runaway loops, and the agent reports how many rows were imported versus how many remain.
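The import flow above can be sketched as a simple loop. This is a minimal illustration, not the actual agent code: `ParseFile`, `importRows`, and the `startRow`/`maxRows` parameters are hypothetical names, and the sketch assumes the previewed rows are inserted and paging resumes after them. Only the numbers 20, 100, and 10 come from the description.

```typescript
// Hypothetical shapes; the real tool's input and output schema may differ.
type Row = Record<string, unknown>;

interface ParsePage {
  rows: Row[];
  totalRows: number;
}

type ParseFile = (startRow: number, maxRows: number) => ParsePage;

const PREVIEW_ROWS = 20; // preview size from the description
const MAX_PAGE_ROWS = 100; // largest batch per call
const MAX_PARSE_CALLS = 10; // hard limit of parse calls per file

function importRows(
  parseFile: ParseFile,
  insert: (rows: Row[]) => void,
): { imported: number; remaining: number; calls: number } {
  // Call 1: preview the first rows (a real agent would create the table here).
  const preview = parseFile(0, PREVIEW_ROWS);
  let calls = 1;
  insert(preview.rows);
  let imported = preview.rows.length;
  const total = preview.totalRows;

  // Remaining calls: page through the rest until done or the call limit is hit.
  while (imported < total && calls < MAX_PARSE_CALLS) {
    const page = parseFile(imported, MAX_PAGE_ROWS);
    calls++;
    if (page.rows.length === 0) break; // defensive: no progress, stop paging
    insert(page.rows);
    imported += page.rows.length;
  }

  // Report imported versus remaining rows, as the description requires.
  return { imported, remaining: total - imported, calls };
}
```

Under these assumptions, a 1500-row file stops after 10 calls with 920 rows imported and 580 remaining, which is the situation the imported-versus-remaining report is meant to surface.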

Security guardrails run throughout the implementation. Decoded file size is capped at 512 KB, columns are limited to 50, cells to 5000 characters each, and overall budgets of 2000 cells and 40000 characters are enforced. Dangerous keys like __proto__, constructor, and prototype are rejected outright to prevent prototype pollution attacks. Cell values starting with formula characters (=, +, @, -) trigger warnings in the output, alerting agents to potential spreadsheet injection risks.
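A rough sketch of how a few of these guardrails might be applied to a single row follows. The function name `checkRow`, the `RowCheck` shape, and the choice to truncate (rather than reject) overlong cells are assumptions for illustration; the limits and the list of dangerous keys come from the description.

```typescript
const DANGEROUS_KEYS = new Set(['__proto__', 'constructor', 'prototype']);
const FORMULA_PREFIXES = ['=', '+', '@', '-'];
const MAX_COLUMNS = 50;
const MAX_CELL_CHARS = 5000;

// Hypothetical result shape: the cleaned row plus any warnings raised.
interface RowCheck {
  row: Record<string, string>;
  warnings: string[];
}

function checkRow(input: Record<string, string>): RowCheck {
  const entries = Object.entries(input);
  if (entries.length > MAX_COLUMNS) {
    throw new Error(`Too many columns: ${entries.length} (limit ${MAX_COLUMNS})`);
  }

  // A null-prototype object cannot be polluted through inherited keys.
  const row: Record<string, string> = Object.create(null);
  const warnings: string[] = [];

  for (const [key, rawValue] of entries) {
    if (DANGEROUS_KEYS.has(key)) {
      throw new Error(`Rejected dangerous key: ${key}`);
    }

    let value = rawValue;
    if (value.length > MAX_CELL_CHARS) {
      // Assumption: overlong cells are truncated; the real tool may reject them.
      value = value.slice(0, MAX_CELL_CHARS);
      warnings.push(`Cell "${key}" truncated to ${MAX_CELL_CHARS} characters`);
    }

    if (FORMULA_PREFIXES.some((prefix) => value.startsWith(prefix))) {
      warnings.push(`Cell "${key}" starts with a formula character`);
    }

    row[key] = value;
  }

  return { row, warnings };
}
```

Note that this sketch flags any value beginning with one of the four characters, so a negative number such as -5 also produces a warning; the actual tool may be more selective.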

Structured attachments no longer appear as raw multimodal content in prompts. Instead, they are replaced with a compact manifest listing file names, types, and sizes. Only non-structured attachments continue through the existing multimodal file handling. For messages containing only attachments with no text, a stub message is synthesized directing the agent to inspect the first parseable file.
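The routing described above might look roughly like the following. Everything here is a hypothetical sketch: the `Attachment` shape, the extension-based detection, the manifest wording, and the stub text are assumptions, since the description does not specify them.

```typescript
// Hypothetical attachment shape for illustration.
interface Attachment {
  fileName: string;
  mimeType: string;
  sizeBytes: number;
}

const STRUCTURED_EXTENSIONS = ['.csv', '.tsv', '.json'];

// Assumption: detection by file extension; the real routing may use MIME type.
function isStructured(attachment: Attachment): boolean {
  const name = attachment.fileName.toLowerCase();
  return STRUCTURED_EXTENSIONS.some((ext) => name.endsWith(ext));
}

function buildPrompt(
  message: string,
  attachments: Attachment[],
): { text: string; multimodal: Attachment[] } {
  const structured = attachments.filter(isStructured);
  const multimodal = attachments.filter((a) => !isStructured(a));

  let text = message.trim();
  const first = structured[0];
  if (text === '' && first !== undefined) {
    // Attachment-only message: synthesize a stub pointing at the first file.
    text = `Inspect the attached file "${first.fileName}" using parse-file.`;
  }

  if (structured.length > 0) {
    // Compact manifest: names, types, and sizes only, never file contents.
    const lines = structured.map(
      (a) => `- ${a.fileName} (${a.mimeType}, ${a.sizeBytes} bytes)`,
    );
    text += `\n\nAttached files:\n${lines.join('\n')}`;
  }

  return { text, multimodal };
}
```

The point of the design is that the prompt only ever carries metadata for structured files, so prompt size no longer scales with file size.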

In the n8n monorepo, this touches the @n8n/instance-ai package where the parser and tool live, the CLI layer where attachment routing is handled, and the data-table agent workflow where the new parse-file tool is now registered.

Original GitHub Description

Summary

Add a native parse-file tool to Instance AI that exposes structured file attachments (CSV, TSV, JSON) through a secure, paginated API instead of injecting raw file bytes into the model prompt.

Key changes:

  • Attachment routing: Structured attachments (csv/tsv/json) are replaced with a compact manifest in the prompt text; non-structured attachments keep the existing multimodal file path

  • parse-file tool: Thin wrapper over a parser utility with format detection, column normalization, type inference, pagination, and output budgeting

  • Attachment-only messages: message may now be empty when attachments is non-empty; in that case a stub is synthesized directing the agent to inspect the first parseable file

  • Data-table agent: parse-file added to tool subset, max steps increased 15 → 35, prompt updated with import flow (preview → create table → paginate + insert)

  • Security guardrails: 512 KB decoded-size cap, 50 column max, 2000 cell budget, 40000 char budget, 5000 char cell limit, dangerous key rejection (__proto__, constructor, prototype)

  • Trace redaction: Raw structured attachment data is excluded from prompt-build trace outputs

  • I have seen this code, I have run this code, and I take responsibility for this code.
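
The column normalization and type inference listed for the parse-file tool could work along these lines. This is a minimal sketch under assumed rules: the function names, the snake_case normalization, the `column_N` fallback, and the three-type set are all hypothetical, as the description does not state the actual rules.

```typescript
// Hypothetical set of inferred types; the real tool may support more.
type ColumnType = 'number' | 'boolean' | 'string';

// Normalize a raw header into a lowercase, underscore-separated name.
function normalizeColumnName(raw: string, index: number): string {
  const cleaned = raw
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // collapse runs of other characters
    .replace(/^_+|_+$/g, ''); // strip leading and trailing underscores
  // Fall back to a positional name when nothing usable remains.
  return cleaned === '' ? `column_${index + 1}` : cleaned;
}

// Infer a column type from sample cell values, ignoring empty cells.
function inferType(values: string[]): ColumnType {
  const present = values.map((v) => v.trim()).filter((v) => v !== '');
  if (present.length === 0) return 'string';
  if (present.every((v) => /^-?\d+(\.\d+)?$/.test(v))) return 'number';
  if (present.every((v) => /^(true|false)$/i.test(v))) return 'boolean';
  return 'string';
}
```

For example, under these rules a header such as " Order ID " becomes order_id, and a column containing "1", "2.5", and an empty cell is inferred as number.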
