Unicode filenames preserved in wiki slugs

zhouhe-xydt · Apr 12, 2026 · #64742 fix(memory-wiki): support Unicode characters in slugifyWikiSegment

Wiki page titles with Chinese, Japanese, Cyrillic, or Arabic characters now generate correct filenames instead of collapsing into "page.md" fallbacks.

The Memory Wiki plugin was stripping every non-ASCII character from page titles when generating filenames. A title like "大语言模型概述" would silently become "page.md", so every title without ASCII letters or digits mapped to the same file, with later pages overwriting earlier ones. Mixed titles kept only their ASCII part. The old regex, /[^a-z0-9]+/g, accepted only basic Latin letters and digits.
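The collision can be reproduced with a minimal sketch of the old behavior. Only the regex is taken from the text. The function name `slugifyAsciiOnly`, the lowercasing, the "-" separator, and the "page" fallback are assumptions inferred from the examples in this summary, not the plugin's actual code.

```typescript
// Hypothetical reconstruction of the old ASCII-only slug logic.
// Only the regex comes from the PR; everything else is assumed for illustration.
function slugifyAsciiOnly(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // every run of non-ASCII characters becomes one separator
    .replace(/^-+|-+$/g, ""); // trim leading and trailing separators
  return slug || "page"; // an empty slug falls back to a shared name
}

console.log(slugifyAsciiOnly("大语言模型概述")); // page
console.log(slugifyAsciiOnly("自動恢複")); // page (same file as the title above)
console.log(slugifyAsciiOnly("LLM 架构分析")); // llm
```

Two unrelated Chinese titles resolve to the same slug, which is how one page could overwrite another.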

Two functions received Unicode-aware replacements: slugifyWikiSegment() now uses /[^\p{L}\p{N}\p{M}]+/gu, preserving letters from any writing system plus digits and combining marks. The same fix landed in normalizeClaimTextKey() to prevent CJK contradiction notes from collapsing into identical empty-string keys.
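As a rough illustration of the Unicode-aware regex, the sketch below applies it to the example titles from this PR. The name `slugifyUnicode`, the lowercasing, the "-" separator, and the "page" fallback are assumptions. The real slugifyWikiSegment() may differ in those details.

```typescript
// Sketch only: the regex is quoted from the summary above, while the
// surrounding steps (lowercase, "-" separator, "page" fallback) are assumed.
function slugifyUnicode(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\p{M}]+/gu, "-") // keep letters, numbers, and combining marks from any script
    .replace(/^-+|-+$/g, ""); // trim leading and trailing separators
  return slug || "page";
}

console.log(slugifyUnicode("大语言模型概述")); // 大语言模型概述
console.log(slugifyUnicode("LLM 架构分析")); // llm-架构分析
console.log(slugifyUnicode("Circuit Breaker 自動恢複")); // circuit-breaker-自動恢複
```

Including \p{M} matters for scripts that rely on combining marks: without it, Devanagari vowel signs or decomposed accents would be treated as separators and a word such as "हिन्दी" would be split apart.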

To keep filenames within filesystem limits, a new truncation step adds SHA-1 hash suffixes when segments exceed byte thresholds. This matters because limits are often measured in bytes rather than characters: a CJK character takes three bytes in UTF-8, so a title that looks short can still exceed a platform's filename limit.
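One way such byte-aware truncation could work is sketched below. The threshold of 200 bytes, the 8-character hash suffix, and the names `truncateSegment`, `MAX_SEGMENT_BYTES`, and `HASH_HEX_CHARS` are all hypothetical. The summary does not state the actual values or the exact suffix format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical values; the real thresholds are not given in this summary.
const MAX_SEGMENT_BYTES = 200;
const HASH_HEX_CHARS = 8;

// Assumes maxBytes is larger than the suffix (1 + HASH_HEX_CHARS bytes).
function truncateSegment(segment: string, maxBytes: number = MAX_SEGMENT_BYTES): string {
  if (Buffer.byteLength(segment, "utf8") <= maxBytes) return segment;

  // Hash the full segment so that long titles sharing a prefix stay distinct.
  const digest = createHash("sha1").update(segment, "utf8").digest("hex");
  const suffix = "-" + digest.slice(0, HASH_HEX_CHARS);
  const budget = maxBytes - Buffer.byteLength(suffix, "utf8");

  // Append whole code points until the byte budget is used up, so a
  // multi-byte character is never cut in half.
  let head = "";
  let used = 0;
  for (const ch of segment) {
    const size = Buffer.byteLength(ch, "utf8");
    if (used + size > budget) break;
    head += ch;
    used += size;
  }
  return head + suffix;
}

const longTitle = "模".repeat(100); // 100 characters, 300 bytes in UTF-8
console.log(Buffer.byteLength(truncateSegment(longTitle), "utf8") <= MAX_SEGMENT_BYTES); // true
```

Measuring with Buffer.byteLength rather than string length is the key design point here: the 100-character title above is already 300 bytes.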

The fix lives in the memory-wiki extension, which is used by the bridge and unsafe-local page resolution modules. The changelog entry credits @zhouhe-xydt and @vincentkoc.

Original GitHub Description

Summary

Replace ASCII-only regex with Unicode-aware regex (/[^\p{L}\p{N}]+/gu) in two functions to preserve CJK, Cyrillic, Arabic, and other non-ASCII characters in wiki slugs:

slugifyWikiSegment() in extensions/memory-wiki/src/markdown.ts
normalizeTextKey() in extensions/memory-wiki/src/claim-health.ts
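The effect on text keys can be shown with a small sketch that compares the two regexes quoted in this description. The helper names `asciiKey`, `unicodeKey`, and `countDistinctKeys`, along with the lowercasing, the space separator, and the trimming, are assumptions for illustration and are not taken from claim-health.ts.

```typescript
// Hypothetical key normalizers: only the two regexes come from the PR text.
const asciiKey = (text: string): string =>
  text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
const unicodeKey = (text: string): string =>
  text.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, " ").trim();

// Counts how many distinct keys a set of notes produces under a normalizer.
function countDistinctKeys(notes: string[], toKey: (text: string) => string): number {
  return new Set(notes.map(toKey)).size;
}

const notes = ["模型支持流式输出", "模型不支持流式输出"];
console.log(countDistinctKeys(notes, asciiKey)); // 1, both keys are the empty string
console.log(countDistinctKeys(notes, unicodeKey)); // 2
```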

Fixes

Fixes #64620

Test plan

pnpm check passes
Manual verification of Unicode slug generation

Examples

Title	Before	After
大语言模型概述	`syntheses/page.md`	`syntheses/大语言模型概述.md`
LLM 架构分析	`syntheses/llm.md`	`syntheses/llm-架构分析.md`
Circuit Breaker 自動恢複	`syntheses/circuit-breaker.md`	`syntheses/circuit-breaker-自動恢複.md`