Unicode filenames preserved in wiki slugs
Wiki page titles with Chinese, Japanese, Cyrillic, or Arabic characters now generate correct filenames instead of collapsing into "page.md" fallbacks.
The Memory Wiki plugin was stripping every non-ASCII character from page titles when generating filenames. A title like "大语言模型概述" would silently become "page.md", so every all-Chinese title mapped to the same file and later pages overwrote earlier ones. The regex /[^a-z0-9]+/g treated everything except lowercase ASCII letters and digits as a separator, so only basic Latin letters and digits survived.
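The sketch below is a minimal illustration of how an ASCII-only slug produces the collision. It is not code from `markdown.ts`. Several details are assumptions, inferred from the Before column of the examples table:

- lowercasing the input;
- the "-" separator;
- edge trimming;
- the "page" fallback.

The second Chinese title, "检索增强生成", is a made-up example.

```typescript
// Hypothetical sketch of the pre-fix slug logic. Assumed behavior: lowercase
// the input, turn runs of disallowed characters into "-", trim edge dashes,
// and fall back to "page" when nothing is left.
function legacySlugify(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // anything outside a-z and 0-9 is a separator
    .replace(/^-+|-+$/g, ""); // drop leading and trailing separators
  return slug || "page"; // empty slug: use the fallback name
}

// Two unrelated Chinese titles plus a mixed-script one.
for (const title of ["大语言模型概述", "检索增强生成", "LLM 架构分析"]) {
  console.log(`${title} -> syntheses/${legacySlugify(title)}.md`);
}
// 大语言模型概述 -> syntheses/page.md
// 检索增强生成 -> syntheses/page.md
// LLM 架构分析 -> syntheses/llm.md
```

Both Chinese titles resolve to the same path, which is why a later page overwrote an earlier one.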
Two functions received Unicode-aware regexes. slugifyWikiSegment() now uses /[^\p{L}\p{N}\p{M}]+/gu, which preserves letters from any writing system, numbers, and combining marks. The same fix landed in normalizeClaimTextKey() to prevent CJK contradiction notes from collapsing into identical empty-string keys.
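The sketch below applies the regex described above to the titles from the examples table. It also shows why \p{M} matters and how claim keys stop colliding. Several parts are assumptions, not code from the extension:

- the lowercasing, "-" separator, trimming, and "page" fallback;
- `claimKey`, a hypothetical stand-in for normalizeClaimTextKey();
- the two sample claims.

```typescript
// Hypothetical Unicode-aware slug using the regex described above.
function unicodeSlugify(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\p{M}]+/gu, "-") // keep letters, numbers, combining marks
    .replace(/^-+|-+$/g, ""); // drop leading and trailing separators
  return slug || "page"; // only used when no letter/number/mark survives
}

// Hypothetical claim-key normalizer: collapse separator runs to one space.
function claimKey(text: string, separators: RegExp): string {
  return text.toLowerCase().replace(separators, " ").trim();
}

for (const title of ["大语言模型概述", "LLM 架构分析", "Circuit Breaker 自動恢複"]) {
  console.log(`syntheses/${unicodeSlugify(title)}.md`);
}
// syntheses/大语言模型概述.md
// syntheses/llm-架构分析.md
// syntheses/circuit-breaker-自動恢複.md

// "हिन्दी" (Hindi): its vowel signs and virama are combining marks (\p{M}).
const hindi = "\u0939\u093F\u0928\u094D\u0926\u0940";
console.log(hindi.replace(/[^\p{L}\p{N}]+/gu, "-")); // ह-न-द-  (word split apart)
console.log(unicodeSlugify(hindi) === hindi); // true

// Two different illustrative claims: ASCII-only keys collide, Unicode keys do not.
const claims = ["模型支持长上下文", "模型不支持长上下文"];
console.log(claims.map((c) => claimKey(c, /[^a-z0-9]+/g))); // [ '', '' ]
console.log(claims.map((c) => claimKey(c, /[^\p{L}\p{N}\p{M}]+/gu)));
```

Without \p{M}, scripts such as Devanagari would still be broken into fragments, even though their base letters are kept.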
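To keep filenames within filesystem limits, a new truncation system adds SHA-1 hash suffixes when segments exceed byte thresholds. Byte length matters because non-ASCII characters take several bytes in UTF-8 (three for most CJK characters). As a result, a title that is short in characters can still exceed a platform's filename length limit.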
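The sketch below shows the byte-threshold truncation idea. It counts UTF-8 bytes with `TextEncoder`, cuts at code-point boundaries, and appends a shortened SHA-1 hex digest from `node:crypto`. Several details are hypothetical and were not taken from the extension:

- the 120-byte budget;
- the 8-character suffix length;
- hashing the full segment;
- the "-" joiner.

```typescript
import { createHash } from "node:crypto";

// Hypothetical limits; the real thresholds are not stated in this note.
const MAX_SEGMENT_BYTES = 120; // headroom under a common 255-byte name limit
const HASH_HEX_CHARS = 8; // length of the hex SHA-1 suffix

const encoder = new TextEncoder();
const utf8Length = (s: string): number => encoder.encode(s).length;

function truncateSegment(segment: string, maxBytes: number = MAX_SEGMENT_BYTES): string {
  if (utf8Length(segment) <= maxBytes) return segment; // short enough: unchanged

  // Hash the whole segment so long titles sharing a prefix stay distinct.
  const suffix = createHash("sha1")
    .update(segment, "utf8")
    .digest("hex")
    .slice(0, HASH_HEX_CHARS);
  const budget = maxBytes - suffix.length - 1; // reserve room for "-" + suffix

  // Walk by code point (for...of) so no UTF-8 sequence is cut in half.
  let head = "";
  let used = 0;
  for (const ch of segment) {
    const size = utf8Length(ch);
    if (used + size > budget) break;
    head += ch;
    used += size;
  }
  return `${head.replace(/-+$/, "")}-${suffix}`;
}

const longTitle = "大语言模型概述".repeat(20); // 140 characters, 420 UTF-8 bytes
const truncated = truncateSegment(longTitle);
console.log(utf8Length(longTitle), utf8Length(truncated)); // 420 120
console.log(truncateSegment("大语言模型概述")); // 大语言模型概述
```

Cutting at code points keeps the UTF-8 valid, but it can still separate a combining mark from its base letter at the boundary. Splitting on grapheme clusters (for example with `Intl.Segmenter`) would avoid that.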
The fix lives in the memory-wiki extension, used by bridge and unsafe-local page resolution modules. The changelog entry credits @zhouhe-xydt and @vincentkoc.
Original GitHub Description
Summary
Replace ASCII-only regex with Unicode-aware regex (/[^\p{L}\p{N}]+/gu) in two functions to preserve CJK, Cyrillic, Arabic, and other non-ASCII characters in wiki slugs:
- slugifyWikiSegment() in extensions/memory-wiki/src/markdown.ts
- normalizeTextKey() in extensions/memory-wiki/src/claim-health.ts
Fixes
Fixes #64620
Test plan
- pnpm check passes
- Manual verification of Unicode slug generation
Examples
| Title | Before | After |
|---|---|---|
| 大语言模型概述 | syntheses/page.md | syntheses/大语言模型概述.md |
| LLM 架构分析 | syntheses/llm.md | syntheses/llm-架构分析.md |
| Circuit Breaker 自動恢複 | syntheses/circuit-breaker.md | syntheses/circuit-breaker-自動恢複.md |