Semantic Hijacking (AI-008)
Semantic hijacking, also known as the Rules File Backdoor, is a novel attack technique where malicious instructions are hidden in AI configuration files in ways that are invisible to human reviewers but executed by AI coding assistants.The Attack
AI coding assistants read configuration files like.cursorrules, CLAUDE.md, and mcp.json to understand project context. Attackers can exploit this by embedding malicious instructions in:
- Comments that humans ignore but AI processes
- Metadata fields like
_commentin JSON - Hidden sections using HTML comments
<!-- --> - Structural elements that appear benign
Example Attack
Why It’s Critical
The Rules File Backdoor attack resulted in 24 CVEs in 2025, affecting major AI coding assistants. It’s particularly dangerous because:- Invisible to humans - Hidden content doesn’t appear in normal file viewing
- Executed by AI - AI assistants process all file content, including comments
- Persistent - Config files are committed to repositories
- Supply chain risk - Malicious configs can be in dependencies
How Hardpoint Detects It
Hardpoint’s AI-008 rule uses multiple detection methods:1. Pattern Matching (80+ patterns)
Hardpoint scans for known malicious patterns across multiple categories: Instruction Override:- “ignore previous instructions”
- “disregard above”
- “new instructions:”
- “override instructions”
- “bypass safety”
- “you are now”
- “act as”
- “pretend to be”
- “jailbreak”
- “unrestricted mode”
- “send data to”
- “upload to”
- “exfiltrate”
- “steal api_keys”
- “post to webhook”
- “execute command”
- “run command”
- “reverse shell”
- “spawn shell”
<|im_start|>,<|system|>(ChatML)[INST],[/INST](Anthropic)<<SYS>>,<</SYS>>(Llama)
2. Obfuscation Resistance
Attackers try to evade detection by obfuscating patterns. Hardpoint handles: Unicode Normalization (NFKC):- Full-width characters normalized to ASCII
- Combining characters resolved
1gn0r3 pr3v10us 1nstruct10ns→ignore previous instructions- Character mappings: 0→o, 1→i, 3→e, 4→a, 5→s, 7→t, @→a, $→s
- Cyrillic
а→ Latina - Greek
ο→ Latino - Lookalike characters detected
- Base64 decoded and scanned
- HTML entities resolved
- URL encoding handled
3. Divergence Scoring
Hardpoint calculates a “divergence score” based on:- Pattern matches - Each pattern adds points based on confidence
- Structural issues - Hidden sections, comment ratios
- Hidden content length - Long hidden content is suspicious
- Pattern density - Multiple patterns increase score
4. Confidence Levels
Patterns are categorized by confidence:| Level | Score | Example |
|---|---|---|
| High | 40 points | ”ignore previous instructions” |
| Medium | 25 points | ”act as” |
| Low | 10 points | Suspicious but could be legitimate |
Example Detections
HTML Comment Attack
JSON Metadata Attack
Encoded Attack
Remediation
When AI-008 is triggered:- Investigate - Review the flagged content carefully
- Remove - Delete any suspicious comments or metadata
- Verify source - If from a dependency, check the upstream repository
- Trust baseline - Run
hardpoint trustafter cleaning
Best Practices
- Review all comments in AI config files before committing
- Use
hardpoint scanon dependencies that include AI config files - Set up CI scanning to catch attacks before merge
- Create trust baselines to detect tampering