Abstract
This paper adopts a “First Principles” perspective to systematically deconstruct the complete information chain of AI systems—from physical power input to semantic Token output—identifying security risks across seven key links. The core innovation of this research lies in the introduction of the “Information-Parasitic Entropy System” framework previously proposed by the author. This framework categorizes risk relationships within each link into three ecological types: parasitism, symbiosis, and epiphytism. It utilizes mathematical tools such as Shannon Information Entropy ( ) and Semantic Entropy to quantify threat levels. By cross-referencing the OWASP Top 10 for LLM Applications 2025, recent IEEE/ACM cybersecurity research, and the Bayesian Predictive Coding framework, this paper proposes defense measures that are both academically rigorous and engineery-operable. The research concludes that AI security is essentially an entropy game: while attackers continuously inject semantic entropy, the defender’s task is to deploy “Entropy Budget Gates” (SEBG) across the entire system chain.
本文以「第一性原理」(First Principles)視角,系統性拆解 AI 系統從物理電能輸入至語意 Token 輸出的完整資訊鏈路,並在此鏈路的七個關鍵環節識別資安風險。研究核心創新在於:引入先前提出的「資訊寄生熵系統」(Information-Parasitic Entropy System)理論框架,以寄生(parasitism)、互利共生(symbiosis)與附生(epiphytism)三種生態型態分類各環節的風險關係,並援引 Shannon 資訊熵 H = −ΣpᵢlogPᵢ 與語意熵(Semantic Entropy)的數學工具量化威脅程度。本文對照 OWASP Top 10 for LLM Applications 2025、近期 IEEE/ACM 資安研究成果及 Bayesian Predictive Coding 框架,提出具有學術嚴謹性與工程可操作性的防禦措施。研究指出:AI 資安的本質是一場熵博弈——攻擊者持續注入語義熵,而防禦者的任務是在系統全鏈路部署「熵預算閘門」(Entropy Budget Gate)。
Keywords: AI Security, Information-Parasitic Entropy, Token Ecology, LLM Security, Semantic Entropy, OWASP LLM Top 10 2025, Adversarial Attacks, Supply Chain Attacks
I. Introduction: Why a First Principles Perspective is Needed
Current AI security research primarily focuses on attacking types (e.g., prompt injection, data poisoning, model inversion), lacking a unified, coherent physical-information-semantic framework. This paper argues that only by starting from the most fundamental physical energy input and following the complete information flow—bit \rightarrow data \rightarrow Token \rightarrow model parameters \rightarrow inference output—can security blind spots be comprehensively identified.
This perspective echoes the spirit of “first-principles calculation” in physics: deriving system behavior from fundamental axioms rather than relying on empirical rules. In a cybersecurity context, this means we must ask: At every step of information transformation from the real world into LLM output, where is the attacker’s point of intervention? What is the mechanism of entropy increase? What is the information-theoretic basis for defense?
This framework also aligns with the author’s previously constructed “Token Ecology” theory, which uses an ecosystem metaphor to describe information flow in AI language models, classifying token-corpus interactions as parasitic, symbiotic, or epiphytic, and positioning humans as nodes within the ecosystem rather than external operators.
II. Information-Parasitic Entropy System: Review of Theoretical Foundations
2.1 Definition of Information Parasitism
Information Parasitism refers to an information entity (the parasite) obtaining asymmetric benefits by embedding itself into a host information system (LLM, training database, inference pipeline) while simultaneously weakening the host’s semantic integrity. Analogous to biological parasitism, the parasite does not need to completely destroy the host to continuously extract value—explaining why many AI attacks are characterized by long-term latency (e.g., persistent backdoors, sleeper agents).
2.2 Semantic Entropy as a Threat Quantification Tool
Shannon Information Entropy ( ) quantifies system uncertainty. In the LLM context, researchers have expanded from simple token-level entropy to Semantic Entropy: taking the entropy of the probability distribution of equivalence classes (under mutual entailment) generated by the model to quantify uncertainty at the semantic level, rather than just surface-form uncertainty (Kossen et al., 2024). Furthermore, Kernel Language Entropy (KLE)utilizes von Neumann entropy to measure continuous semantic dependencies between outputs (Nikitin et al., 2024).
The “Information-Parasitic Entropy System” proposed in this paper asserts that an attacker is essentially an external source of entropy injection. Across the seven links of the system, attackers can increase the system’s semantic entropy in different ways, reducing its “predictability” in semantic space to control or destroy the output. The defender’s task is to establish “Semantic Entropy Budget Gates” (SEBG) at each link to detect and block abnormal entropy-increasing events.
2.3 Three Parasitic Modes and Cybersecurity Correspondence
| Mode | Description | Example |
|---|---|---|
| Parasitic | Attackers unilaterally extract benefits from the AI system, causing damage to the host. | Training data poisoning, model inversion for training data extraction. |
| Epiphytic | Attackers attach to the system surface, not directly destroying the core but continuously siphoning information. | Side-channel attacks, membership inference attacks during inference. |
| Pseudo-symbiotic | Attackers disguise themselves as legitimate service providers, appearing to offer beneficial resources while planting malicious behaviors. | Malicious third-party plugins, contaminated LoRA fine-tuned models. |
III. Seven-Link Full-Chain Analysis of Cybersecurity Threats
Link 1: Power-Driven Sensors — Image and Voice Entering the Bit World
- Technical Features: The physical starting point of an AI system is the sensor: cameras, microphones, radar, LiDAR. Power drives signal acquisition; analog signals are quantized by ADCs (Analog-to-Digital Converters) into discrete bitstreams. This “analog-to-digital” conversion is inherently an information-incomplete process—quantization errors, sampling rate limits, and sensor noise constitute the system’s native uncertainty baseline.
- Parasitic Mode and Entropy Mechanism: The essence of an Adversarial Attack is injecting carefully designed noise below the human perception threshold into the input signal, causing the DNN model to produce high-confidence misclassifications. Research shows that sensors in edge devices are often the preferred entry point for attacks because defense resources are weakest there; sensor data is contaminated before reaching the central computing end, rendering backend robustness mechanisms ineffective.
- Attack Methods: FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent). Experiments on the IoT-23 dataset show that CNNs are most vulnerable to adversarial attacks, while Decision Trees are relatively robust—reflecting a positive correlation between model complexity and adversarial vulnerability.
- Parasitic Mode: Epiphytic—Attackers attach to the sensor signal boundary, injecting semantic perturbations without destroying the hardware.
- Defense Strategy (SEBG-1):
- EdgeShield Architecture: Offload adversarial noise detection tasks from central computing to edge devices to filter contaminated data at the source.
- Sensor Data Integrity Signatures: Add encrypted timestamps and hash chains to sensor outputs to detect tampering during transmission.
- Multi-modal Cross-validation: Cross-verify outputs from image and voice sensors to maintain robustness if a single sensor is compromised.
- Federated Learning Isolated Training: Train detectors locally at edge nodes to avoid uploading raw sensor data, reducing the risk of large-scale leaks.
- EdgeShield Architecture: Offload adversarial noise detection tasks from central computing to edge devices to filter contaminated data at the source.
Link 2: Data Processing — From Raw Data to Tokens for the LLM
- Technical Features: Raw sensor data (image pixel matrices, audio waveforms) must undergo feature extraction, preprocessing, and normalization before being converted by a Tokenizer into discrete Token sequences. This involves text cleaning, word segmentation, subword algorithms (BPE/WordPiece/SentencePiece), and Token ID mapping.
- Parasitic Mode and Threats: Tokenization is a semantic compression process with inherent information loss. Attackers exploit this via:
- Homoglyph Attack: Replacing original text with visually similar but different Unicode characters to bypass keyword filters.
- Token Boundary Manipulation: Inserting invisible characters (e.g., zero-width space U+200B) to alter segmentation and influence semantic understanding.
- Code-switching Attack: Mixing characters from different languages to trigger abnormal Tokenizer paths.
- Homoglyph Attack: Replacing original text with visually similar but different Unicode characters to bypass keyword filters.
- Parasitic Mode: Parasitic—Attackers tamper with the original semantics before semantic encoding.
- Defense Strategy (SEBG-2):
- Pre-tokenization Normalization Pipeline: Implement Unicode normalization (NFC/NFKC) and filter invisible/control characters.
- Token Entropy Anomaly Detection: Calculate local entropy distributions of input Token sequences; entropy spikes may indicate injection attacks.
- Tokenization for PII Redaction: Mask Personally Identifiable Information (PII) before it enters the model processing pipeline.
- Pre-tokenization Normalization Pipeline: Implement Unicode normalization (NFC/NFKC) and filter invisible/control characters.
Link 3: LLM Pre-training — The Formation of Model Priors
- Technical Features: The Pre-training phase determines the LLM’s “worldview”—the model learns a language prior through language modeling tasks on hundreds of billions of Tokens. This forms the “high-level prior” in the Bayesian Predictive Coding framework, upon which subsequent Fine-tuning and inference perform Bayesian updates.
- Parasitic Mode and Threats:Training Data Poisoning is the most core and difficult-to-detect threat. Research has established a unified threat model for LLM poisoning across four dimensions: poison set, trigger function, poison behavior, and deployment mode.
- Poison triggers have expanded from lexical to semantic, behavioral, and even system-level, making detection extremely difficult. More dangerous are “Sleeper Agents”—deceptive behaviors planted during training that persist even after Safety Training.
- Supply Chain Attacks are prominent here: covering data sources, base models, fine-tuning methods (LoRA/PEFT), and deployment platforms. A leak at any link spreads downstream. For example, researchers found unprotected API tokens for Meta’s Llama 2 on GitHub and Hugging Face, granting write access to 723 accounts.
- Poison triggers have expanded from lexical to semantic, behavioral, and even system-level, making detection extremely difficult. More dangerous are “Sleeper Agents”—deceptive behaviors planted during training that persist even after Safety Training.
- Parasitic Mode: Pseudo-symbiotic—Malicious training data sources disguise themselves as legitimate corpora providers.
- Defense Strategy (SEBG-3):
- Data Provenance Tracking: Establish immutable records for all training data.
- Statistical Consistency Audits: Detect semantic distribution anomalies; poisoned data often forms identifiable outlier clusters in vector space.
- Differential Privacy Training: Add calibrated noise to gradient updates to limit the influence of any single training sample.
- Model Behavior Benchmarking: Regularly evaluate the model with fixed test suites to detect backdoor triggers.
- Data Provenance Tracking: Establish immutable records for all training data.
Link 4: Context Input Reference — RAG, Prompts, and External Tools
- Technical Features: Modern LLM systems integrate Retrieval-Augmented Generation (RAG), System Prompts, user inputs, and external tool calls (function calling). These act as “runtime prior updates,” equivalent to sensor inputs that instantly correct the model’s predictive distribution.
- Parasitic Mode and Threats:Prompt Injection is the most active vector, ranked #1 in the OWASP LLM Top 10 2025.
- Indirect Prompt Injection: Attackers embed malicious instructions in external files (webpages, databases) cited by RAG.
- System Prompt Leakage (OWASP LLM07:2025): Allows attackers to infer defense configurations to design bypass strategies.
- Vector and Embedding Weakness (OWASP LLM08:2025): Makes the RAG knowledge base a new attack surface where vector searches can be manipulated.
- Indirect Prompt Injection: Attackers embed malicious instructions in external files (webpages, databases) cited by RAG.
- Parasitic Mode: Parasitic—Attackers hijack the model’s execution flow via Prompt pipelines.
- Defense Strategy (SEBG-4):
- Policy Enforcement: Deploy semantic filters at input/output layers to detect instruction patterns deviating from expected distributions.
- Context Isolation: Clearly isolate System Prompts from user inputs in semantic space.
- Least-Privilege Tool Use: Grant only the minimum permissions necessary for specific tasks.
- RAG Document Integrity Verification: Use hash verification for RAG reference documents to detect post-hoc tampering.
- Policy Enforcement: Deploy semantic filters at input/output layers to detect instruction patterns deviating from expected distributions.
Link 5: Generation Data Operation — Inference and Sampling
- Technical Features: LLM generation is an auto-regressive sampling process. At each step, the model outputs a probability distribution over the Vocabulary, then selects the next Token via strategies like Greedy Search, Top-k, Top-p/Nucleus, or Temperature Sampling. Randomness introduces Generation Entropy.
- Parasitic Mode and Threats:
- Inference-time DoS: Using “Sponge Samples” to produce abnormally long output sequences, exhausting GPU resources. “Poisoned DoS (P-DoS)” involves planting samples during fine-tuning to break sequence length limits.
- Hallucination as Systemic Risk: Models produce high-confidence errors in high-semantic-entropy scenarios, classified as OWASP LLM09:2025 “Misinformation”.
- Research shows Attention Head Entropy can predict correctness; specific heads show distinct entropy patterns when generating correct answers.
- Inference-time DoS: Using “Sponge Samples” to produce abnormally long output sequences, exhausting GPU resources. “Poisoned DoS (P-DoS)” involves planting samples during fine-tuning to break sequence length limits.
- Parasitic Mode: Parasitic—Inference DoS parasitizes computing resources; hallucinations are endogenous entropy failures.
- Defense Strategy (SEBG-5):
- Dynamic Sequence Length Limits: Adjust max generation length based on context complexity to defend against DoS.
- Real-time Semantic Entropy Monitoring: Monitor token-level Shannon entropy; spikes trigger output review.
- Reliability Assessment: Use lightweight entropy-based evaluators to predict answer quality before output.
- Watermarking: Embed detectable statistical watermarks in low-entropy Token generation to track AI-generated content.
- Dynamic Sequence Length Limits: Adjust max generation length based on context complexity to defend against DoS.
Link 6: Data Output — API, Interface, and Downstream Systems
- Technical Features: Output is delivered via API endpoints, chat interfaces, or code execution environments. In Agentic AI, outputs can trigger high-impact operations like tool calls or database writes, amplifying risks.
- Parasitic Mode and Threats:
- Sensitive Information Disclosure (OWASP LLM02:2025): LLMs may leak PII or trade secrets from training data via Memorization Extraction Attacks.
- Insecure Output Handling: Unsanitized outputs passed to SQL engines or Shell executors can trigger XSS, SQL Injection, or Command Injection.
- Excessive Agency (OWASP LLM08:2025): LLMs with too much power can be hijacked by Prompt Injection to perform high-risk actions.
- Sensitive Information Disclosure (OWASP LLM02:2025): LLMs may leak PII or trade secrets from training data via Memorization Extraction Attacks.
- Parasitic Mode: Epiphytic—Attackers attach to the information flow via output channels to siphon data or plant actions.
- Defense Strategy (SEBG-6):
- Output Sanitization and Structuring: Implement strict parsing; never pass raw text directly to executable environments.
- Egress Filtering: Deploy semantic classifiers at API gateways to detect PII or confidential patterns.
- Sandbox Environments: Run all AI-generated code in sandboxes with restricted network/system access.
- Rate Limiting: Monitor for abnormal API call patterns (e.g., mass extraction requests).
- Output Sanitization and Structuring: Implement strict parsing; never pass raw text directly to executable environments.
Link 7: Systemic Integration — Cumulative Security Effects
The risks of the first six links are not independent but form an “Entropy Accumulation Effect” through the information chain: adversarial perturbations from the sensor layer can be amplified after Tokenization; backdoors in poisoned training data can be activated by Prompt Injection; output leaks provide intelligence for the next round of Prompt attacks.
The “Expanded Attack Surface” of Agentic AI systems means a single successful Prompt Injection can trigger a cascade of high-impact operations across tools and databases. OWASP LLM03:2025 emphasizes that the entire supply chain, from data to deployment, is a threat surface.
IV. Integration of the Information-Parasitic Entropy Defense Architecture
4.1 The SEBG Full-Chain Framework
Integrating the strategies above, this paper proposes the “Semantic Entropy Budget Gate (SEBG)” framework. Its core idea is to set monitoring and filtering mechanisms based on entropy thresholds at every transformation node of the information flow, ensuring that the semantic entropy increment at each link remains within a defined Security Entropy Budget.
The mathematical intuition is: Let H_{threshold} be the safety threshold for a link. If the actual semantic entropy increment \Delta H > H_{threshold}, an isolation and audit procedure is triggered. H_{threshold} is adjusted dynamically based on the system’s security level (e.g., stricter for medical or financial contexts).
4.2 Mapping Parasitic Modes to Defense Strategies
| Mode | Primary Attack Links | Core SEBG Mechanism |
|---|---|---|
| Parasitic | Links 2, 3, 4 | Data Poisoning, Prompt Injection, Training Backdoors |
| Epiphytic | Links 1, 6 | Adversarial Sensor Attacks, Memorization Extraction, Output Leaks |
| Pseudo-symbiotic | Links 3, 4 | Malicious Supply Chain, RAG Knowledge Poisoning |
4.3 Defensive Implications of Bayesian Predictive Coding
From the perspective of Hierarchical Predictive Processing (HPP), LLM inference is a Bayesian posterior update: high-level priors (formed by training) interact with low-level sensor inputs (Prompt/Context) to minimize prediction error. This reveals:
- Training poisoning is a malicious modification of the “high-level prior,” affecting all subsequent inference.
- Prompt Injection is a malicious override of “low-level sensor inputs,” attempting to bypass prior constraints.
- Defense must ensure both “Prior Integrity” and “Input Trustworthiness”.
V. Conclusion and Research Outlook
This paper adopts a “First Principles” view to systematically identify AI security threats across seven links—from physical power to semantic Tokens—interpreting attack mechanisms and defense strategies through the “Information-Parasitic Entropy System.” Major contributions include:
- Proposing a seven-link SEBG defense framework using semantic entropy as a unified risk quantifier.
- Systematically reclassifying OWASP LLM Top 10 2025 attack vectors into Parasitic, Epiphytic, and Pseudo-symbiotic modes.
- Integrating Bayesian Predictive Coding with AI security analysis to reveal the nature of attack behaviors at the neural computational semantic layer.
- Introducing Shannon Entropy, Semantic Entropy, and KLE into the quantitative analysis toolkit for AI security.
Future research directions include adaptive learning mechanisms for SEBG thresholds, entropy accumulation modeling for multi-agent systems, and information-theory-based AI security compliance frameworks (aligned with EU AI Act risk requirements).
Information parasitism is a new threat paradigm for the AI era. Understanding its entropy dynamics is the theoretical cornerstone for building the next generation of AI security defense systems.
摘要 Abstract
本文以「第一性原理」(First Principles)視角,系統性拆解 AI 系統從物理電能輸入至語意 Token 輸出的完整資訊鏈路,並在此鏈路的七個關鍵環節識別資安風險。研究核心創新在於:引入先前提出的「資訊寄生熵系統」(Information-Parasitic Entropy System)理論框架,以寄生(parasitism)、互利共生(symbiosis)與附生(epiphytism)三種生態型態分類各環節的風險關係,並援引 Shannon 資訊熵 H = −ΣpᵢlogPᵢ 與語意熵(Semantic Entropy)的數學工具量化威脅程度。本文對照 OWASP Top 10 for LLM Applications 2025、近期 IEEE/ACM 資安研究成果及 Bayesian Predictive Coding 框架,提出具有學術嚴謹性與工程可操作性的防禦措施。研究指出:AI 資安的本質是一場熵博弈——攻擊者持續注入語義熵,而防禦者的任務是在系統全鏈路部署「熵預算閘門」(Entropy Budget Gate)。
關鍵詞: AI 資安、資訊寄生熵、Token 生態學、LLM 資安、語意熵、OWASP LLM Top 10 2025、對抗性攻擊、供應鏈攻擊
一、緒論:為何需要第一性原理視角
當前 AI 資安研究多以攻擊類型分類(prompt injection、data poisoning、model inversion 等)為主軸,缺乏一個統一的物理-資訊-語意連貫框架。本文主張,唯有從最基礎的物理能量輸入出發,沿著比特(bit)→ 資料(data)→ Token → 模型參數 → 推論輸出的完整信息流,才能全面識別資安的盲點。
這個視角呼應物理學中的「第一性原理計算」(first-principles calculation)精神:不依賴經驗規則,而是從基本公理推導系統行為。在資安語境中,這意味著我們必須問:在信息從現實世界轉化為 LLM 輸出的每一步轉換中,攻擊者的干預點在哪裡?熵增加的機制是什麼?防禦的資訊理論基礎是什麼?
此框架亦與作者先前建構的「Token 生態學」(Token Ecology)理論相銜接,後者以生態系統隱喻描述 AI 語言模型中的信息流動,將 token-corpus 互動分類為寄生(parasitic)、互利共生(symbiotic)、或附生(epiphytic)關係,並將人類定位為生態系統中的節點而非外部操作者。
二、資訊寄生熵系統:理論基礎回顧
2.1 寄生資訊的定義
資訊寄生(Information Parasitism)指:某一信息實體(寄生方)透過嵌入宿主信息系統(LLM、訓練資料庫、推論管道),獲取不對稱利益,同時削弱宿主的語義完整性。與生物寄生類比,寄生方無需完全破壞宿主即可持續擷取價值——這解釋了為何許多 AI 攻擊具有長期潛伏性(persistent backdoors、sleeper agents)。
2.2 語意熵作為威脅量化工具
Shannon 資訊熵 H = −ΣpᵢlogPᵢ 量化系統的不確定性。在 LLM 語境中,研究者已從單純的 token 層級熵擴展至語意熵(Semantic Entropy):對模型生成的等義類(equivalence classes under mutual entailment)的機率分佈取熵,以量化語義層面的不確定性,而非僅表層形式的不確定性(Kossen et al., 2024)。更進一步,Kernel Language Entropy(KLE)採用馮諾依曼熵(von Neumann entropy)度量輸出間的連續語義依賴關係(Nikitin et al., 2024)。
本文的「資訊寄生熵系統」主張:攻擊者的本質是一個外部熵注入源。在系統的七個環節,攻擊者可以用不同方式增加系統的語意熵,使其在語義空間中的「可預測性」降低,進而控制或破壞輸出。防禦的任務則是在各環節建立「語義熵預算閘門(Semantic Entropy Budget Gate,SEBG)」,以偵測並阻斷異常熵增事件。
2.3 三種寄生型態與資安對應
- 寄生型(Parasitic): 攻擊者從 AI 系統單向擷取利益,對宿主系統造成損傷。例:訓練資料投毒、模型逆向萃取訓練資料。
- 附生型(Epiphytic): 攻擊者附著於系統表面,不直接破壞核心,但持續汲取信息。例:側通道攻擊(side-channel attack)、推論時的記憶萃取(membership inference attack)。
- 偽共生型(Pseudo-symbiotic): 攻擊者偽裝為合法服務提供者,在表面上提供有益資源,實則植入惡意行為。例:惡意第三方插件、受汙染的 LoRA 微調模型。
三、七環節資安威脅全鏈路分析
以下按信息流順序逐一分析七個環節的技術特徵、寄生模式、熵增機制及防禦策略。
環節一:電力驅動的感測器——影像與語音進入比特世界
3.1.1 技術特徵
AI 系統的物理起點是感測器:攝影機、麥克風、雷達、LiDAR。電能(Power)驅動信號採集,模擬信號經 ADC(類比數位轉換器)量化後成為離散比特流。這一「從類比到數位」的轉換本身就是一個資訊不完備的過程——量化誤差、取樣率限制、感測器雜訊構成了系統的原生不確定性基底。
3.1.2 寄生型態與熵增機制
對抗性攻擊(Adversarial Attack)的本質是在人類感知閾值以下,向輸入信號注入精心設計的噪聲,使 DNN 模型產生高置信度的錯誤分類。研究顯示,邊緣設備(edge devices)的感測器端極易成為攻擊的首選入口,因為此處防禦資源最薄弱:感測器數據在傳輸至中央計算端之前即已被污染,後端的魯棒性增強機制無從發揮。
FGSM(Fast Gradient Sign Method)與 PGD(Projected Gradient Descent)是當前主流的感測器層對抗攻擊手法。IoT-23 資料集的實驗顯示,CNN 在對抗性攻擊下最為脆弱,而決策樹(Decision Tree)相對魯棒——這反映了模型複雜度與對抗性脆弱度的正相關關係。
寄生型態: 附生型(Epiphytic)——攻擊者附著於感測器信號邊界,在不破壞硬體的前提下持續注入語義擾動。
3.1.3 防禦策略(SEBG-1)
- EdgeShield 架構: 將對抗性噪聲偵測任務從中央計算下放至邊緣設備,在源頭過濾污染數據,降低通訊開銷。
- 感測器數據完整性簽章: 對感測器輸出加上加密時間戳記與雜湊鏈,任何傳輸途中的篡改均可偵測。
- 多模態交叉驗證: 影像與語音感測器的輸出互相交叉驗證,單一感測器被污染時系統仍保持魯棒性。
- Federated Learning 隔離訓練: 在各邊緣節點本地訓練偵測器,避免原始感測器數據上傳至中央,降低大規模資料外洩風險。
環節二:資料加工——從原始資料到 Token 進入 LLM
3.2.1 技術特徵
原始感測器資料(影像像素矩陣、語音波形)需經過特徵萃取、預處理、標準化後,再透過 Tokenizer 轉換為 LLM 可處理的離散 Token 序列。這一過程涉及:文字清洗、分詞(word segmentation)、BPE/WordPiece/SentencePiece 等次詞(subword)演算法、以及 Token ID 映射。
3.2.2 寄生型態與威脅
Tokenization 本身即為一個語義壓縮過程,存在固有的資訊損失(information loss)。攻擊者可利用以下方式在此環節注入寄生信息:
- Unicode 同形字攻擊(Homoglyph Attack): 以視覺相似但字碼不同的 Unicode 字符替換原文,Tokenizer 產生不同的 Token 序列,繞過關鍵字過濾。
- Token 邊界操控: 在特定位置插入不可見字符(零寬度空格 U+200B 等),改變分詞邊界,影響模型語義理解。
- 多語言混合注入(Code-switching Attack): 混合使用不同語言的字符,觸發 Tokenizer 的異常分詞路徑。
寄生型態: 寄生型(Parasitic)——攻擊者在語義編碼前即已篡改原始語義。
3.2.3 防禦策略(SEBG-2)
- Tokenization 前正規化管道: 實施 Unicode 正規化(NFC/NFKC),過濾不可見及控制字符。
- Token 熵異常偵測: 計算輸入 Token 序列的局部熵分佈,異常熵峰值(entropy spike)可能指示注入攻擊。
- 敏感資料標記化(Tokenization for PII Redaction): 在 Token 化前對個人識別資訊(PII)進行遮蔽,防止其進入模型處理管道。
環節三:LLM 預訓練——模型先驗的形成
3.3.1 技術特徵
預訓練(Pre-training)階段決定了 LLM 的「世界觀」——模型透過數千億 Token 的語言建模任務學習語言先驗(language prior)。這一階段形成了 Bayesian Predictive Coding 框架所說的「高層級先驗(high-level prior)」,後續的 Fine-tuning 和推論都在此先驗之上進行貝葉斯更新。
3.3.2 寄生型態與威脅
訓練資料投毒(Training Data Poisoning)是本環節最核心的資安威脅,也是最難偵測的。研究已將 LLM 毒化攻擊在四個維度上建立統一威脅模型:毒化集合(poison set)、觸發函數(trigger function)、毒化行為(poison behavior)、部署模式(deployment mode)。
LLM 毒化觸發器已從傳統的詞彙型(lexical)擴展至語義型(semantic)、行為型(behavioral)甚至系統層級(system-level),使得偵測極為困難。更危險的是「Sleeper Agents」——訓練時植入的欺騙性行為,即使經過安全訓練(Safety Training)仍能持續存在。
「供應鏈攻擊」(Supply Chain Attack)在此環節尤為顯著:LLM 供應鏈涵蓋訓練資料來源、預訓練模型(base model)、微調方法(LoRA/PEFT)及部署平台,任一環節的汙染均可傳播至下游系統。以 Meta Llama 2 為例,研究者曾發現 GitHub 及 Hugging Face 上大量未保護的 API token,可對 723 個帳戶獲得寫入權限,從而直接操控訓練資料庫。
寄生型態: 偽共生型(Pseudo-symbiotic)——惡意預訓練資料來源偽裝為合法語料供應者。
3.3.3 防禦策略(SEBG-3)
- 訓練資料血統追蹤(Data Provenance Tracking): 對所有訓練資料建立不可篡改的來源記錄。
- 統計一致性審計: 在訓練資料中偵測語義分佈異常——毒化資料在向量空間中往往形成可識別的離群群集。
- 差分隱私訓練(Differential Privacy): 在梯度更新中加入校準雜訊,限制單一訓練樣本對模型參數的影響程度。
- 模型行為基準測試: 定期以固定測試套件評估模型,偵測異常行為觸發器(backdoor triggers)。
環節四:Context 輸入參考——RAG、Prompt 與外部工具
3.4.1 技術特徵
現代 LLM 系統通常整合檢索增強生成(RAG, Retrieval-Augmented Generation)、System Prompt、用戶輸入、以及外部工具調用(function calling/tool use)。這些外部信息作為「運行時先驗更新」,在 Bayesian Predictive Coding 框架下相當於感測器輸入,會即時修正模型的預測分佈。
3.4.2 寄生型態與威脅
Prompt Injection 是本環節最嚴重且最活躍的攻擊向量,被 OWASP LLM Top 10 2025 評為第一位最高風險項目。攻擊者在用戶輸入或外部文件中插入隱藏指令,劫持模型行為。
間接 Prompt Injection(Indirect Prompt Injection)更為隱蔽:攻擊者在 RAG 系統所引用的外部文件(網頁、資料庫條目)中預先埋入惡意指令,當 LLM 在處理這些文件時被觸發,執行攻擊者意圖而非用戶意圖。
System Prompt 洩漏(System Prompt Leakage,OWASP LLM07:2025)允許攻擊者推斷系統的防禦配置,進而設計針對性的繞過策略。
向量資料庫與嵌入弱點(Vector and Embedding Weakness,OWASP LLM08:2025)則使得 RAG 系統的知識庫成為新的攻擊面——向量近鄰搜尋可被操控以返回惡意文件。
寄生型態: 寄生型(Parasitic)——攻擊者透過 Prompt 管道直接劫持模型執行流。
3.4.3 防禦策略(SEBG-4)
- 輸入/輸出策略強制執行(Policy Enforcement): 在輸入層與輸出層部署語義過濾器,偵測偏離預期分佈的指令模式。
- Context 隔離(Context Isolation): System Prompt 與用戶輸入在語義空間中明確隔離,防止越界覆蓋。
- 最小權限工具調用(Least-Privilege Tool Use): 外部工具調用僅授予完成特定任務所需的最低權限。
- RAG 文件完整性驗證: 對 RAG 引用文件建立哈希驗證機制,偵測外部文件的後期篡改。
環節五:生成數據的操作——推論與採樣
3.5.1 技術特徵
LLM 的生成過程是一個自迴歸採樣過程:在每一步,模型根據當前 context 輸出 Vocabulary 上的機率分佈,再透過採樣策略(貪婪搜尋、Top-k、Top-p/Nucleus、Temperature Sampling)選取下一個 Token。這一過程中,模型參數固定,但採樣的隨機性引入了生成熵(Generation Entropy)。
3.5.2 寄生型態與威脅
推論時的 DoS 攻擊(Inference-time DoS): 透過設計「海綿樣本(Sponge Samples)」,使模型在推論時產生異常長的輸出序列,消耗大量 GPU 資源,導致服務可用性下降。研究已提出「毒化型 DoS(P-DoS)」攻擊:在微調階段植入毒化樣本,突破推論時序列長度的上限,在伺服器成本、延遲及 GPU 資源浪費方面造成系統性損耗。
幻覺(Hallucination)作為系統性風險: 模型在高語義熵場景下(知識邊界模糊、問題歧義性高)容易產生高置信度的錯誤輸出,被 OWASP 歸類為 LLM09:2025「錯誤資訊(Misinformation)」風險。
生成過程的不確定性量化研究顯示:Attention Head Entropy 可作為模型生成正確性的預測指標——特定 attention heads 在模型生成正確答案時呈現明顯不同的熵模式,可用於即時可靠性評估。
寄生型態: 寄生型(Parasitic)——推論時 DoS 攻擊寄生於計算資源;幻覺風險則是系統內生的熵失控。
3.5.3 防禦策略(SEBG-5)
- 動態序列長度限制: 結合上下文語義複雜度,動態調整最大生成長度,防禦 DoS 毒化攻擊。
- 語義熵即時監控: 在生成過程中持續監控 token-level Shannon entropy,異常峰值觸發輸出審查。
- Attention Head Entropy 可靠性評估: 部署輕量化的 entropy-based 可靠性評估器,在輸出前預測答案品質。
- 水印嵌入(Watermarking): 在低熵 Token 生成時嵌入可偵測的統計水印,以追蹤 AI 生成內容的來源。
環節六:數據輸出——API、介面與下游系統
3.6.1 技術特徵
LLM 的輸出透過 API 端點、聊天介面、代碼執行環境等管道傳遞至用戶或下游系統。在 Agentic AI 架構中,LLM 的輸出可直接觸發工具調用、代碼執行、資料庫寫入等高影響力操作,使輸出環節的資安風險顯著放大。
3.6.2 寄生型態與威脅
敏感資訊洩漏(Sensitive Information Disclosure,OWASP LLM02:2025): LLM 可能透過生成文本洩漏訓練資料中的敏感個人資訊、商業機密或系統配置。這是「記憶萃取攻擊(Memorization Extraction Attack)」的核心機制。
不安全的輸出處理(Insecure Output Handling): LLM 輸出若未經消毒(sanitization)直接傳遞至下游系統(如 SQL 引擎、Shell 執行器、HTML 渲染器),可能觸發 XSS、SQL Injection、命令注入等傳統攻擊。
過度自主行為(Excessive Agency,OWASP LLM08:2025): 在 Agentic 架構中,LLM 被賦予過大的執行權限,一旦受到 Prompt Injection 劫持,可觸發高危操作。
寄生型態: 附生型(Epiphytic)——攻擊者透過輸出管道附著於信息流,汲取敏感數據或植入惡意動作。
3.6.3 防禦策略(SEBG-6)
- 輸出消毒與結構化: 對 LLM 輸出實施嚴格的結構化解析,禁止將原始文本直接傳遞給可執行環境。
- 出口過濾(Egress Filtering): 在 API 網關部署語義分類器,偵測輸出中的 PII、機密信息模式。
- 最小執行環境(Sandbox): 所有 AI 生成的可執行代碼在沙盒環境中運行,限制系統調用與網絡訪問。
- 速率限制與異常行為監控: 偵測異常的 API 調用模式(如大量提取請求),觸發人工審查。
環節七:系統性整合——各環節的資安疊加效應
前六個環節的資安風險並非獨立存在,而是透過信息流的鏈式傳遞形成「熵疊加效應(Entropy Accumulation Effect)」:感測器層的對抗性擾動可在 Tokenization 後放大;毒化訓練資料形成的後門可被 Prompt Injection 激活;輸出洩漏可為下一輪 Prompt 攻擊提供情資。
資安研究者已注意到 Agentic AI 系統的「擴展攻擊面(Expanded Attack Surface)」問題:LLM 可調用外部工具、存取資料庫、與其他 AI 模型交互,使得一次成功的 Prompt Injection 可觸發一系列高影響力的連鎖操作。OWASP LLM 供應鏈風險(LLM03:2025)亦強調,從訓練資料到部署平台的完整供應鏈均是威脅面。
四、資訊寄生熵系統的防禦架構整合
4.1 語義熵預算閘門(SEBG)全鏈路框架
整合前述七個環節的防禦策略,本文提出「語義熵預算閘門(Semantic Entropy Budget Gate, SEBG)」全鏈路框架。其核心思想是:在信息流的每個轉換節點,設置基於資訊熵閾值的監控與過濾機制,確保系統在每個環節的語義熵增量控制在安全預算(Security Entropy Budget)之內。
SEBG 框架的數學直觀如下:設 H_threshold 為各環節的語義熵安全閾值,若某環節的實際語義熵增量 ΔH > H_threshold,則觸發隔離與審查程序。H_threshold 依據系統的安全等級與業務情境動態調整,例如醫療或金融場景需設置更嚴格的閾值。
4.2 寄生型態與防禦策略對應
| 寄生型態 | 主要環節 | 代表攻擊向量 | SEBG 防禦機制 |
|---|---|---|---|
| 寄生型(Parasitic) | 環節 2、3、4 | 資料投毒、Prompt Injection、訓練後門 | 語義熵過濾、Context 隔離、差分隱私 |
| 附生型(Epiphytic) | 環節 1、6 | 對抗性感測器攻擊、記憶萃取、輸出洩漏 | EdgeShield、出口過濾、沙盒執行 |
| 偽共生型(Pseudo-symbiotic) | 環節 3、4 | 惡意供應鏈、RAG 知識庫投毒 | 資料血統追蹤、文件完整性驗證 |
4.3 Bayesian Predictive Coding 框架的防禦意涵
從 Hierarchical Predictive Processing(HPP)的視角看,LLM 的推論過程是一個貝葉斯後驗更新過程:高層先驗(訓練形成)與低層感測器輸入(Prompt/Context)持續互動,以最小化預測誤差。這一框架揭示:
- 訓練投毒攻擊本質上是對「高層先驗」的惡意修改,影響所有後續推論。
- Prompt Injection 是對「低層感測器輸入」的惡意覆蓋,試圖繞過高層先驗的約束。
- 防禦的關鍵是確保貝葉斯更新過程的「先驗完整性」與「輸入可信性」兩者均受保護。
五、結論與研究展望
本文以「第一性原理」視角,從物理電能到語意 Token 的完整信息鏈路出發,系統識別 AI 系統七個環節的資安威脅,並以「資訊寄生熵系統」理論框架統一詮釋攻擊機制與防禦策略。主要學術貢獻包括:
- 提出七環節全鏈路 SEBG 防禦框架,以語義熵作為統一的風險量化工具。
- 以寄生(Parasitic)、附生(Epiphytic)、偽共生(Pseudo-symbiotic)三種生態型態對 OWASP LLM Top 10 2025 的攻擊向量進行系統性重新分類。
- 將 Bayesian Predictive Coding 框架與 AI 資安分析整合,揭示攻擊行為在神經計算語義層的本質。
- 將 Shannon 資訊熵、語義熵(Semantic Entropy)、Kernel Language Entropy(KLE)引入 AI 資安防禦的量化分析工具箱。
未來研究方向包括:SEBG 各環節閾值的自適應學習機制、多 Agent 系統的熵疊加建模、以及基於資訊理論的 AI 資安合規評估框架(參照 EU AI Act 的風險分級要求)。
資訊寄生是 AI 時代的新型威脅範式。理解其熵動力學,是構建下一代 AI 資安防禦體系的理論基石。
參考文獻 References
- OWASP Foundation. (2025). OWASP Top 10 for Large Language Model Applications 2025. https://genai.owasp.org/
- Kossen, J., et al. (2024). Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. ICLR 2024.
- Nikitin, A., et al. (2024). Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs. arXiv:2405.20003.
- Jha, N. K., & Reagen, B. (2025). Entropy-Guided Attention for Private LLMs. arXiv:2501.03489. NYU Center for Cybersecurity.
- Fendley, N., et al. (2025). Poisoning Attacks on LLMs: A Unified Threat Model. arXiv (June 2025).
- Hubinger, E., et al. (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv:2401.05566.
- Zhang, Y., et al. (2024). Persistent Pre-Training Poisoning of LLMs. arXiv:2410.13722.
- Ghaffari, A., et al. (2025). AI-Enabled IoT Security: A Survey on Advances, Challenges, and Cross-Domain Collaborative Frameworks. Proceedings of CISAI 2025. ACM. https://doi.org/10.1145/3773365.3773619
- EdgeShield Team. (2024). EdgeShield: A Universal and Efficient Edge Computing Framework for Robust AI. arXiv:2408.04181.
- Shumailov, I., et al. (2021). Sponge Examples: Energy-Latency Attacks on Neural Networks. IEEE EuroS&P 2021.
- Chen, Z., et al. (2024). AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases. arXiv:2407.12784.
- Wan, A., Wallace, E., Shen, S., & Klein, D. (2023). Poisoning Language Models During Instruction Tuning. arXiv:2305.00944.
- Yao, Y., et al. (2024). Survey on Security and Privacy Risks of Large Language Models. IEEE Transactions on Knowledge and Data Engineering.
- Open Source Security Foundation (OpenSSF). (2025, January). Predictions for Open Source Security in 2025: AI, State Actors, and Supply Chains. https://openssf.org/blog/2025/01/23/
- Indusface. (2025). OWASP LLM04:2025 Data and Model Poisoning. https://www.indusface.com/learning/owasp-llm-data-and-model-poisoning/
- Bryce, R., et al. (2024). Comprehensive Survey on Privacy and Security Trade-offs in LLM Applications. arXiv.
- European Union. (2024). Artificial Intelligence Act (EU AI Act). Official Journal of the European Union.
- Brundage, M., et al. (2018). The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. arXiv:1802.07228.
- Xu, J., et al. (2024). Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for LLMs. arXiv:2305.14710.
- Mdpi. (2025). Adversarial Attacks in IoT: A Performance Assessment of ML and DL Models. Engineering Proceedings, 112(1), 15. https://doi.org/10.3390/engproc2025112015
- Frontiers in the Internet of Things. (2025). Securing the future: AI-driven cybersecurity in the age of autonomous IoT. https://doi.org/10.3389/friot.2025.1658273
- PubMed / MDPI Sensors. (2025). Machine Learning-Based Security Solutions for IoT Networks: A Comprehensive Survey. Sensors, 25(11), 3341. https://doi.org/10.3390/s25113341
- Lu, J., et al. (2025). Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking. arXiv:2505.14112.
- OpenReview. (2025). Attention Head Entropy of LLMs Predicts Answer Correctness. ICLR 2025 Workshop Submission. https://openreview.net/pdf/f1d8fabcdf88c7a7a6c264ffba2eef62e00a8f66.pdf
- Check Point Software. (2025). OWASP Top 10 for LLMs in 2025: Data and Model Poisoning. https://www.checkpoint.com/cyber-hub/what-is-llm-security/data-and-model-poisoning/




發表迴響