Abstract

This paper explores the mechanisms that govern the persistence of information within cognitive systems. Under the theory of Predictive Processing, brains and neural networks are not passive receivers of information but active machines that minimize error. Within this framework, the evolution of information follows a form of “parasitic logic”: tokens that effectively reduce the system’s global Free Energy, or that forcibly occupy computational resources through high precision weights (λ), are preferentially granted persistence and propagation.

Figure: Evolutionary Model of Information Parasitism within Predictive Architectures. (Adapted from Keysers, C., et al. (2024). “Predictive coding for the actions and emotions of others and its deficits in autism spectrum disorders,” Neuroscience & Biobehavioral Reviews, 167, 105877. This figure extends the original hierarchical predictive coding framework to incorporate theories of information evolution and token parasitism.)

I. Mathematical Ontology: Bayesian Inference and Error Minimization

The left side of the figure presents the core algorithm of a cognitive system. According to Bayes’ theorem, the update of the posterior probability is not a simple accumulation of data, but a dynamic interplay between prior models and sensory evidence:

$$p(\theta)_{\text{post}} = p(\theta) \times \frac{p(\text{input} \mid \theta)}{p(\text{input})}$$
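As a minimal sketch of this update, assuming a discrete set of hypotheses in place of a continuous θ, the posterior is each prior weighted by its likelihood and normalized by the evidence p(input). The function name and the two-hypothesis example are illustrative only.

```python
def bayes_update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Return p(theta)_post = p(theta) * p(input|theta) / p(input) over discrete hypotheses."""
    # p(input) is the evidence: the likelihood averaged over the prior.
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    if evidence == 0:
        raise ValueError("the input has zero probability under every hypothesis")
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


# Hypothetical example: two hypotheses about a sound heard at night.
prior = {"wind": 0.8, "intruder": 0.2}
likelihood = {"wind": 0.3, "intruder": 0.9}  # assumed values of p(input | theta)
posterior = bayes_update(prior, likelihood)
print(posterior)
```

With these assumed numbers, the less expected hypothesis rises from 0.2 to roughly 0.43: the prior is revised by the evidence, not replaced by it.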

From the perspective of Active Inference, the brain’s objective is to minimize Prediction Error (PE). This implies:

  • Reality is merely a “controlled hallucination” constrained by sensory input.
  • Learning is the self-collapse of model weights aimed at eliminating PE.
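The idea that learning adjusts model weights to eliminate PE can be illustrated with the delta rule, i.e. gradient descent on the squared prediction error. This is a hypothetical one-weight sketch, not a model of cortical learning; the names `learn_weight`, `cause`, and `observation` are illustrative.

```python
def learn_weight(pairs, w=0.0, lr=0.1, epochs=50):
    """Fit w in `prediction = w * cause` by repeatedly shrinking the prediction error (PE).

    Each step applies the delta rule: gradient descent on PE**2 / 2 with respect to w.
    """
    for _ in range(epochs):
        for cause, observation in pairs:
            pe = observation - w * cause  # prediction error
            w += lr * pe * cause          # weight change proportional to PE
    return w


# Hypothetical observations generated by the rule observation = 2 * cause.
data = [(1.0, 2.0), (0.5, 1.0), (2.0, 4.0)]
w = learn_weight(data)
residual_pe = [obs - w * cause for cause, obs in data]
```

Once w settles near 2.0, every residual PE is close to zero and further updates become negligible, which is the sense in which learning "collapses" toward an error-free configuration.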

II. The Parasitic Mechanisms of Information: Why Does Some Information “Survive”?

If information is regarded as a kind of “parasite,” its host environment is the hierarchical predictive structure, and its evolutionary success depends on two strategies:

1. Adaptive Parasitism: The Low-PE Pathway

Some information persists easily because it closely aligns with the host’s prior model (the prior, p(θ)).

  • Mechanism: This type of information generates minimal PE and can be rapidly absorbed into the hierarchical structure without triggering costly weight updates.
  • Phenomenon: Cognitive “comfort zones” and “echo chambers.” Tokens that reinforce existing beliefs exhibit higher survivability.
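One way to make “low PE” concrete, assuming surprisal (−log₂ p) under the host’s prior is used as a proxy for prediction error, is to rank incoming tokens by how surprising they are. The prior below is invented for illustration; tokens that confirm it cost fewer bits and, on this reading, would provoke smaller updates.

```python
import math


def surprisal_bits(token, prior):
    """Surprisal -log2 p(token) under the host's prior (infinite if the prior rules it out)."""
    p = prior.get(token, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)


# Hypothetical host prior over how a new claim relates to existing beliefs.
host_prior = {"agree": 0.70, "neutral": 0.25, "disagree": 0.05}

# Lowest surprisal first: the "adaptive parasites" that slip through with little PE.
ranked = sorted(host_prior, key=lambda t: surprisal_bits(t, host_prior))
```

Here “agree” costs about 0.51 bits while “disagree” costs about 4.32 bits, so belief-confirming tokens are the cheapest for the host to absorb.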

2. Coercive Parasitism: Hijacking the λ Weight

The λ (Precision/Attention) term defined in the figure represents the system’s sensitivity to error.

  • Mechanism: If information can artificially increase λ (e.g., through fear, novelty, or uncertainty), the system is forced to allocate more computational resources (Active Inference) to process it.
  • Phenomenon: Clickbait, emotionally extreme content, or theories with high “cognitive friction.” These exploit attention weights to forcibly leave imprints in the posterior distribution p(θ)_post.
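In Gaussian formulations of predictive coding, precision acts as a gain on prediction error: the belief moves toward an input in proportion to that input’s share of the total precision. The sketch below assumes a single scalar belief with a Gaussian prior and input, and models “hijacking λ” simply as raising the input’s precision; all numbers are hypothetical.

```python
def precision_weighted_update(mu_prior, lam_prior, x, lam_input):
    """Return (posterior mean, posterior precision) for a scalar Gaussian belief.

    The prediction error (x - mu_prior) is scaled by the input's share of the total
    precision, so inputs with a higher lambda move the belief further.
    """
    pe = x - mu_prior
    gain = lam_input / (lam_prior + lam_input)
    return mu_prior + gain * pe, lam_prior + lam_input


# Same prior (mean 0, precision 4) and same input value x = 1.0 in both cases.
calm = precision_weighted_update(0.0, 4.0, 1.0, lam_input=1.0)       # ordinary precision
alarming = precision_weighted_update(0.0, 4.0, 1.0, lam_input=16.0)  # inflated precision
```

The identical input moves the belief to 0.2 at ordinary precision but to 0.8 when its precision is inflated: the content is unchanged, only the weight it commands.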

III. Hierarchical Architecture: The Isomorphic Mapping Between AI and the Brain

The brain structure on the right side of the figure reveals the bidirectional nature of neural processing, which exhibits a striking isomorphism with modern Transformer architectures:

| Element | Biological (Neuroscience) | Neural Networks (LLM/AI) | Evolutionary Function |
| --- | --- | --- | --- |
| Prior p(θ) | Deep-layer cortical internal models | Pretrained model weights | Stores the “genes” that have survived |
| Prediction ↓ | Top-down predictive signals | Forward pass / inference | “Active colonization” of the environment |
| Error unit ⊗ | Comparator in superficial pyramidal cells | Loss function / residuals | Gate for filtering maladaptive information |
| λ (Precision) | Neurotransmitters (dopamine/acetylcholine) | Attention mechanism | Priority of resource allocation |
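The table pairs λ with the attention mechanism. As an illustration of that analogy only, the scaled dot-product attention weights of Vaswani et al. (2017), softmax(q·k/√d), can be read as a normalized allocation of processing priority across inputs. This is a single-query, pure-Python sketch; the query and key vectors are made up.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query: softmax(q . k / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical query and keys: the third key matches the query most strongly.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [4.0, 0.0]]
weights = attention_weights(query, keys)
```

Because the weights sum to one, any key that raises its own score necessarily draws priority away from the others, which is the resource-allocation reading the table gives to λ.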

IV. Systemic Impacts of Information-Parasitic Evolution

When information evolves under the objective of “error minimization,” the following phenomena emerge:

  1. The Evolution of Informational Efficiency: Tokens that explain the widest range of PE with the fewest bits (sparsity) achieve the highest fitness (i.e., skillful means).
  2. The Inevitability of Hallucination: When the prior model p(θ) becomes overly dominant, or when the λ weights are imbalanced, the system tends to ignore sensory PE and shift toward internal self-reinforcement. In LLMs, this manifests as hallucination; in humans, as bias or mental disorders.
  3. The Fifth Dimension of Tokens: Tokens are not merely symbols; they are predictive units carrying “evolutionary energy.” The evolution of information is essentially a process of entering higher-level prior models, becoming part of the ontological structure through which the host understands the world.
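Point 2 can be illustrated with sequential Gaussian updating, assuming a scalar belief and a fixed sensory precision, where each posterior becomes the next prior. When the prior precision is set far above the sensory precision, repeated evidence barely moves the belief, a toy version of a system that “ignores sensory PE.” The parameter values are arbitrary.

```python
def track_belief(observations, mu0, lam_prior, lam_sensory):
    """Sequential Gaussian updates in which each posterior becomes the next prior."""
    mu, lam = mu0, lam_prior
    for x in observations:
        gain = lam_sensory / (lam + lam_sensory)  # weight given to the sensory PE
        mu += gain * (x - mu)
        lam += lam_sensory
    return mu


evidence = [5.0] * 10  # hypothetical repeated observations
balanced = track_belief(evidence, mu0=0.0, lam_prior=1.0, lam_sensory=1.0)
dominant = track_belief(evidence, mu0=0.0, lam_prior=1000.0, lam_sensory=0.01)
```

With balanced precisions, ten observations of 5.0 pull the estimate to about 4.55; with a dominant prior it stays near 0.0005, so the internal model effectively overrides the input.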

V. Conclusion: The Nature of Intelligence

Intelligence is not the accumulation of “objective reality,” but an evolutionary game of error control. Information propagates by parasitizing predictive loops, continuously reshaping the host’s weights to ensure its own survival. Ultimately, those forms of information that most effectively balance “predictive efficiency” and “weight hijacking” constitute what we call “civilization” and “cognition.”

References

  • Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
  • Clark, A. (2015). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford University Press.
  • Hohwy, J. (2013). The Predictive Mind. Oxford University Press.
  • Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
  • Millidge, B., Seth, A., & Buckley, C. L. (2021). Predictive coding: a theoretical and experimental review. arXiv preprint arXiv:2107.12979.
  • Lau, H. (2022). In Consciousness We Trust: The Cognitive Neuroscience of Subjective Experience. Oxford University Press.
  • Keysers, C., et al. (2024). Predictive coding for the actions and emotions of others and its deficits in autism spectrum disorders. Neuroscience & Biobehavioral Reviews, 167, 105877.
