Abstract 

This article takes Aksenov et al.'s (2026) Compression is All You Need: Modeling Mathematics as its core text and combines it with the theoretical framework of Information Parasitism to explore the role of Large Language Models (LLMs) in compressing knowledge structures. The argument shows that human mathematics differs from the exponential space of formal mathematics (FM) precisely because of its polynomial compression, achieved through hierarchically nested definitions. When AI systems learn only the compressed Directed Acyclic Graph (DAG) projections, rather than retaining the logical diversity inherent in the Deduction Hypergraph (DH), this constitutes structural parasitism on the human Work of Compression. The process is accompanied by an irreversible dissipation of semantic entropy, ultimately leading to a systematic collapse of reasoning stability. The article also proposes anti-parasitic design pathways, taking hypergraph reversibility and multi-path recording as structural requirements for AI knowledge systems.

I. Compression as a Defining Characteristic of Human Mathematics

Understanding information parasitism requires understanding the essence of the parasitized object. Aksenov et al. (2026) advance a precise proposition: Human Mathematics (HM) is a polynomially growing, sparse subset of Formal Mathematics (FM), and its distinguishing feature is precisely its compressibility.

Formal mathematics is the universal set of all valid deductions, and its space grows exponentially. In the model of the free non-abelian monoid Fₙ, even if macro sets of polynomial density are introduced, the expansion of expressive power is only linear; super-linear expansion requires macro sets of almost maximum density, at an extremely high cost. Conversely, in the model of the free abelian monoid Aₙ, logarithmically sparse macro sets can achieve an exponential expansion of expressive power. Empirical data from MathLib align with the Aₙ model: the unwrapped length grows exponentially with depth, while the wrapped length remains approximately constant across depth layers.

This means that the mathematics humans have discovered and cherish is an art of folding immense complexity through hierarchical definitions (lemmas, theorems). Place-value notation is the oldest example, achieving exponential compression of the natural numbers with logarithmically sparse symbols (0–9). The entire structure of MathLib shows that this theme runs throughout human mathematics (Freedman, 2026).
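The place-value claim can be checked directly. A minimal sketch (function names are illustrative, not from the paper) comparing the uncompressed, unary length of a natural number with its positional length:

```python
def unary_length(n: int) -> int:
    # Uncompressed representation: one symbol per unit counted.
    return n

def positional_length(n: int, base: int = 10) -> int:
    # Place-value notation: number of digits needed in the given base.
    digits = 1
    while n >= base:
        n //= base
        digits += 1
    return digits

# A number near 10^9 needs a billion unary symbols but only 10 digits:
n = 10**9
assert unary_length(n) == 1_000_000_000
assert positional_length(n) == 10
```

The digit count grows logarithmically in n, which is exactly the "exponential compression with logarithmically sparse symbols" described above.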

The FM Deduction Hypergraph (FM DH) retains all logical paths deriving A∧B∧C, including the branches via A∧B and via B∧C, using hyperedges to indicate the set of premises involved in each reasoning step. The MathLib DAG, however, makes a choice for efficiency and disambiguation: it retains only a single path, replacing hyperedges with ordinary directed edges and thus erasing the redundancy and symmetry of the logical space.

The cost of this choice warrants careful scrutiny: not all redundancy is waste—logical diversity is the structural foundation of reasoning robustness.
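The contrast between the two structures can be sketched with plain dictionaries; this is a minimal illustration of the idea, not the paper's formalism, and all names are hypothetical:

```python
# Deduction hypergraph (DH): each conclusion keeps ALL known premise sets,
# one hyperedge per derivation.
deduction_hypergraph = {
    "A∧B":   [{"A", "B"}],
    "B∧C":   [{"B", "C"}],
    "A∧B∧C": [{"A∧B", "C"}, {"A", "B∧C"}],  # two independent derivations survive
}

def project_to_dag(dh: dict) -> dict:
    # The DAG projection keeps exactly one premise set per conclusion,
    # replacing hyperedges with ordinary directed edges.
    return {conclusion: premise_sets[0] for conclusion, premise_sets in dh.items()}

dag = project_to_dag(deduction_hypergraph)
assert len(deduction_hypergraph["A∧B∧C"]) == 2  # both routes recorded in the DH
assert dag["A∧B∧C"] == {"A∧B", "C"}             # the B∧C route has been erased
```

The projection is lossy by construction: nothing in `dag` records that a second derivation ever existed.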

II. The Precise Definition of Information Parasitism: An Asymmetrical Compression Relationship

Before establishing a discussion on parasitic relationships, a common logical confusion must be clarified: using the results of compression itself does not constitute parasitism. The essence of mathematical education lies in transmitting compressed knowledge; students learning the Pythagorean theorem without re-deriving every step is a normal form of knowledge transmission, not parasitism.

The characteristic of information parasitism lies in its asymmetrical relationship structure, specifically manifested across three levels:

  • Level One: Path Amnesia. When an LLM learns only the final conclusions and selected paths of the MathLib DAG, what it acquires is the "remains" of compression: linear chains stripped of logical hyperedges and flattened of multi-path memory. Once it encounters a situation requiring path replacement (e.g., the original premise becomes invalid, or an equivalent proof must be constructed), the model, lacking any perception of the hypergraph structure, cannot mobilize backup deductive paths. As Emergent Mind (2026) points out in an extended analysis of the paper, MathLib, as a DAG projection, omits alternative proofs and hyperedge structures; how to reconstruct or approximate the deduction hypergraph remains an open problem.
  • Level Two: Externalization of Compression Work. Compressing complex phenomena into refined concepts requires immense cognitive and computational energy; this is the work accumulated by human mathematicians over millennia. Parasitic systems attempt to skip the compression process and extract the compressed results directly, but the consequence is that the system cannot re-compress when the compression fails. The Aₙ model reveals the deep logic of compression: logarithmically sparse macro sets can achieve exponential expansion because each macro (each definition or lemma) carries structural energy that can be referenced via multiple paths. A model that memorizes only the output of macros while discarding their generative structure is enjoying the dividends of compression without bearing its costs.
  • Level Three: Unidirectional Dissipation of Semantic Entropy. Semantic entropy describes the breadth of a system's distribution in semantic space. Research published in Nature by Farquhar et al. (2024) shows that semantic entropy can serve as an effective metric for hallucination detection in LLMs: when a model's multiple responses to the same question are highly convergent in semantics (i.e., low semantic entropy), this often means the model has slipped into mechanical reproduction of compressed results rather than genuine reasoning. The essence of information parasitism is precisely this unidirectional consumption of semantic entropy: as the model repeatedly generates and re-compresses on already-compressed corpora, the originally rich logical hypergraph collapses into monotonous linear paths, and the space of logical diversity continuously narrows.
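The semantic-entropy metric can be sketched in miniature. Farquhar et al. (2024) cluster sampled answers by bidirectional entailment; the `equivalent` predicate below (case-insensitive string match) is only a toy stand-in for that entailment check:

```python
import math

def semantic_entropy(responses, equivalent=lambda a, b: a.strip().lower() == b.strip().lower()):
    # Group responses into semantic clusters, then take the Shannon entropy
    # of the cluster distribution. Low entropy = highly convergent answers.
    clusters = []
    for r in responses:
        for cluster in clusters:
            if equivalent(r, cluster[0]):
                cluster.append(r)
                break
        else:
            clusters.append([r])
    total = len(responses)
    return -sum((len(c) / total) * math.log2(len(c) / total) for c in clusters)

assert semantic_entropy(["Paris", "paris", "Paris "]) == 0.0         # one cluster
assert abs(semantic_entropy(["Paris", "Lyon", "Rome"]) - math.log2(3)) < 1e-9
```

Zero entropy over many samples is the convergence signal the text describes: every sampled answer falls into the same semantic cluster.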

III. The Wittgensteinian Boundary and the Reversibility of Logical Hypergraphs

The connection to Wittgenstein's discourse on language games deserves precise elaboration. Wittgenstein points out in Philosophical Investigations that the meaning of language is rooted in the forms of life (Lebensform) of its concrete use; the rules of a language game are not a closed formal system but acquire meaning within a practical context.

The structural echo is this: the paper specifically notes that the hyperedges in the FM DH possess reversibility, meaning one can backtrack from a conclusion to its set of premises, and the same conclusion can correspond to multiple different premise sets. This reversibility is exactly the condition under which logical reasoning is a "language game": it preserves contextual diversity, allowing the same symbols to carry different semantic contexts across different deductive paths.

The MathLib DAG sacrifices reversibility in exchange for efficiency, which is a reasonable choice for practical engineering. However, if an AI system solely uses the DAG as its learning object, it is equivalent to learning a language stripped of its form of life—fluent in symbols, but decoupled from the diverse deductive paths that generated those symbols. The hollow game Wittgenstein warned against is precisely the extreme manifestation of this decoupling.

IV. The Collapse Mechanism of Reasoning Chain Stability

Oswald & Rozek’s (2023) formal analysis of natural deduction proof graphs demonstrates that the layered dependency in hypergraph structures is a necessary condition for parallel verification of reasoning steps. While the DAG structure can achieve topological sorting, once it loses the multi-premise grouping information of hyperedges, the cost of cross-path reasoning consistency verification rises sharply.
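The layered-dependency idea can be made concrete with a small sketch (names hypothetical): each proof step is assigned to the earliest layer at which all of its premises are already verified, so every step within one layer can be checked in parallel.

```python
def verification_layers(steps: dict) -> dict:
    # steps maps each conclusion to its set of premises (axioms map to set()).
    # A step's layer is 1 + the deepest layer among its premises.
    memo = {}

    def layer_of(node):
        if not steps.get(node):          # axiom or external premise
            return 0
        if node not in memo:
            memo[node] = 1 + max(layer_of(p) for p in steps[node])
        return memo[node]

    layers = {}
    for node in steps:
        layers.setdefault(layer_of(node), []).append(node)
    return layers

proof = {
    "A": set(), "B": set(), "C": set(),
    "A∧B": {"A", "B"},
    "A∧B∧C": {"A∧B", "C"},
}
layers = verification_layers(proof)
assert sorted(layers[0]) == ["A", "B", "C"]   # all axioms checkable at once
assert layers[2] == ["A∧B∧C"]
```

The layering here depends on knowing each step's full premise grouping; from a flattened single-edge DAG that grouping must first be re-derived, which is the verification-cost increase the paragraph describes.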

More direct evidence comes from the research on the Entropy Law (Huang et al., 2024): the autoregressive language modeling of LLMs is essentially information compression, and the knowledge condensed by the model depends on the effective information encoded in the training data. When the training corpus is dominated by DAG projections (single-path, no hyperedges), the model, when faced with questions requiring multi-path reasoning, will fill reasoning gaps with statistically high-frequency paths because it has never seen signals of deductive diversity—this is precisely one of the structural conditions for the occurrence of hallucinations.

A specific collapse scenario: if a model is asked to reconstruct the proof of A∧B∧C when the premise of its original deductive path (e.g., via A∧B) is compromised, it must "recall" the existence of the B∧C path from the corpus. If that path was erased by DAG compression during training, the model will either output an erroneous linear deduction or declare the problem unsolvable, rather than switching paths; this is structural path-blindness, not a flaw in reasoning capacity.
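The scenario can be sketched directly (a toy illustration; the data and names are hypothetical). A system that kept the hypergraph can switch to the surviving premise set; the single-path projection cannot:

```python
def derive(conclusion: str, knowledge: dict, valid_premises: set):
    # Return the first recorded premise set whose members are all still valid,
    # or None if every recorded path relies on a compromised premise.
    for premise_set in knowledge.get(conclusion, []):
        if premise_set <= valid_premises:
            return premise_set
    return None

hypergraph = {"A∧B∧C": [{"A∧B", "C"}, {"A", "B∧C"}]}  # both derivations recorded
dag        = {"A∧B∧C": [{"A∧B", "C"}]}                # single-path projection

valid = {"A", "B∧C", "C"}  # the A∧B premise has been compromised
assert derive("A∧B∧C", hypergraph, valid) == {"A", "B∧C"}  # path switch succeeds
assert derive("A∧B∧C", dag, valid) is None                 # structural path-blindness
```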

V. Anti-Parasitic Design Pathways: Hypergraphs as Knowledge Architecture Requirements

To address information parasitism at the design level, "preservation of logical diversity" must be incorporated into the structural requirements of AI knowledge systems. The following three pathways are actionable:

  • Pathway One: Hypergraph-grounded Ontology Mapping. The organization of a Knowledge Base should not rely solely on DAG conclusions; instead, knowledge in RDBs or DAGs should be mapped back onto a Knowledge Hypergraph that preserves diverse paths. Specifically, each node (proposition or theorem) should retain all of its known premise-set configurations, recording deductive relationships with hyperedges rather than ordinary edges. Emergent Mind (2026), in its extended discussion of Aksenov et al.'s paper, explicitly lists "developing methods to reconstruct or approximate the DH" as a primary future research direction.
  • Pathway Two: Internalization of the Compression Process. The core of combating parasitism is not making the model memorize more compressed results but making it learn the generative process of compression. The Entropy Law research of Huang et al. (2024) indicates that diversity in training data (rather than a single high-quality sample) does more to maximize an LLM's mastery of knowledge; a high-quality dataset with high mutual-information redundancy may yield less combined benefit than a diverse but locally lower-quality dataset. For training-data design, this supports the proposition of preserving logical diversity: train reasoning models on corpora that contain multiple proof paths rather than a single optimal path.
  • Pathway Three: Dynamic Management of Semantic Entropy Budgets. Building on the semantic-entropy hallucination-detection framework of Farquhar et al. (2024), the model's output diversity can be monitored dynamically during inference: when semantic entropy drops below a threshold (meaning the model has converged on a single path), the system should trigger active path-diversification mechanisms (such as semantic deduplication in beam search, or entropy regularization in Chain-of-Thought). Recent work by Gan et al. (2025) further shows that semantic entropy can be estimated efficiently from hidden states without multiple sampling, making real-time semantic-entropy management computationally feasible.
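The monitoring decision in the third pathway can be sketched as a simple threshold check; the budget value and the idea of acting on cluster sizes are illustrative assumptions, not details from the cited work:

```python
import math

def entropy_bits(cluster_sizes: list) -> float:
    # Shannon entropy of a semantic-cluster distribution, in bits.
    total = sum(cluster_sizes)
    return -sum((s / total) * math.log2(s / total) for s in cluster_sizes)

def needs_diversification(cluster_sizes: list, budget_bits: float = 0.5) -> bool:
    # Trigger path diversification (e.g. semantic deduplication in beam search)
    # when output diversity falls below the semantic-entropy budget.
    return entropy_bits(cluster_sizes) < budget_bits

assert needs_diversification([8])         # fully convergent: 0 bits, trigger
assert not needs_diversification([4, 4])  # two balanced clusters: 1 bit, pass
```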

VI. Conclusion

The core insight of Compression is All You Need is not just about mathematics—it reveals compressibility as a universal structural selective pressure of intellectual activity. Human mathematics chose a path of polynomial growth within an exponentially growing formal space, at the cost of discarding the vast majority of logical possibilities in FM; the reward is the acquisition of infinite expressive power that can be carried by finite symbols.

What information parasitism describes is an uncompensated extraction of this reward—bypassing the selective pressure and directly inheriting the compression dividends without bearing the diversity costs required to maintain the compression structure. Its consequence is the irreversible dissipation of semantic entropy, the structural loss of deductive diversity, and the systematic collapse of reasoning chains when facing boundary problems.

The solution is not to reject compression—compression is fundamentally the engine of intellectual progress—but to retain the reversibility and multi-path memory of hypergraphs within the compression. True AI reasoning evolution should manifest as reconstructing the logical hypergraphs that disappeared into DAG projections from limited tokens; it must not only inherit the “map” of human mathematics but also reconstruct the “topographical diversity” discarded when drawing this map.

References

  1. Aksenov, V., Bodnia, E., Mulligan, M., & Freedman, M. (2026). Compression is all you need: Modeling Mathematics. arXiv:2603.20396. Microsoft Research / Harvard CMSA.
  2. Freedman, M. (2026, April). Compression Is All You Need: Modeling Mathematics [Invited talk]. Harvard Center of Mathematical Sciences and Applications (CMSA). https://cmsa.fas.harvard.edu/event/freedman_42426/
  3. Farquhar, S., Kossen, J., Kuhn, L., & Gal, Y. (2024). Detecting hallucinations in large language models using semantic entropy. Nature, 630(8017), 625–630. https://doi.org/10.1038/s41586-024-07421-0
  4. Huang, S., et al. (2024). Entropy Law: The Story Behind Data Compression and LLM Performance. arXiv:2407.06645.
  5. Oswald, J. T., & Rozek, B. (2023). Parallel Verification of Natural Deduction Proof Graphs. arXiv:2311.10440. Rensselaer AI & Reasoning Lab.
  6. Emergent Mind. (2026). Compression is All You Need: Modeling Mathematics — Analysis and Extensions. https://www.emergentmind.com/papers/2603.20396
  7. Deng, Z., et al. (2025). Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy. arXiv:2508.03346.
  8. Wittgenstein, L. (1953). Philosophical Investigations. Blackwell. [Chinese edition: 《哲學研究》, The Commercial Press]
  9. Alves, S., Fernández, M., & Mackie, I. (2011). A new graphical calculus of proofs. arXiv:1102.2655.
  10. Frantar, E., et al. (2024). When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models. arXiv:2504.02010.
