The Natural Parameter η and Suspension: Parametric Thinking from Statistics to Philosophy
Abstract
This paper explores the profound connection between the natural parameter η in statistics and the concept of suspension (epoché) in philosophy, arguing that both embody a form of "parametric thinking": revealing the essential structure of phenomena through reparameterization or the temporary suspension of judgment. From exponential family distributions to phenomenological methods, from classical philosophy to machine learning, this mode of thinking demonstrates universality across time and space.
Logical Framework of the Article:
- Mathematical η: Reparameterization from intuition to structure
- Philosophical Suspension: Suspension of judgment from natural attitude to phenomenological essence
- Buddhist and Taoist Traditions: Classical interpretations of suspension and reparameterization
- Ethical Dense/Sparse and AI Model Sparse/Dense: Modern applications
- Universals and Particulars: The convergence point of metaphysics and machine learning
1. Introduction: The Universality of Parametric Thinking
In the seemingly unrelated domains of statistics and philosophy, there exists a common mode of thinking: revealing the essence of things by changing the "parameters" or "attitudes" of observation. Both the natural parameter η in statistics and the concept of suspension (epoché) in philosophy embody this profound insight of "parametric thinking."
The Natural Parameter η and the Philosophical Implications of "Reparameterization"
In statistics, particularly in exponential families and Generalized Linear Models (GLMs), the natural parameter η (also called the canonical parameter) represents a form of reparameterization.
It transforms "intuitive parameters" such as the mean μ and the probability π into core variables that appear directly in the exponential term of the log-likelihood function. The advantage of this transformation is mathematical elegance: the derivation of dual functions and sufficient statistics becomes more natural (Wainwright & Jordan, 2008).
However, this "reparameterization" carries a philosophical metaphor: our intuitive experience of the world (μ, π) often requires a form of "reconstruction" or "transformation" to become a theoretically operable object (η). This resembles what Husserl described in Ideas I (1913): we must temporarily "suspend" the natural attitude, transforming intuitive experience into phenomenological "essential intuition."
Mathematical η = the pure structure remaining after the phenomenological operation of suspension.
2. Natural Parameter η: Statistical Essential Parameterization
2.1 Definition and Properties of Natural Parameters
η (pronounced "eta") is the natural parameter, or canonical parameter, of exponential family distributions. In the standard form of an exponential family distribution:
f(x|θ) = h(x)exp{η(θ)·T(x) - A(η(θ))}
where η(θ) is the natural parameter, representing a reparameterization of the original parameter θ.
2.2 Core Characteristics of Natural Parameters
Essence of Reparameterization: η is not a parameter created from nothing, but a mathematical transformation of existing "intuitive parameters" (such as the probability π or the mean μ). This transformation reveals the internal structure of the distribution.
Mathematical Elegance: Under the natural parameterization, exponential family distributions share a unified form, the log-likelihood is linear in η, and maximum likelihood estimation enjoys favorable statistical properties.
Computational Efficiency: In machine learning, natural parameters make gradient computation and parameter updates more concise, which is particularly important in backpropagation.
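As a concrete illustration of the reparameterization from the intuitive π to η, here is a minimal sketch (plain Python, with function names of my own choosing) of the Bernoulli distribution in exponential-family form, where the natural parameter is the log-odds η = log(π/(1−π)):

```python
import math

# Bernoulli pmf in exponential-family form: f(x|π) = exp{η·x - A(η)},
# with natural parameter η = log(π/(1-π)) and log-partition A(η) = log(1+e^η).

def to_natural(pi):
    """Reparameterize the intuitive probability π into the natural parameter η."""
    return math.log(pi / (1 - pi))

def log_partition(eta):
    """A(η) = log(1 + e^η), the normalizer of the family."""
    return math.log(1 + math.exp(eta))

def mean_from_natural(eta):
    """A'(η) = sigmoid(η): differentiating A recovers the intuitive mean π."""
    return 1 / (1 + math.exp(-eta))

pi = 0.8
eta = to_natural(pi)

# The log-likelihood log f(x) = η·x - A(η) is linear in η, and it agrees
# with the direct Bernoulli form log(π^x (1-π)^(1-x)) for x in {0, 1}.
for x in (0, 1):
    assert abs((eta * x - log_partition(eta))
               - math.log(pi if x == 1 else 1 - pi)) < 1e-12

# Differentiating the log-partition recovers the intuitive parameter: A'(η) = π.
assert abs(mean_from_natural(eta) - pi) < 1e-12
```

The sigmoid that recovers μ from η here is the same function that appears as the output activation of logistic regression, which is one concrete sense of the "computational efficiency" mentioned above.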
3. Suspension: Philosophical Methodological Parameterization
3.1 Classical Concept of Suspension
Pyrrhonian epoché: The ancient Greek skeptics proposed "suspension of judgment": temporarily suspending judgment when faced with uncertain matters in order to achieve mental tranquility (ataraxia).
Husserlian Phenomenological Suspension: Temporarily "bracketing" the natural attitude's presuppositions about the existence of the world, in order to describe purely how phenomena appear. Husserl reinterpreted epoché as a method of suspending judgment: it does not negate the world, but temporarily sets everyday beliefs aside to gain distance, thereby observing "how phenomena give themselves" (Husserl, 1931).
3.2 Operational Characteristics of Suspension
Methodological Tool: Suspension is not doubting the existence of the world, but changing the "parameters" of observation, from presupposed existence to pure description.
Revealing Essence: Through the suspension of everyday judgments, the essential structure of things can be revealed, in what Husserl called "essential intuition."
This methodological structure has an inherent similarity to the "natural parameter" of mathematical statistics: both transform an original intuition (μ, π / the natural attitude) into a more fundamental structural expression (η / the essence of phenomena).
4. Suspension Thinking in Eastern Wisdom
4.1 The Triadic Logic of the Diamond Sutra
The Diamond Sutra repeatedly uses an "A, not-A, therefore called A" structure that embodies suspension thinking:
- "All particles of dust, the Tathagata says, are not particles of dust; therefore they are called particles of dust"
- "The world is not the world; therefore it is called the world"
This expression pattern actually consists of:
- Affirmation: acknowledging the appearance of phenomena (particles, world)
- Negation: suspending their absolute reality (not-particles, not-world)
- Reparameterization: renaming within a new framework of understanding ("called particles," "called world")
Buddhist "Suspension": The Diamond Sutra's View of Signlessness:
Setting aside our habitual conceptualizations allows insight into all dharmas as dream-like illusions. This closely parallels the role of η in mathematics: removing intuitive interference and returning to pure structure.
The Diamond Sutra's negation of the four notions, "no notion of a self, of a person, of sentient beings, or of a life span," is precisely a form of reparameterization: it keeps us from clinging to the intuitive μ and returns us to the structural η, thereby transcending a limited intuitive framework.
4.2 Dynamic Suspension in the Tao Te Ching
"Returning is the movement of the Tao; weakness is the function of the Tao" demonstrates the duality of suspension:
- Returning (movement): like backpropagation, adjusting parameters through a reverse process
- Weakness (function): like the parsimony of natural parameters, carrying maximal information in minimal form
The Tao Te Ching and Backpropagation:
The Tao Te Ching's statement "returning is the movement of the Tao" can be viewed as a philosophical analogue of backpropagation: the principles of the world do not lie in unidirectional linearity, but require repeated gradient corrections. This cyclical thinking is remarkably similar to error backpropagation in deep learning (Rumelhart et al., 1986).
"Weakness is the function of the Tao" corresponds to η's mathematical role: η is not a superficial intuitive quantity, but a subtle yet pervasive inner core running through the entire statistical structure.
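The "repeated gradient corrections" above can be made concrete with a minimal sketch (plain Python, hypothetical data and names; full backpropagation additionally chains gradients through layers): a one-parameter model fitted by repeatedly stepping against the gradient of its squared error, the "returning" movement of the loop.

```python
# Minimal gradient descent: repeatedly "return" against the error gradient.
def fit(xs, ys, lr=0.1, steps=200):
    w = 0.0                                  # start from an uncommitted value
    for _ in range(steps):
        # gradient of the mean squared error of the model y ≈ w·x
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad                       # the reverse, corrective step
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]    # toy data generated by y = 2x
w = fit(xs, ys)                              # converges toward w = 2.0
```

Each pass moves w only a fraction of the way toward the answer; it is the repetition of the backward correction, not any single step, that recovers the underlying structure.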
5. Williams' Thick/Thin ("Dense/Sparse") Concepts: The Layered Structure of Suspension
5.1 Parametric Levels of Concepts
Thin ("sparse") concepts: such as "good" and "evil," resembling placeholder symbols that require further parameterization to gain substantive meaning.
Thick ("dense") concepts: such as "fraud" and "courage," which already contain rich descriptive and normative content.
5.2 Layered Application of Suspension
This distinction shows that suspension can operate at different levels:
- Suspension of sparse concepts: Reveals their emptiness, requiring further filling
- Suspension of dense concepts: Strips away cultural wrappings to explore core structure
6. Unified Principles of Parametric Thinking
6.1 The Nature of Transformation
Whether statistical η transformation or philosophical suspension, both embody the same mode of thinking:
- From Intuition to Essence: Removing surface complexity to reveal deep structure
- Reparameterization: Re-expressing phenomena in new coordinate systems
- Principle of Parsimony: Capturing maximum information with minimum parameters
6.2 Cognitive Science Perspective
Modern cognitive science suggests that the brain's information processing follows similar principles:
- Sparse Coding: Neural networks tend to encode information using the fewest active neurons
- Representation Learning: Deep learning models learn essential features of data through hierarchical transformations
6.3 Suspension in Dense and Sparse Concepts
Williams (1985) distinguishes between "thin" and "thick" ethical concepts (rendered here as "sparse" and "dense").
- Thin ("sparse") concepts (such as good and evil) are almost placeholder symbols that preserve interpretive space, which is isomorphic with "suspension."
- Thick ("dense") concepts (such as fraud and murder) have more content, but are still inevitably shaped by culture and context.
In data science, this corresponds precisely to the dialectic of sparse models and dense models. Sparsification techniques (such as L1 regularization) require us to "suspend" redundant parameters, preserving only the structural core, which is itself a form of "mathematical suspension."
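The L1 "suspension" of redundant parameters can be seen in the soft-thresholding step that L1 regularization induces (the proximal operator of the L1 penalty). A minimal sketch with made-up weights:

```python
# Soft-thresholding: the update induced by an L1 penalty of strength lam.
# Coefficients with magnitude below lam are "suspended" (set exactly to zero);
# larger ones survive, shrunk toward zero by lam.
def soft_threshold(w, lam):
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [0.9, -0.05, 0.02, -1.3, 0.08]     # hypothetical model weights
sparse = [soft_threshold(w, 0.1) for w in weights]
# sparse == [0.8, 0.0, 0.0, -1.2, 0.0]: only the structural core remains
```

Unlike L2 regularization, which merely shrinks every coefficient, the L1 threshold produces exact zeros, which is why it counts as suspension rather than mere attenuation.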
7. Conclusion: Toward an Integrated Parametric Philosophy
The dialogue between natural parameter η and the concept of suspension reveals a fundamental characteristic of human cognition: by changing the “parameters" or “attitudes" of observation, we can penetrate the surface of phenomena and touch their essential structure. This parametric thinking is not only the core of scientific method but also the foundation of philosophical reflection.
In the age of artificial intelligence, understanding the universality of this mode of thinking helps us design better learning algorithms and understand the essence of human intelligence more profoundly. As the Tao Te Ching states, "weakness is the function of the Tao": the most concise parameterization often contains the deepest wisdom.
Substrate Theory and the Position of η
Descartes and Kant both asked what the basic metaphysics of a description of the world is: whether space is like a "container" that receives the forms of things. Kant later transformed this basic, receptive spatial dimension from a represented "characteristic" of things into the form that our sensory faculties give to things in experiencing them. The natural parameter η is precisely such a mathematical characterization of a fundamental substrate.
It is not the "thisness" (haecceitas) of a concrete μ, but a formalized core structure that explains how instances can be abstractly captured.
In other words, η is the bridge connecting "particulars" and "universals."
Universals and Particulars
In medieval philosophy, the debate between universals and particulars was the central problem of metaphysics.
In modern deep learning's Graph Neural Networks, the structure of edges and nodes reproduces the "universal-particular" relationship: universals are the shared embedding spaces, particulars are the individual node features.
The natural parameter η, in the derivation of activation functions (as inverse canonical link functions) and loss functions (as negative log-likelihoods), plays the role of connecting universals (the structure of a distributional family) with particulars (the observed samples). This is why η is so crucial in statistics and machine learning.
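To make the universal/particular reading concrete, here is a minimal sketch (plain Python, with an invented toy graph and weights) of one message-passing layer: a single weight matrix W, the shared "universal," transforms the mean-aggregated features of each individual node, the "particulars."

```python
# One mean-aggregation message-passing layer with a shared weight matrix.
def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def gnn_layer(adj, feats, W):
    out = []
    for i, neighbors in enumerate(adj):
        # aggregate each node's own features with its neighbors' (mean)
        group = [feats[i]] + [feats[j] for j in neighbors]
        agg = [sum(col) / len(group) for col in zip(*group)]
        # the same W (the "universal") maps every node's aggregate; ReLU activation
        out.append([max(0.0, h) for h in matvec(W, agg)])
    return out

adj = [[1], [0, 2], [1]]                     # a path graph: 0 - 1 - 2
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] # particulars: per-node features
W = [[0.5, -0.25], [0.25, 0.5]]              # universal: one matrix for all nodes
out = gnn_layer(adj, feats, W)
```

Every node keeps its particular input, yet all are embedded through the one shared map, which is the "universal-particular" coupling the text describes.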
References
Statistics and Machine Learning:
- Brown, L. D. (1986). Fundamentals of Statistical Exponential Families. Institute of Mathematical Statistics.
- McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall.
- Wainwright, M. J., & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2), 1-305.
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
Philosophical Classics:
- Sextus Empiricus. Outlines of Pyrrhonism (Trans. R. G. Bury). Harvard University Press.
- Husserl, E. (1913). Ideen zu einer reinen Phänomenologie. Halle: Niemeyer.
- Husserl, E. (1931). Ideas: General Introduction to Pure Phenomenology (Trans. W. R. Boyce Gibson). London: Allen & Unwin.
- Descartes, R. (1641). Meditations on First Philosophy.
- Kant, I. (1781/1787). Critique of Pure Reason.
- The Diamond Sutra (金剛般若波羅蜜經, trans. Kumārajīva).
- Laozi. Tao Te Ching (道德經).
Contemporary Philosophy:
- Williams, B. (1985). Ethics and the Limits of Philosophy. Harvard University Press.
- Dummett, M. (1973). Frege: Philosophy of Language. Duckworth.
Cognitive Science:
- Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607-609.
- Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.