LogSumExp

LogSumExp（LSE，也称RealSoftMax^[1]或多变量softplus）函数是一个平滑最大值——一个对极值函数的光滑近似，主要用在机器学习算法中。^[2] 其定义为参数的指数的和的对数：

\mathrm {LSE} (x_{1},\dots ,x_{n})=\log \left(\exp(x_{1})+\cdots +\exp(x_{n})\right).

性质

LogSumExp函数的定义域为 $\mathbb {R} ^{n}$ （实数空间（英语：real coordinate space）），共域是 $\mathbb {R}$ （实数线）。它是对极值函数 $\max _{i}x_{i}$ 的近似，同时有如下的界限：

\max {\{x_{1},\dots ,x_{n}\}}\leq \mathrm {LSE} (x_{1},\dots ,x_{n})\leq \max {\{x_{1},\dots ,x_{n}\}}+\log(n).

第一个不等式在 $n=1$ 以外的情况是严格成立的，第二个不等式仅在所有元素相等时取等号。（证明：令 $m=\max _{i}x_{i}$ ，则 $\exp(m)\leq \sum _{i=1}^{n}\exp(x_{i})\leq n\exp(m)$ 。将不等式取对数即可。）

另外，我们可以将不等式缩放到更紧的界限。考虑函数 ${\frac {1}{t}}\mathrm {LSE} (tx)$ 。然后，

\max {\{x_{1},\dots ,x_{n}\}}<{\frac {1}{t}}\mathrm {LSE} (tx)\leq \max {\{x_{1},\dots ,x_{n}\}}+{\frac {\log(n)}{t}}

（证明：将上式 $x_{i}$ 用 $t>0$ 的 $tx_{i}$ 替换，得到

\max {\{tx_{1},\dots ,tx_{n}\}}<\mathrm {LSE} (tx_{1},\dots ,tx_{n})\leq \max {\{tx_{1},\dots ,tx_{n}\}}+\log(n)

由于 $t>0$ ，

t\max {\{x_{1},\dots ,x_{n}\}}<\mathrm {LSE} (tx_{1},\dots ,tx_{n})\leq t\max {\{x_{1},\dots ,x_{n}\}}+\log(n)

最后，同除 $t$ 得到结果。）

此外，如果我们乘上一个负数，可以得到一个与 $\min$ 有关的不等式：

\min {\{x_{1},\dots ,x_{n}\}}-{\frac {\log(n)}{t}}\leq {\frac {1}{-t}}\mathrm {LSE} (-tx)<\min {\{x_{1},\dots ,x_{n}\}}.

LogSumExp函数是凸函数，因此在定义域上严格递增。^[3] （但并非处处都是严格凸的^[4]。）

令 $\mathbf {x} =(x_{1},\dots ,x_{n})$ ，偏导数为：

{\frac {\partial }{\partial x_{i}}}{\mathrm {LSE} (\mathbf {x} )}={\frac {\exp x_{i}}{\sum _{j}\exp {x_{j}}}},

表明LogSumExp的梯度是softmax函数。

LogSumExp的凸共轭是负熵（英语：negative entropy）。

对数域中的log-sum-exp计算技巧

当通常的算术计算在对数尺度上进行时，经常会遇到LSE函数，例如对数概率。^[5]

类似于线性尺度中的乘法运算变成对数尺度中的简单加法，线性尺度中的加法运算变成对数尺度中的LSE：

\mathrm {LSE} (\log(x_{1}),...,\log(x_{n}))=\log(x_{1}+\dots +x_{n})

使用对数域计算的一个常见目的是在使用有限精度浮点数直接表示（在线性域中）非常小或非常大的数字时提高精度并避免溢出问题.^[6]

不幸的是，在一些情况下直接使用 LSE 依然会导致上溢/下溢问题，必须改用以下等效公式（尤其是当上述“最大”近似值的准确性不够时）。因此，IT++等很多数学库都提供了LSE的默认例程，并在内部使用了这个公式。

\mathrm {LSE} (x_{1},\dots ,x_{n})=x^{*}+\log \left(\exp(x_{1}-x^{*})+\cdots +\exp(x_{n}-x^{*})\right)

其中 $x^{*}=\max {\{x_{1},\dots ,x_{n}\}}$

一个严格凸的log-sum-exp型函数

LSE是凸的，但不是严格凸的。我们可以通过增加一项为零的额外参数来定义一个严格凸的log-sum-exp型函数^[7]：

\mathrm {LSE} _{0}^{+}(x_{1},...,x_{n})=\mathrm {LSE} (0,x_{1},...,x_{n})

This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.

在热带分析（英语：tropical analysis）中，这是对数半环（英语：log semiring）的和。

参见

参考资料

^ Zhang, Aston; Lipton, Zack; Li, Mu; Smola, Alex. Dive into Deep Learning, Chapter 3 Exercises. www.d2l.ai. [27 June 2020]. （原始内容存档于2022-03-31）.
^ Nielsen, Frank; Sun, Ke. Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities. Entropy. 2016, 18 (12): 442. Bibcode:2016Entrp..18..442N. S2CID 17259055. arXiv:1606.05850 . doi:10.3390/e18120442 .
^ El Ghaoui, Laurent. Optimization Models and Applications. 2017 [2022-10-16]. （原始内容存档于2020-12-19）.
^ convex analysis - About the strictly convexity of log-sum-exp function - Mathematics Stack Exchange. stackexchange.com.
^ McElreath, Richard. Statistical Rethinking. OCLC 1107423386.
^ Practical issues: Numeric stability.. CS231n Convolutional Neural Networks for Visual Recognition. [2022-10-16]. （原始内容存档于2022-12-06）.
^ Nielsen, Frank; Hadjeres, Gaetan. Monte Carlo Information Geometry: The dually flat case. 2018. Bibcode:2018arXiv180307225N. arXiv:1803.07225 .