Spatially Varying Gaussian Mixture Model (Laplacian Regularization / Smoothing)
The derivation below describes the Gaussian mixture model of a single pixel observed across several images.
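The basic spatially varying GMM assumes that all sample points come from the same $M$ distributions, with weights that differ from point to point. The GMM used here goes further: every point has its own distributions, and each distribution has its own weight.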
A mixture distribution is a linear combination of several probability density functions:
$$
p(x \mid \Theta) = \sum_{k=1}^{M} \alpha_k \, p_k(x \mid \theta_k), \qquad \sum_{k=1}^{M} \alpha_k = 1 \qquad (1.1)
$$
Here $\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M)$: the mixture consists of $M$ component distributions, and the $k$-th component has weight $\alpha_k$. When every component is Gaussian, the mixture is called a Gaussian mixture model (GMM) with $M$ components.
Let the observed samples be $X = \{x_i\}$, $1 \le i \le N$. In a GMM, the density of each component, with parameters $\theta_k = (\mu_k, \sigma_k^2)$ (mean and variance), is
$$
p_k(x \mid \theta_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right) \qquad (1.2)
$$
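As a minimal sketch of Equations (1.1) and (1.2), the NumPy code below evaluates a univariate Gaussian mixture density. The function names `component_pdf` and `mixture_pdf` are illustrative and not from the original text; weights, means and variances are passed as arrays of length $M$.

```python
import numpy as np


def component_pdf(x, mu_k, var_k):
    """Eq. (1.2): univariate Gaussian density with mean mu_k and variance var_k."""
    return np.exp(-(x - mu_k) ** 2 / (2.0 * var_k)) / np.sqrt(2.0 * np.pi * var_k)


def mixture_pdf(x, alpha, mu, var):
    """Eq. (1.1): p(x | Theta) = sum_k alpha_k * p_k(x | theta_k)."""
    alpha, mu, var = (np.asarray(a, dtype=float) for a in (alpha, mu, var))
    if not np.isclose(alpha.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    x = np.asarray(x, dtype=float)
    # Add a trailing component axis so x broadcasts against the M components,
    # then sum the weighted component densities over that axis.
    return (alpha * component_pdf(x[..., None], mu, var)).sum(axis=-1)


# Usage: a two-component mixture should integrate to (approximately) 1.
grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]
total = mixture_pdf(grid, [0.3, 0.7], [-1.0, 2.0], [0.5, 1.5]).sum() * dx
print(total)
```

The Riemann sum over a wide grid is only a sanity check that the weights sum to one and each component is a proper density.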
For $N$ independent and identically distributed observations, the joint density is
$$
p(X \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta) \qquad (1.3)
$$
The corresponding log-likelihood is
$$
\ln L(\Theta \mid X) = \ln \prod_{i=1}^{N} p(x_i \mid \Theta) = \sum_{i=1}^{N} \ln \left[ \sum_{k=1}^{M} \alpha_k \, p_k(x_i \mid \theta_k) \right] \qquad (1.4)
$$
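Evaluating the log of a sum in Equation (1.4) naively can underflow to $\ln 0$ for samples far from every component. A common remedy, assumed here rather than prescribed by the text, is the log-sum-exp trick. A minimal NumPy sketch; `log_likelihood` is a hypothetical helper name:

```python
import numpy as np


def log_likelihood(x, alpha, mu, var):
    """Eq. (1.4): sum_i ln sum_k alpha_k p_k(x_i | theta_k) for 1-D samples x."""
    x = np.asarray(x, dtype=float)
    alpha, mu, var = (np.asarray(a, dtype=float) for a in (alpha, mu, var))
    # ln[alpha_k * p_k(x_i | theta_k)] for every sample i and component k: shape (N, M)
    log_terms = (np.log(alpha)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - (x[:, None] - mu) ** 2 / (2.0 * var))
    # Log-sum-exp over components: subtract the row maximum before exponentiating.
    row_max = log_terms.max(axis=1)
    per_sample = row_max + np.log(np.exp(log_terms - row_max[:, None]).sum(axis=1))
    return float(per_sample.sum())


samples = np.array([-1.2, 0.3, 2.5, 1.9])
ll = log_likelihood(samples, [0.4, 0.6], [-1.0, 2.0], [1.0, 0.5])
print(ll)
```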
Our goal is the maximum-likelihood estimate $\hat{\Theta} = \arg\max_{\Theta} \ln L(\Theta \mid X)$. In principle one could differentiate the log-likelihood above and look for its stationary points, but this is difficult: the logarithm is applied to a sum over components, so setting the derivatives to zero has no closed-form solution. To get around this, introduce latent variables $Y = \{y_i\}$, $1 \le i \le N$, with $y_i \in \{1, 2, \ldots, M\}$, where $y_i = k$ means that the $i$-th observation $x_i$ was generated by the $k$-th component of the mixture. With the latent variables $Y$, the (complete-data) log-likelihood can be rewritten as
$$
\ln L(\Theta \mid X, Y) = \ln \prod_{i=1}^{N} p(x_i, y_i \mid \Theta) = \sum_{i=1}^{N} \ln \left[ \alpha_{y_i} \, p_{y_i}(x_i \mid \theta_{y_i}) \right] \qquad (1.5)
$$
With the likelihood in this form, the model parameters can be estimated with the EM (expectation-maximization) algorithm.
Suppose that at iteration $t-1$ we have the estimate $\hat{\Theta}^{t-1} = (\alpha_1^{t-1}, \ldots, \alpha_M^{t-1}, \theta_1^{t-1}, \ldots, \theta_M^{t-1})$ of $\Theta$, written simply as $\Theta^{t-1}$ below.
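In the E-step of EM, compute the expectation of the complete-data log-likelihood with respect to the posterior of $Y$ under $\Theta^{t-1}$: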
$$
\begin{aligned}
Q(\Theta \mid \Theta^{t-1})
&= \mathbb{E}_{Y \mid X, \Theta^{t-1}} \left[ \ln L(\Theta \mid X, Y) \right] \\
&= \sum_{y} \ln \left[ L(\Theta \mid X, y) \right] p(y \mid X, \Theta^{t-1}) \\
&= \sum_{y} \sum_{i=1}^{N} \ln \left[ \alpha_{y_i} \, p_{y_i}(x_i \mid \theta_{y_i}) \right] \prod_{j=1}^{N} p(y_j \mid x_j, \Theta^{t-1})
\end{aligned} \qquad (1.6)
$$

Here the sum runs over all $M^N$ label vectors $y = (y_1, \ldots, y_N)$, and the posterior $p(y \mid X, \Theta^{t-1})$ factorizes over samples because the observations are independent.
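The probability that the $i$-th observation $x_i$ came from the $k$-th component is $p(y_i = k \mid x_i, \Theta^{t-1})$, written $p(k \mid x_i, \Theta^{t-1})$ below. For each fixed $i$, summing over all $y_j$ with $j \ne i$ contributes $\sum_{y_j} p(y_j \mid x_j, \Theta^{t-1}) = 1$, so only the sum over $y_i = k$ survives and the expression becomes: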
$$
\begin{aligned}
Q(\Theta \mid \Theta^{t-1})
&= \sum_{k=1}^{M} \sum_{i=1}^{N} \ln \left[ \alpha_k \, p_k(x_i \mid \theta_k) \right] p(k \mid x_i, \Theta^{t-1}) \\
&= \sum_{k=1}^{M} \sum_{i=1}^{N} \ln (\alpha_k) \, p(k \mid x_i, \Theta^{t-1}) + \sum_{k=1}^{M} \sum_{i=1}^{N} \ln \left[ p_k(x_i \mid \theta_k) \right] p(k \mid x_i, \Theta^{t-1})
\end{aligned} \qquad (1.7)
$$
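The step from (1.6) to (1.7) can be checked numerically on a tiny problem by enumerating all $M^N$ label vectors $y$. The sketch below uses arbitrary rows `post[i, k]` (each summing to 1) as stand-ins for $p(k \mid x_i, \Theta^{t-1})$ and compares the brute-force expectation with the closed form; all names and numbers are illustrative assumptions.

```python
import itertools

import numpy as np


def log_joint(x, alpha, mu, var):
    """ln[alpha_k * p_k(x_i | theta_k)] as an (N, M) array."""
    return (np.log(alpha) - 0.5 * np.log(2.0 * np.pi * var)
            - (x[:, None] - mu) ** 2 / (2.0 * var))


def q_bruteforce(x, alpha, mu, var, post):
    """Eq. (1.6): sum over all label vectors y of ln L(Theta | X, y) * p(y | X, Theta^{t-1})."""
    lj = log_joint(x, alpha, mu, var)
    n_samples, n_comp = lj.shape
    rows = np.arange(n_samples)
    total = 0.0
    for y in itertools.product(range(n_comp), repeat=n_samples):
        y = np.array(y)
        log_l = lj[rows, y].sum()        # sum_i ln[alpha_{y_i} p_{y_i}(x_i)]
        p_y = np.prod(post[rows, y])     # prod_j p(y_j | x_j, Theta^{t-1})
        total += log_l * p_y
    return total


def q_closed_form(x, alpha, mu, var, post):
    """Eq. (1.7): sum_k sum_i ln[alpha_k p_k(x_i | theta_k)] p(k | x_i, Theta^{t-1})."""
    return float((log_joint(x, alpha, mu, var) * post).sum())


rng = np.random.default_rng(0)
x = rng.normal(size=4)
alpha = np.array([0.2, 0.5, 0.3])
mu = np.array([-1.0, 0.0, 1.5])
var = np.array([0.5, 1.0, 2.0])
post = rng.random((4, 3))
post /= post.sum(axis=1, keepdims=True)   # each row is a distribution over k
print(q_bruteforce(x, alpha, mu, var, post), q_closed_form(x, alpha, mu, var, post))
```

The two values agree only because each row of `post` sums to 1, which is exactly the marginalization used in the derivation.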
By Bayes' rule,
$$
p(k \mid x_i, \Theta^{t-1}) = \frac{p(k, x_i \mid \Theta^{t-1})}{\sum\limits_{s=1}^{M} p(s, x_i \mid \Theta^{t-1})} = \frac{\alpha_k^{t-1} \, p_k(x_i \mid \theta_k^{t-1})}{\sum\limits_{s=1}^{M} \alpha_s^{t-1} \, p_s(x_i \mid \theta_s^{t-1})} \qquad (1.8)
$$
where $p_k(x_i \mid \theta_k^{t-1})$ is the Gaussian density of Equation (1.2) evaluated at the previous parameters $\theta_k^{t-1}$.
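A minimal sketch of the E-step posterior in Equation (1.8) for 1-D data. It works in log space so that very small densities do not underflow, a numerical choice assumed here and not stated in the text. `responsibilities` is a hypothetical name; entry `(i, k)` of the result is $p(k \mid x_i, \Theta^{t-1})$.

```python
import numpy as np


def responsibilities(x, alpha, mu, var):
    """Eq. (1.8): p(k | x_i, Theta^{t-1}) for 1-D samples x and M components."""
    x = np.asarray(x, dtype=float)
    alpha, mu, var = (np.asarray(a, dtype=float) for a in (alpha, mu, var))
    # Numerator of (1.8) in log space: ln alpha_k + ln p_k(x_i | theta_k), shape (N, M)
    log_num = (np.log(alpha)
               - 0.5 * np.log(2.0 * np.pi * var)
               - (x[:, None] - mu) ** 2 / (2.0 * var))
    # Subtracting the row maximum leaves the ratio unchanged but avoids underflow.
    log_num -= log_num.max(axis=1, keepdims=True)
    w = np.exp(log_num)
    return w / w.sum(axis=1, keepdims=True)   # divide by the sum over s = 1..M


r = responsibilities([-1.0, 0.5, 3.0], alpha=[0.5, 0.5], mu=[-1.0, 2.0], var=[1.0, 1.0])
print(r)
```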
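After adding spatial regularization to the means, variances and weights, the parameters become pixel dependent: $\mu_k(i)$, $\sigma_k^2(i)$ and $\alpha_k(i)$ are the parameters of component $k$ at pixel $i$, with $\theta_k(i) = (\mu_k(i), \sigma_k^2(i))$. From here on, $i$ and $j$ index pixels, while $n = 1, \ldots, N$ indexes the $N$ observations $x_n$ of pixel $i$ (one per image). For pixel $i$, treating the parameters of its neighbors as fixed, the $Q$ function becomes: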
$$
\begin{aligned}
Q(\Theta \mid \Theta^{t-1})
&= \sum_{k=1}^{M} \sum_{n=1}^{N} \ln \alpha_k(i) \, p(k \mid x_n, \Theta^{t-1}) + \sum_{k=1}^{M} \sum_{n=1}^{N} \ln \left[ p_k(x_n \mid \theta_k(i)) \right] p(k \mid x_n, \Theta^{t-1}) \\
&\quad - \lambda_\mu \sum_{k=1}^{M} \sum_{j \in \mathcal{N}(i)} \left(\mu_k(i) - \mu_k(j)\right)^2
- \lambda_\sigma \sum_{k=1}^{M} \sum_{j \in \mathcal{N}(i)} \left(\sigma_k^2(i) - \sigma_k^2(j)\right)^2
- \lambda_\alpha \sum_{k=1}^{M} \sum_{j \in \mathcal{N}(i)} \left(\ln \alpha_k(i) - \ln \alpha_k(j)\right)^2
\end{aligned} \qquad (1.9)
$$
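where $\lambda_\mu$, $\lambda_\sigma$ and $\lambda_\alpha$ are the regularization coefficients for the means, variances and weights, and $\mathcal{N}(i)$ is the set of pixels neighboring pixel $i$ (so $j \in \mathcal{N}(i)$ means that pixel $j$ is a neighbor of pixel $i$). These squared neighbor differences are the Laplacian smoothing terms of the title: the gradient of each penalty with respect to a pixel's parameter is proportional to a discrete (graph) Laplacian of the parameter map.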
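To make the neighborhood terms concrete, the sketch below assumes pixels on an $H \times W$ grid with a 4-connected neighborhood (the text does not specify the neighborhood, so this is an assumption). For every pixel $i$ it computes $|\mathcal{N}(i)|$, $\sum_{j \in \mathcal{N}(i)} f_k(j)$, and one penalty term of (1.9), $\lambda \sum_k \sum_{j \in \mathcal{N}(i)} (f_k(i) - f_k(j))^2$, where $f_k$ stands for $\mu_k$, $\sigma_k^2$ or $\ln \alpha_k$. Function names are illustrative.

```python
import numpy as np


def neighbor_sum_and_count(field):
    """Sum of 4-connected neighbor values and neighbor count |N(i)| for every pixel.

    field: array of shape (..., H, W); the last two axes are the image grid.
    """
    field = np.asarray(field, dtype=float)
    s = np.zeros_like(field)
    c = np.zeros(field.shape[-2:])
    s[..., 1:, :] += field[..., :-1, :]   # neighbor above
    s[..., :-1, :] += field[..., 1:, :]   # neighbor below
    s[..., :, 1:] += field[..., :, :-1]   # neighbor to the left
    s[..., :, :-1] += field[..., :, 1:]   # neighbor to the right
    c[1:, :] += 1
    c[:-1, :] += 1
    c[:, 1:] += 1
    c[:, :-1] += 1
    return s, c


def pixel_penalty(field, lam):
    """lam * sum_k sum_{j in N(i)} (f_k(i) - f_k(j))^2 for every pixel i.

    field: (M, H, W) per-component parameter map, e.g. mu_k, sigma_k^2 or ln alpha_k.
    """
    field = np.asarray(field, dtype=float)
    sq_sum, _ = neighbor_sum_and_count(field ** 2)
    s, c = neighbor_sum_and_count(field)
    # sum_j (f_i - f_j)^2 = |N(i)| f_i^2 - 2 f_i sum_j f_j + sum_j f_j^2
    per_k = c * field ** 2 - 2.0 * field * s + sq_sum
    return lam * per_k.sum(axis=0)


mu_map = np.array([[[0.0, 0.0], [0.0, 1.0]]])   # M = 1 component on a 2 x 2 grid
print(pixel_penalty(mu_map, lam=1.0))
```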
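First, differentiate $Q(\Theta \mid \Theta^{t-1})$ with respect to the mean $\mu_k(i)$ and set the result to zero. Only the data term of component $k$ and the mean penalty depend on $\mu_k(i)$; the normalizing term $-\tfrac{1}{2}\ln(2\pi\sigma_k^2(i))$ of the log-density drops out: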
$$
\begin{aligned}
\frac{\partial Q(\Theta \mid \Theta^{t-1})}{\partial \mu_k(i)}
&= \frac{\partial}{\partial \mu_k(i)} \Bigg[ \sum_{n=1}^{N} \ln \left[ p_k(x_n \mid \theta_k(i)) \right] p(k \mid x_n, \Theta^{t-1}) - \lambda_\mu \sum_{j \in \mathcal{N}(i)} \left(\mu_k(i) - \mu_k(j)\right)^2 \Bigg] \\
&= \frac{\partial}{\partial \mu_k(i)} \left[ -\sum_{n=1}^{N} \frac{(x_n - \mu_k(i))^2}{2\sigma_k^2(i)} \, p(k \mid x_n, \Theta^{t-1}) \right] - \frac{\partial}{\partial \mu_k(i)} \left[ \lambda_\mu \sum_{j \in \mathcal{N}(i)} \left(\mu_k(i) - \mu_k(j)\right)^2 \right] \\
&= \frac{1}{\sigma_k^2(i)} \sum_{n=1}^{N} \left(x_n - \mu_k(i)\right) p(k \mid x_n, \Theta^{t-1}) - 2 \lambda_\mu \sum_{j \in \mathcal{N}(i)} \left(\mu_k(i) - \mu_k(j)\right) \\
&= 0
\end{aligned}
$$
Multiplying by $\sigma_k^2(i)$ and collecting the terms in $\mu_k(i)$ gives the update equation for $\mu_k(i)$:
$$
\mu_k(i) = \frac{\sum\limits_{n=1}^{N} x_n \, p(k \mid x_n, \Theta^{t-1}) + 2 \lambda_\mu \sigma_k^2(i) \sum\limits_{j \in \mathcal{N}(i)} \mu_k(j)}{\sum\limits_{n=1}^{N} p(k \mid x_n, \Theta^{t-1}) + 2 \lambda_\mu \sigma_k^2(i) \, |\mathcal{N}(i)|} \qquad (1.10)
$$
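where $|\mathcal{N}(i)|$ is the number of neighbors of pixel $i$. With $\lambda_\mu = 0$ this reduces to the usual responsibility-weighted mean; as $\lambda_\mu$ grows, $\mu_k(i)$ is pulled toward the average of its neighbors' means $\mu_k(j)$.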
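A vectorized sketch of Equation (1.10), assuming a hypothetical array layout: observations `X` of shape `(N, H, W)` (one image per $n$), responsibilities `R` of shape `(N, M, H, W)` holding $p(k \mid x_n, \Theta^{t-1})$ per pixel, and parameter maps `mu`, `var` of shape `(M, H, W)`. It also assumes a 4-connected neighborhood and updates all pixels at once from the neighbors' current means (a Jacobi-style sweep; the text does not fix the update order).

```python
import numpy as np


def neighbor_sum_and_count(field):
    """Sum over 4-connected neighbors and neighbor count |N(i)| (last two axes = grid)."""
    s = np.zeros_like(field, dtype=float)
    c = np.zeros(field.shape[-2:])
    s[..., 1:, :] += field[..., :-1, :]
    s[..., :-1, :] += field[..., 1:, :]
    s[..., :, 1:] += field[..., :, :-1]
    s[..., :, :-1] += field[..., :, 1:]
    c[1:, :] += 1
    c[:-1, :] += 1
    c[:, 1:] += 1
    c[:, :-1] += 1
    return s, c


def update_means(X, R, mu, var, lam_mu):
    """Eq. (1.10) for every pixel and component at once.

    X:   (N, H, W) observations x_n of each pixel (one per image)
    R:   (N, M, H, W) responsibilities p(k | x_n, Theta^{t-1})
    mu:  (M, H, W) current means (neighbor values mu_k(j) are read from here)
    var: (M, H, W) current variances sigma_k^2(i)
    """
    resp_sum = R.sum(axis=0)                    # sum_n p(k | x_n)
    weighted_x = (R * X[:, None]).sum(axis=0)   # sum_n x_n p(k | x_n)
    nb_sum, nb_count = neighbor_sum_and_count(mu)
    num = weighted_x + 2.0 * lam_mu * var * nb_sum
    den = resp_sum + 2.0 * lam_mu * var * nb_count
    return num / den
```

Setting `lam_mu = 0` recovers the unregularized per-pixel M-step, which is a convenient check.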
Next, differentiate $Q(\Theta \mid \Theta^{t-1})$ with respect to the variance $\sigma_k^2(i)$, treating $\sigma_k^2(i)$ itself as the variable, and set the result to zero:
$$
\begin{aligned}
\frac{\partial Q(\Theta \mid \Theta^{t-1})}{\partial \sigma_k^2(i)}
&= \frac{\partial}{\partial \sigma_k^2(i)} \Bigg[ \sum_{n=1}^{N} \ln \left[ p_k(x_n \mid \theta_k(i)) \right] p(k \mid x_n, \Theta^{t-1}) - \lambda_\sigma \sum_{j \in \mathcal{N}(i)} \left(\sigma_k^2(i) - \sigma_k^2(j)\right)^2 \Bigg] \\
&= \frac{\partial}{\partial \sigma_k^2(i)} \left[ \sum_{n=1}^{N} \left( -\frac{1}{2} \ln\!\left(2\pi\sigma_k^2(i)\right) - \frac{(x_n - \mu_k(i))^2}{2\sigma_k^2(i)} \right) p(k \mid x_n, \Theta^{t-1}) \right] - \frac{\partial}{\partial \sigma_k^2(i)} \left[ \lambda_\sigma \sum_{j \in \mathcal{N}(i)} \left(\sigma_k^2(i) - \sigma_k^2(j)\right)^2 \right] \\
&= \sum_{n=1}^{N} \left[ -\frac{1}{2\sigma_k^2(i)} + \frac{(x_n - \mu_k(i))^2}{2\left(\sigma_k^2(i)\right)^2} \right] p(k \mid x_n, \Theta^{t-1}) - 2 \lambda_\sigma \sum_{j \in \mathcal{N}(i)} \left(\sigma_k^2(i) - \sigma_k^2(j)\right) \\
&= 0
\end{aligned}
$$
Setting the derivative to zero and moving the penalty term to the right-hand side gives

$$
\sum_{n=1}^{N} \left[ -\frac{1}{2\sigma_k^2(i)} + \frac{(x_n - \mu_k(i))^2}{2\left(\sigma_k^2(i)\right)^2} \right] p(k \mid x_n, \Theta^{t-1}) = 2 \lambda_\sigma \sum_{j \in \mathcal{N}(i)} \left(\sigma_k^2(i) - \sigma_k^2(j)\right)
$$
To make this easier to solve, put the left-hand side over a common denominator:

$$
\sum_{n=1}^{N} p(k \mid x_n, \Theta^{t-1}) \, \frac{(x_n - \mu_k(i))^2 - \sigma_k^2(i)}{2\left(\sigma_k^2(i)\right)^2} = 2 \lambda_\sigma \sum_{j \in \mathcal{N}(i)} \left(\sigma_k^2(i) - \sigma_k^2(j)\right)
$$
Multiplying both sides by $2(\sigma_k^2(i))^2$ gives

$$
\sum_{n=1}^{N} p(k \mid x_n, \Theta^{t-1}) \left[ (x_n - \mu_k(i))^2 - \sigma_k^2(i) \right] = 4 \lambda_\sigma \left(\sigma_k^2(i)\right)^2 \sum_{j \in \mathcal{N}(i)} \left(\sigma_k^2(i) - \sigma_k^2(j)\right)
$$

This equation is nonlinear in $\sigma_k^2(i)$ (the right-hand side is cubic in it), so an exact closed-form solution is cumbersome and $\sigma_k^2(i)$ is usually updated iteratively. Freezing the factor $(\sigma_k^2(i))^2$ on the right-hand side at its value from the previous iteration and absorbing it into the regularization coefficient (so that $\lambda_\sigma$ below plays the role of this effective coefficient) makes the equation linear in $\sigma_k^2(i)$. Collecting terms then gives the update equation for $\sigma_k^2(i)$:

$$
\sigma_k^2(i) = \frac{\sum\limits_{n=1}^{N} p(k \mid x_n, \Theta^{t-1}) \, (x_n - \mu_k(i))^2 + 4 \lambda_\sigma \sum\limits_{j \in \mathcal{N}(i)} \sigma_k^2(j)}{\sum\limits_{n=1}^{N} p(k \mid x_n, \Theta^{t-1}) + 4 \lambda_\sigma |\mathcal{N}(i)|} \qquad (1.11)
$$
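A vectorized sketch of Equation (1.11) under the same hypothetical layout as before (`X`: `(N, H, W)`, `R`: `(N, M, H, W)`, `mu`, `var`: `(M, H, W)`) and an assumed 4-connected neighborhood. The `var_floor` clamp is an added safeguard, not part of the text, that keeps variances strictly positive.

```python
import numpy as np


def neighbor_sum_and_count(field):
    """Sum over 4-connected neighbors and neighbor count |N(i)| (last two axes = grid)."""
    s = np.zeros_like(field, dtype=float)
    c = np.zeros(field.shape[-2:])
    s[..., 1:, :] += field[..., :-1, :]
    s[..., :-1, :] += field[..., 1:, :]
    s[..., :, 1:] += field[..., :, :-1]
    s[..., :, :-1] += field[..., :, 1:]
    c[1:, :] += 1
    c[:-1, :] += 1
    c[:, 1:] += 1
    c[:, :-1] += 1
    return s, c


def update_variances(X, R, mu, var, lam_sigma, var_floor=1e-6):
    """Eq. (1.11) for every pixel and component at once.

    X: (N, H, W); R: (N, M, H, W); mu, var: (M, H, W).
    mu is assumed to hold the means already updated with Eq. (1.10).
    """
    resp_sum = R.sum(axis=0)                       # sum_n p(k | x_n)
    sq_dev = (X[:, None] - mu[None]) ** 2          # (x_n - mu_k(i))^2, shape (N, M, H, W)
    weighted_sq = (R * sq_dev).sum(axis=0)
    nb_sum, nb_count = neighbor_sum_and_count(var)
    new_var = (weighted_sq + 4.0 * lam_sigma * nb_sum) / (resp_sum + 4.0 * lam_sigma * nb_count)
    # Hypothetical safeguard (not in the text): keep variances strictly positive.
    return np.maximum(new_var, var_floor)
```

Because (1.11) is a ratio of sums, each new variance lies between the plain weighted variance and the average of the neighbors' variances.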
Finally, for the weight $\alpha_k(i)$, differentiate $Q(\Theta \mid \Theta^{t-1})$ with respect to $\ln \alpha_k(i)$ and set the result to zero:
$$
\begin{aligned}
\frac{\partial Q(\Theta \mid \Theta^{t-1})}{\partial \ln \alpha_k(i)}
&= \frac{\partial}{\partial \ln \alpha_k(i)} \Bigg[ \sum_{n=1}^{N} \ln \alpha_k(i) \, p(k \mid x_n, \Theta^{t-1}) - \lambda_\alpha \sum_{j \in \mathcal{N}(i)} \left(\ln \alpha_k(i) - \ln \alpha_k(j)\right)^2 \Bigg] \\
&= \sum_{n=1}^{N} p(k \mid x_n, \Theta^{t-1}) - 2 \lambda_\alpha \sum_{j \in \mathcal{N}(i)} \left(\ln \alpha_k(i) - \ln \alpha_k(j)\right) \\
&= 0
\end{aligned}
$$
Solving this equation for $\ln \alpha_k(i)$ and exponentiating, which guarantees $\alpha_k(i) > 0$, gives the update equation for $\alpha_k(i)$:
$$
\alpha_k(i) = \exp\!\left( \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \ln \alpha_k(j) + \frac{\sum_{n=1}^{N} p(k \mid x_n, \Theta^{t-1})}{2 \lambda_\alpha |\mathcal{N}(i)|} \right) \qquad (1.12)
$$
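Because the derivative above ignores the constraint $\sum_{k} \alpha_k(i) = 1$, the constraint is enforced afterwards by normalizing the $M$ weights of pixel $i$: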
$$
\alpha_k(i) \leftarrow \frac{\alpha_k(i)}{\sum\limits_{l=1}^{M} \alpha_l(i)}
$$
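A sketch of Equation (1.12) followed by the per-pixel normalization, again assuming the hypothetical layout `R`: `(N, M, H, W)`, `alpha`: `(M, H, W)` and a 4-connected neighborhood (so every pixel of a grid larger than 1 x 1 has at least one neighbor). The small `eps` inside the logarithm is an added guard against $\ln 0$, not part of the text. Subtracting the per-pixel maximum before `np.exp` is safe because that constant cancels in the normalization.

```python
import numpy as np


def neighbor_sum_and_count(field):
    """Sum over 4-connected neighbors and neighbor count |N(i)| (last two axes = grid)."""
    s = np.zeros_like(field, dtype=float)
    c = np.zeros(field.shape[-2:])
    s[..., 1:, :] += field[..., :-1, :]
    s[..., :-1, :] += field[..., 1:, :]
    s[..., :, 1:] += field[..., :, :-1]
    s[..., :, :-1] += field[..., :, 1:]
    c[1:, :] += 1
    c[:-1, :] += 1
    c[:, 1:] += 1
    c[:, :-1] += 1
    return s, c


def update_weights(R, alpha, lam_alpha, eps=1e-12):
    """Eq. (1.12) followed by normalization over the M components of each pixel.

    R: (N, M, H, W) responsibilities; alpha: (M, H, W) current weights.
    """
    resp_sum = R.sum(axis=0)                                        # sum_n p(k | x_n)
    nb_sum, nb_count = neighbor_sum_and_count(np.log(alpha + eps))  # sum_j ln alpha_k(j)
    log_alpha = nb_sum / nb_count + resp_sum / (2.0 * lam_alpha * nb_count)
    # The per-pixel maximum over k cancels in the normalization and prevents overflow.
    log_alpha -= log_alpha.max(axis=0, keepdims=True)
    new_alpha = np.exp(log_alpha)
    return new_alpha / new_alpha.sum(axis=0, keepdims=True)
```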