Estimates of Parameters of GMM: The Expectation Maximization (EM) Algorithm

We observe $n$ data points $x^{(1)}, \ldots, x^{(n)}$ in $\mathbb{R}^d$. We wish to maximize the GMM likelihood with respect to the parameter set $\theta = \{p_1, \ldots, p_K, \mu^{(1)}, \ldots, \mu^{(K)}, \sigma_1^2, \ldots, \sigma_K^2\}$.

Maximizing the log-likelihood $\log \prod_{i=1}^n p(x^{(i)} \mid \theta)$ is not tractable in the setting of GMMs: there is no closed-form solution for the parameter set $\theta$ that maximizes the likelihood. The EM algorithm is an iterative algorithm that finds a locally optimal solution $\hat{\theta}$ to the GMM likelihood maximization problem.

E Step

The E step of the algorithm finds the posterior probability that point $x^{(i)}$ was generated by cluster $j$, for every $i = 1, \ldots, n$ and $j = 1, \ldots, K$. This step assumes knowledge of the parameter set $\theta$. We find the posterior using the following equation:

$$p(\text{point } x^{(i)} \text{ was generated by cluster } j \mid x^{(i)}, \theta) \triangleq p(j \mid i) = \frac{p_j \, \mathcal{N}\!\left(x^{(i)}; \mu^{(j)}, \sigma_j^2 I\right)}{p(x^{(i)} \mid \theta)}.$$

M Step

The M step of the algorithm maximizes over $\theta$ a proxy function $\hat{\ell}(x^{(1)}, \ldots, x^{(n)}; \theta)$ of the log-likelihood, where

$$\hat{\ell}(x^{(1)}, \ldots, x^{(n)}; \theta) \triangleq \sum_{i=1}^n \sum_{j=1}^K p(j \mid i) \log\!\left( \frac{p(x^{(i)} \text{ and } x^{(i)} \text{ generated by cluster } j \mid \theta)}{p(j \mid i)} \right).$$

This is done instead of maximizing over $\theta$ the actual log-likelihood

$$\ell(x^{(1)}, \ldots, x^{(n)}; \theta) = \sum_{i=1}^n \log\!\left[ \sum_{j=1}^K p(x^{(i)} \text{ generated by cluster } j \mid \theta) \right].$$

Maximizing the proxy function over the parameter set $\theta$, one can verify by taking derivatives and setting them equal to zero that

$$\hat{\mu}^{(j)} = \frac{\sum_{i=1}^n p(j \mid i)\, x^{(i)}}{\sum_{i=1}^n p(j \mid i)}, \qquad \hat{p}_j = \frac{1}{n} \sum_{i=1}^n p(j \mid i), \qquad \hat{\sigma}_j^2 = \frac{\sum_{i=1}^n p(j \mid i)\, \left\| x^{(i)} - \hat{\mu}^{(j)} \right\|^2}{d \sum_{i=1}^n p(j \mid i)}.$$

The E and M steps are repeated iteratively until there is no noticeable change in the actual likelihood (computed after the M step using the newly estimated parameters), or until the parameters themselves stop changing appreciably.

Initialization

Before the first E step, we can either randomly initialize the parameter set $\theta$, or we can run k-means to find the initial centers of the $K$ clusters and use the global variance of the dataset as the initial variance of all $K$ clusters. In the latter case, the mixture weights can be initialized to the proportions of data points in the clusters found by k-means.
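The updates above translate directly into code. Below is a minimal NumPy sketch of one EM iteration for a spherical GMM (covariances $\sigma_j^2 I$, as in this problem); the function name `em_step` and the array layout are illustrative choices, not part of the problem statement.

```python
import numpy as np

def em_step(X, p, mu, var):
    """One EM iteration for a spherical GMM (a sketch of the updates above).

    X:   (n, d) data matrix
    p:   (K,)   mixture weights p_j
    mu:  (K, d) cluster means mu^(j)
    var: (K,)   spherical variances sigma_j^2
    """
    n, d = X.shape

    # E step: posterior p(j|i) proportional to p_j * N(x^(i); mu^(j), var_j I),
    # computed in log space for numerical stability.
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)      # (n, K)
    log_N = -0.5 * d * np.log(2 * np.pi * var)[None, :] - sq_dist / (2 * var)[None, :]
    log_post = np.log(p)[None, :] + log_N
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                            # rows sum to 1

    # M step: closed-form maximizers of the proxy function.
    nj = post.sum(axis=0)                                              # effective counts
    p_new = nj / n
    mu_new = (post.T @ X) / nj[:, None]
    sq_dist_new = ((X[:, None, :] - mu_new[None, :, :]) ** 2).sum(axis=2)
    var_new = (post * sq_dist_new).sum(axis=0) / (d * nj)
    return p_new, mu_new, var_new, post
```

In practice one would call `em_step` in a loop and stop once the log-likelihood (or the parameters) stops changing appreciably, as described above.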
Gaussian Mixture Model: An Example Update - E-Step

Assume that the initial means and variances of the two clusters in a GMM are as follows: $\mu^{(1)} = -3$, $\mu^{(2)} = 2$, $\sigma_1^2 = \sigma_2^2 = 4$. Let $p_1 = p_2 = 0.5$. Let $x^{(1)} = 0.2$, $x^{(2)} = -0.9$, $x^{(3)} = -1$, $x^{(4)} = 1.2$, $x^{(5)} = 1.8$ be five points that we wish to cluster. In this problem and in the next, we compute the updated parameters corresponding to cluster 1. You may use any computational tool at your disposal.

Compute the following posterior probabilities (provide at least five decimal digits):

$p(1 \mid 1) =$
$p(1 \mid 2) =$
$p(1 \mid 3) =$
$p(1 \mid 4) =$
$p(1 \mid 5) =$
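A short script suffices for the E step with these numbers; this is a sketch I am adding for illustration, not part of the problem statement. Since $p_1 = p_2$ and $\sigma_1^2 = \sigma_2^2$, the shared constants cancel in the posterior, so each $p(1 \mid i)$ reduces to a logistic function of $x^{(i)}$; the code below evaluates the full formula anyway.

```python
import numpy as np

mu = np.array([-3.0, 2.0])    # mu^(1), mu^(2)
var = np.array([4.0, 4.0])    # sigma_1^2, sigma_2^2
p = np.array([0.5, 0.5])      # p_1, p_2
x = np.array([0.2, -0.9, -1.0, 1.2, 1.8])

# Unnormalized posterior p_j * N(x^(i); mu^(j), sigma_j^2) for d = 1.
dens = p * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
post = dens / dens.sum(axis=1, keepdims=True)   # normalize over the two clusters

for i, q in enumerate(post[:, 0], start=1):
    print(f"p(1|{i}) = {q:.5f}")
# Should print approximately:
# p(1|1) = 0.29421, p(1|2) = 0.62246, p(1|3) = 0.65135,
# p(1|4) = 0.10669, p(1|5) = 0.05340
```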
Gaussian Mixture Model: An Example Update - M-Step

Compute the updated parameters corresponding to cluster 1 (provide at least five decimal digits):

$\hat{p}_1 =$
$\hat{\mu}_1 =$
$\hat{\sigma}_1^2 =$
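Plugging the E-step posteriors into the closed-form M-step updates gives the cluster-1 parameters. Again a sketch, with the posteriors taken as the rounded values from the previous part:

```python
import numpy as np

x = np.array([0.2, -0.9, -1.0, 1.2, 1.8])
post1 = np.array([0.29421, 0.62246, 0.65135, 0.10669, 0.05340])  # p(1|i) from the E step

n, d = x.size, 1
nj = post1.sum()                                           # effective count for cluster 1
p1_hat = nj / n                                            # updated mixture weight
mu1_hat = (post1 * x).sum() / nj                           # updated mean
var1_hat = (post1 * (x - mu1_hat) ** 2).sum() / (d * nj)   # updated variance

print(f"p_1 = {p1_hat:.5f}, mu_1 = {mu1_hat:.5f}, var_1 = {var1_hat:.5f}")
# Should print approximately (given the rounded posteriors above):
# p_1 = 0.34562, mu_1 = -0.53734, var_1 = 0.57578
```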