ILNumerics - Technical Application Development
Assembly: ILNumerics.Toolboxes.MachineLearning (in ILNumerics.Toolboxes.MachineLearning.dll) Version: 5.5.0.0 (5.5.7503.3146)
Estimated centers for all clusters, size samples.D[0] x k.
Expectation maximization algorithm.
[ILNumerics Machine Learning Toolbox]
Namespace: ILNumerics.Toolboxes
Syntax
public static RetArray<double> em(
    InArray<double> Samples,
    int k,
    OutArray<double> Sigma = null,
    EMInitializationMethod method = EMInitializationMethod.KMeans_random,
    InArray<double> UserCenters = null,
    int maxiterexit = 10000,
    double centerconverg_exit = 0.001
)
Parameters
- Samples
  Type: ILNumerics.InArray<double>
  Input data, one data point per column.
- k
  Type: System.Int32
  Number of clusters.
- Sigma (Optional)
  Type: ILNumerics.OutArray<double>
  [Output] Covariance estimates for all clusters, size d x d x k, where d = Samples.D[0].
- method (Optional)
  Type: ILNumerics.Toolboxes.EMInitializationMethod
  Method used for initializing the cluster centers. Default: KMeans_random.
- UserCenters (Optional)
  Type: ILNumerics.InArray<double>
  For method 'user': initial cluster centers, size Samples.D[0] x k. Ignored for all other methods.
- maxiterexit (Optional)
  Type: System.Int32
  Stop after this number of iterations if convergence has not been reached. Default: 10000.
- centerconverg_exit (Optional)
  Type: System.Double
  Stop iterating when norm(L) falls below this value, where L is the change in the centers between two consecutive steps. Default: 0.001.
Return Value
Type: ILNumerics.RetArray<double>
Estimated centers for all clusters, size Samples.D[0] x k.
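A call might look like the following minimal sketch. It rests on assumptions this page does not confirm: that em is exposed on a static class named MachineLearning in the ILNumerics.Toolboxes namespace, and that randn and horzcat are available through ILMath. The synthetic data, the class name EmUsageSketch and the chosen argument values are purely illustrative.

```csharp
using System;
using ILNumerics;
using ILNumerics.Toolboxes;
using static ILNumerics.ILMath;

class EmUsageSketch
{
    static void Main()
    {
        // Two synthetic 2-D clusters with 200 points each, one point per column.
        Array<double> a = randn(2, 200);        // scattered around (0, 0)
        Array<double> b = randn(2, 200);
        Array<double> shifted = b + 5.0;        // scattered around (5, 5)
        Array<double> samples = horzcat(a, shifted);   // 2 x 400

        // Assumption: em is a static method of the MachineLearning class.
        // The named arguments are the parameter names from the signature above.
        Array<double> centers = MachineLearning.em(
            samples, 2, maxiterexit: 500, centerconverg_exit: 1e-4);

        // centers holds one estimated cluster center per column (2 x 2 here).
        Console.WriteLine(centers);
    }
}
```

Sigma and UserCenters are left at their defaults here, so only the centers are returned and the initialization falls back to the default method.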
Remarks
The EM algorithm assumes that the data samples are drawn from k multivariate normal distributions. It estimates the parameters 'center' and 'sigma' (covariance) of every distribution. The position and 'shape' of each distribution are chosen such that the likelihood of generating the given sample points is maximized.
The parameter k must be determined by the user. This reflects the a priori knowledge of the number of distributions or clusters in the data.
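For reference, the likelihood in question can be written out in the usual textbook form for a mixture of k normal distributions. This is the standard formulation and is not taken from the library source. The mixing weights w_j in particular are an assumption, since this page names only the centers and covariances as estimated parameters.

```latex
\ln p(X \mid \mu, \Sigma)
  = \sum_{i=1}^{n} \ln \sum_{j=1}^{k} w_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j),
\qquad
\mathcal{N}(x \mid \mu_j, \Sigma_j)
  = \frac{1}{\sqrt{(2\pi)^{d} \det \Sigma_j}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j) \right)
```

Here x_i is the i-th column of Samples, n is the number of columns, d is the number of rows, \mu_j corresponds to column j of the returned centers and \Sigma_j to the j-th d x d slice of Sigma.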
The algorithm exits as soon as one of the following exit criteria is met:
- norm(L) < 'centerconverg_exit', where L is the difference between the centers from the previous step and the centers computed in the current step
- the number of iteration steps exceeds the limit of 'maxiterexit' iterations.
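The two exit criteria can be illustrated with a small self-contained sketch on plain arrays. This is not the library's implementation: the names EmExitSketch, CenterShift and ShouldExit are hypothetical, and the Frobenius norm is an assumption, since this page does not say which norm is applied to L.

```csharp
using System;

static class EmExitSketch
{
    // Frobenius norm of L = current - previous, both d x k center matrices.
    // Assumption: the page does not specify which norm the library uses.
    public static double CenterShift(double[,] previous, double[,] current)
    {
        double sum = 0.0;
        for (int i = 0; i < current.GetLength(0); i++)
        {
            for (int j = 0; j < current.GetLength(1); j++)
            {
                double diff = current[i, j] - previous[i, j];
                sum += diff * diff;
            }
        }
        return Math.Sqrt(sum);
    }

    // True as soon as either exit criterion is met.
    public static bool ShouldExit(double[,] previous, double[,] current,
                                  int iteration, int maxiterexit,
                                  double centerconverg_exit)
    {
        bool converged = CenterShift(previous, current) < centerconverg_exit;
        bool limitExceeded = iteration > maxiterexit;
        return converged || limitExceeded;
    }
}
```

With the default values, a step that moves the centers by less than 0.001 in this norm ends the iteration, and so does any step count above 10000 regardless of how far the centers still move.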