Fitting Methods
This page describes the three main fitting methods implemented in StructuredGaussianMixtures.jl: EM, PCAEM, and FactorEM.
Overview
All fitting methods implement the GMMFitMethod interface and can be used with the fit function:
gmm = fit(fitmethod, data)
GMMFitMethod Interface
StructuredGaussianMixtures.GMMFitMethod (Type)
Abstract type for Gaussian Mixture Model fitting methods.
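The interface relies on Julia's multiple dispatch: each fitting method is a concrete subtype of an abstract type, and fit selects the implementation from the type of its first argument. The self-contained sketch below mimics that pattern with hypothetical names (ToyFitMethod, MeanOnly, toyfit, n_fitted) that are not part of StructuredGaussianMixtures.jl; the toy "fit" simply returns copies of the data mean.

```julia
using Statistics

# Hypothetical stand-ins for GMMFitMethod, a concrete method such as EM, and fit.
abstract type ToyFitMethod end

# A concrete method is a plain struct holding its configuration.
struct MeanOnly <: ToyFitMethod
    n_components::Int
end

# One method per concrete type; Julia dispatches on the first argument.
# x uses the (nfeatures, nsamples) layout: one sample per column.
toyfit(m::MeanOnly, x::AbstractMatrix) = [vec(mean(x; dims=2)) for _ in 1:m.n_components]

# Generic code only needs to know the abstract type.
n_fitted(method::ToyFitMethod, x::AbstractMatrix) = length(toyfit(method, x))

x = [1.0 3.0;
     2.0 4.0]                 # 2 features, 2 samples
centers = toyfit(MeanOnly(3), x)
```

Adding a new strategy in this pattern means defining one more struct and one more toyfit method; callers such as n_fitted do not change.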
EM: Standard Expectation Maximization
The EM method fits Gaussian Mixture Models with full covariance matrices using standard Expectation Maximization.
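The package delegates this method to GaussianMixtures.jl. To show what the E and M steps do with full covariances, here is a minimal stand-alone sketch using only the standard library. It is not the package's code: toy_em and mvn_logpdf are hypothetical helpers, the initial means are passed in explicitly (standing in for the :kmeans or :rand initialization), the starting covariances are the per-feature variances, and a small ridge reg is added to every covariance as an assumed safeguard against singular matrices.

```julia
using LinearAlgebra, Random, Statistics

# Log-density of N(mu, S) at every column of x, via a Cholesky factorization.
function mvn_logpdf(x::AbstractMatrix, mu::AbstractVector, S::AbstractMatrix)
    C = cholesky(Symmetric(S))
    z = C.L \ (x .- mu)                               # whitened residuals, (m, n)
    m = size(x, 1)
    return vec(-0.5 .* sum(abs2, z; dims=1)) .- 0.5 * (m * log(2π) + logdet(C))
end

# Hypothetical full-covariance EM; mus0 holds the K initial means.
function toy_em(x::AbstractMatrix, mus0::AbstractVector; iters::Int=50, reg::Float64=1e-6)
    m, n = size(x)
    K = length(mus0)
    mus = [float.(mu) for mu in mus0]
    start = Matrix(Diagonal(vec(var(x; dims=2))))     # per-feature variances
    Sigmas = [start + reg * I for _ in 1:K]
    w = fill(1.0 / K, K)                              # mixing weights
    R = zeros(n, K)                                   # responsibilities
    for _ in 1:iters
        # E-step: log(w_k) + log N(x_i; mu_k, Sigma_k), normalized per sample
        for k in 1:K
            R[:, k] = log(w[k]) .+ mvn_logpdf(x, mus[k], Sigmas[k])
        end
        R .= exp.(R .- maximum(R; dims=2))            # log-sum-exp shift
        R ./= sum(R; dims=2)
        # M-step: mixing weights, means and full covariances
        for k in 1:K
            rk = R[:, k]
            Nk = sum(rk)
            w[k] = Nk / n
            mus[k] = (x * rk) ./ Nk
            dev = x .- mus[k]
            Sk = (dev .* rk') * dev' ./ Nk
            Sigmas[k] = (Sk + Sk') ./ 2 + reg * I       # symmetrize, add small ridge
        end
    end
    return w, mus, Sigmas
end

rng = MersenneTwister(1)
x = hcat(randn(rng, 2, 200) .- 5.0, randn(rng, 2, 200) .+ 5.0)   # two clusters
w, mus, Sigmas = toy_em(x, [x[:, 1], x[:, end]])
```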
Constructor
StructuredGaussianMixtures.EM (Type)
Standard Expectation Maximization for fitting Gaussian Mixture Models.
Fields
- n_components: Number of mixture components
- method: Initialization method (:kmeans, :rand, etc.)
- kind: Covariance structure (:full, :diag, etc.)
- nInit: Number of initializations
- nIter: Maximum number of iterations
- nFinal: Number of final iterations
Usage
using StructuredGaussianMixtures
data = randn(2, 1000)  # example data: 2 features, 1000 samples
# Basic usage
fitmethod = EM(3)
gmm = fit(fitmethod, data)
# With custom parameters
fitmethod = EM(5; method=:rand, kind=:full, nInit=100, nIter=20)
gmm = fit(fitmethod, data)
Fit Method
StructuredGaussianMixtures.fit (Method)
fit(fitmethod::EM, x::Matrix)
Fit a Gaussian Mixture Model using Expectation Maximization.
Arguments
- fitmethod: The EM fitting method configuration
- x: The data matrix, of size (nfeatures, nsamples)
Returns
- A MixtureModel of MvNormal distributions
Notes
- Uses GaussianMixtures.jl's GMM implementation
- Supports different initialization methods and covariance structures
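Because x is documented as a Matrix with one sample per column, data stored one sample per row (as in most tables) must be transposed first. A small sketch: permutedims returns a new Matrix, whereas the lazy transpose rows' is an Adjoint wrapper rather than a Matrix, so it would not match the x::Matrix signature documented for fit.

```julia
# Values stored one sample per row: 2 samples, 3 features.
rows = [1.0 2.0 3.0;
        4.0 5.0 6.0]

# fit expects (nfeatures, nsamples); permutedims materializes a new Matrix.
x = permutedims(rows)

# The lazy transpose is an Adjoint wrapper, not a Matrix.
lazy = rows'
```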
PCAEM: Mixture of Probabilistic Principal Component Analysis
PCAEM fits a GMM in a PCA-reduced space and maps the fitted components back to the original space, which yields covariance matrices with low-rank plus diagonal structure.
Constructor
StructuredGaussianMixtures.PCAEM (Type)
Fits a structured GMM by fitting a GMM in PCA-compressed space.
Fields
- n_components: Number of mixture components
- rank: Number of principal components to use
- gmm_method: Initialization method for the GMM (:kmeans, :rand, etc.)
- gmm_kind: Covariance structure for the GMM (:full, :diag, etc.)
- gmm_nInit: Number of GMM initializations
- gmm_nIter: Maximum number of GMM iterations
- gmm_nFinal: Number of final GMM iterations
Usage
# Basic usage
fitmethod = PCAEM(3, 2)
gmm = fit(fitmethod, data)
# With custom parameters
fitmethod = PCAEM(5, 3; gmm_method=:rand, gmm_nInit=100)
gmm = fit(fitmethod, data)
Fit Method
StructuredGaussianMixtures.fit (Method)
fit(fitmethod::PCAEM, x::Matrix)
This method first performs PCA to reduce dimensionality, then fits a GMM in the reduced space, and finally transforms the components back to the original space as LRDMvNormal distributions.
Arguments
- fitmethod: The PCAEM fitting method configuration
- x: The data matrix, of size (nfeatures, nsamples)
Returns
- A MixtureModel of LRDMvNormal distributions
Notes
- Uses PCA for dimensionality reduction
- Fits GMM in the reduced space
- Transforms components back to original space with low-rank plus diagonal structure
- The diagonal noise term is estimated from PCA reconstruction error
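The PCA step and the map back to the original space can be sketched with LinearAlgebra and Statistics alone. This is an illustrative reading of the notes above, not the package's code: pca_lowrank and back_transform are hypothetical helpers, the principal directions come from a thin SVD of the centered data, the diagonal noise D is assumed to be the per-feature variance of the PCA reconstruction residual and is shared by every component, and a single Gaussian fitted in the reduced space stands in for the GMM step.

```julia
using LinearAlgebra, Random, Statistics

# Project (nfeatures, nsamples) data onto its top-r principal directions.
function pca_lowrank(x::AbstractMatrix, r::Int)
    xbar = vec(mean(x; dims=2))
    xc = x .- xbar                       # centered data
    W = svd(xc).U[:, 1:r]                # (m, r) orthonormal principal directions
    z = W' * xc                          # (r, n) reduced coordinates
    resid = xc - W * z                   # reconstruction error in the original space
    d = vec(var(resid; dims=2))          # assumed: per-feature residual variance as D
    return xbar, W, z, d
end

# Map a reduced-space Gaussian N(mu_z, S_z) to N(xbar + W mu_z, W S_z W' + D).
back_transform(xbar, W, d, mu_z, S_z) = (xbar + W * mu_z, W * S_z * W' + Diagonal(d))

rng = MersenneTwister(42)
A = randn(rng, 10, 2)                                    # hypothetical rank-2 structure
x = A * randn(rng, 2, 500) .+ 0.1 .* randn(rng, 10, 500)

xbar, W, z, d = pca_lowrank(x, 2)
mu_z, S_z = vec(mean(z; dims=2)), cov(z; dims=2)         # stand-in for the reduced-space GMM
mu, S = back_transform(xbar, W, d, mu_z, S_z)
```

Since W S_z W' has rank r, the back-transformed covariance has exactly the low-rank plus diagonal form; in this single-Gaussian case its diagonal also matches the sample variances of x.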
FactorEM: Mixture of Factor Analyzers
FactorEM directly fits GMMs with covariance matrices constrained to the form Σ = FF' + D, where F is a low-rank factor matrix and D is diagonal.
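Keeping Σ in factored form is what makes this structure cheap. The Woodbury identity, Σ⁻¹ = D⁻¹ - D⁻¹F(I + F'D⁻¹F)⁻¹F'D⁻¹, and the matrix determinant lemma, |Σ| = |D| |I + F'D⁻¹F|, give the Gaussian log-density through an r × r Cholesky factorization instead of an m × m one, at O(mr² + r³) cost per point. The sketch below uses a hypothetical helper, lrd_logpdf (not the package's LRDMvNormal), and checks it against a dense evaluation.

```julia
using LinearAlgebra, Random

# log N(x; mu, F*F' + Diagonal(d)) without forming or factorizing the m×m covariance.
function lrd_logpdf(x::AbstractVector, mu::AbstractVector, F::AbstractMatrix, d::AbstractVector)
    m = length(x)
    y = x - mu
    Dinv_y = y ./ d                                # D⁻¹y
    Dinv_F = F ./ d                                # D⁻¹F: row i scaled by 1/d[i]
    C = cholesky(Symmetric(I + F' * Dinv_F))       # r×r capacitance matrix I + F'D⁻¹F
    u = F' * Dinv_y                                # F'D⁻¹y
    quad = dot(y, Dinv_y) - dot(u, C \ u)          # y'Σ⁻¹y via Woodbury
    logdetS = sum(log, d) + logdet(C)              # log|Σ| via determinant lemma
    return -0.5 * (m * log(2π) + logdetS + quad)
end

rng = MersenneTwister(0)
m, r = 6, 2
F = randn(rng, m, r)
d = rand(rng, m) .+ 0.5                            # positive diagonal noise
mu, x = randn(rng, m), randn(rng, m)

# Dense reference: form Σ explicitly.
S = F * F' + Diagonal(d)
dense = -0.5 * (m * log(2π) + logdet(S) + dot(x - mu, S \ (x - mu)))
fast = lrd_logpdf(x, mu, F, d)
```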
Constructor
StructuredGaussianMixtures.FactorEM (Type)
Mixture of Factor Analyzers model. Directly fits a mixture of low-rank plus diagonal Gaussian distributions.
Fields
- n_components: Number of mixture components
- rank: Rank of the low-rank factor
- gmm_method: Initialization method for the GMM (:kmeans, :rand, etc.)
- gmm_nInit: Number of GMM initializations
- gmm_nIter: Maximum number of GMM iterations
- gmm_nFinal: Number of final GMM iterations
Usage
# Basic usage
fitmethod = FactorEM(3, 2)
gmm = fit(fitmethod, data)
# With custom parameters
fitmethod = FactorEM(5, 3; initialization_method=:rand, nInit=10, nIter=20)
gmm = fit(fitmethod, data)
# With weighted data
weights = ones(size(data, 2)) # Equal weights
gmm = fit(fitmethod, data, weights)
Fit Methods
StructuredGaussianMixtures.fit (Method)
fit(fitmethod::FactorEM, x::Matrix)
Fit a Mixture of Factor Analyzers model using Expectation Maximization. This method directly fits a mixture of low-rank plus diagonal Gaussian distributions.
Arguments
- fitmethod: The FactorEM fitting method configuration
- x: The data matrix, of size (nfeatures, nsamples)
Returns
- A MixtureModel of LRDMvNormal distributions
Notes
- Directly fits the low-rank plus diagonal structure
- More computationally intensive than PCAEM but potentially more accurate
- Not yet implemented
StructuredGaussianMixtures.fit (Method)
fit(fitmethod::FactorEM, x::Matrix, weights::Vector)
Fit a Mixture of Factor Analyzers model using Expectation Maximization with weighted data points.
Arguments
- fitmethod: The FactorEM fitting method configuration
- x: The data matrix, of size (nfeatures, nsamples)
- weights: Vector of weights, one per data point (column of x)
Returns
- A MixtureModel of LRDMvNormal distributions
Notes
- Supports weighted data points for importance sampling or missing data scenarios
- Weights are automatically normalized to sum to 1
- Uses the same EM algorithm but with weighted responsibilities
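Assuming each weight simply scales its sample's responsibilities (one common way to implement weighted EM, not confirmed as the package's exact code), the M-step updates for the mixing weights and means look like the sketch below. weighted_mstep is a hypothetical helper, and R is an (nsamples, ncomponents) responsibility matrix whose rows sum to 1. Rescaling all weights by a constant leaves the result unchanged, which is why normalizing them to sum to 1 is harmless.

```julia
# x: (nfeatures, nsamples); R: (nsamples, K) responsibilities with rows summing to 1.
function weighted_mstep(x::AbstractMatrix, R::AbstractMatrix, weights::AbstractVector)
    w = weights ./ sum(weights)              # normalize weights to sum to 1
    Rw = R .* w                              # scale row i of R by w[i]
    Nk = vec(sum(Rw; dims=1))                # weighted mass of each component
    mix = Nk ./ sum(Nk)                      # mixing weights
    mus = [(x * Rw[:, k]) ./ Nk[k] for k in 1:size(R, 2)]   # weighted means
    return mix, mus
end

x = [0.0 1.0 10.0]                 # 1 feature, 3 samples
R = [1.0 0.0;
     1.0 0.0;
     0.0 1.0]                      # hard assignments, for readability
mix, mus = weighted_mstep(x, R, [1.0, 3.0, 2.0])
```

With equal weights, Rw is just R/n and these reduce to the ordinary unweighted updates.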
Method Comparison
When to Use Each Method
| Method | Best For | Covariance Structure | Computational Cost | Weighted Fitting |
|---|---|---|---|---|
| FactorEM | High-dimensional data, direct low-rank fitting | Low-rank + diagonal | O(r³ + mr² + nmr), where r < m | ✅ |
| EM | Low-dimensional data where a full covariance is needed | Full | O(m³ + nm²) | ❌ |
| PCAEM | High-dimensional data with shared low-dimensional structure | Low-rank + diagonal | O(r³ + mr² + nr² + min(n²m, nm²)), where r < m | ❌ |
Here n is the number of samples, m the number of features, and r the rank.
Simple Examples
Basic Fitting
using StructuredGaussianMixtures
using Distributions: logpdf  # logpdf for the returned MixtureModel
using Statistics: mean
# Generate some data
data = randn(2, 1000)
# Fit with different methods
gmm_em = fit(EM(3), data)
gmm_pca = fit(PCAEM(3, 1), data)
gmm_factor = fit(FactorEM(3, 1), data)
# Evaluate
println("EM log-likelihood: ", mean(logpdf(gmm_em, data)))
println("PCAEM log-likelihood: ", mean(logpdf(gmm_pca, data)))
println("FactorEM log-likelihood: ", mean(logpdf(gmm_factor, data)))
Weighted Fitting
# Create weights based on data values
weights = [data[1, i] > 0 ? 1.0 : 0.5 for i in 1:size(data, 2)]
# Fit with weights (only FactorEM supports this)
gmm_weighted = fit(FactorEM(3, 1), data, weights)
High-Dimensional Data
# For high-dimensional data
high_dim_data = randn(100, 50)  # 100 features, 50 samples
# Use low-rank methods
gmm_pca = fit(PCAEM(3, 10), high_dim_data)
gmm_factor = fit(FactorEM(3, 10), high_dim_data)
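Two quick calculations show why the low-rank methods suit this regime. A full covariance stores m(m+1)/2 distinct entries per component, against mr + m for a rank-r factor plus diagonal; and with fewer samples than features the sample covariance has rank at most n - 1, so it is singular and an unregularized full-covariance fit has nothing to invert. The sketch uses only the standard library; ncov_full and ncov_lrd are hypothetical helper names.

```julia
using LinearAlgebra, Random, Statistics

# Distinct values stored per covariance matrix.
ncov_full(m) = m * (m + 1) ÷ 2        # symmetric m×m
ncov_lrd(m, r) = m * r + m            # F (m×r) plus diagonal D

full, lrd = ncov_full(100), ncov_lrd(100, 10)   # 5050 vs 1100

# Same shape as high_dim_data above: 100 features, 50 samples.
x = randn(MersenneTwister(7), 100, 50)
C = cov(x; dims=2)                    # 100×100 sample covariance
rC = rank(C; rtol=1e-8)               # 49 = nsamples - 1, so C is singular
```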
Related Documentation
- Structured Gaussians: Learn about the low-rank plus diagonal distribution used by PCAEM and FactorEM methods