Fitting Methods

This page describes the three main fitting methods implemented in StructuredGaussianMixtures.jl: EM, PCAEM, and FactorEM.

Overview

All fitting methods implement the GMMFitMethod interface and can be used with the fit function:

gmm = fit(fitmethod, data)

GMMFitMethod Interface

EM: Standard Expectation Maximization

The EM method fits Gaussian Mixture Models with full covariance matrices using standard Expectation Maximization.

Constructor

StructuredGaussianMixtures.EM (Type)
EM

Standard Expectation Maximization for fitting Gaussian Mixture Models.

Fields

  • n_components: Number of mixture components
  • method: Initialization method (:kmeans, :rand, etc.)
  • kind: Covariance structure (:full, :diag, etc.)
  • nInit: Number of initializations
  • nIter: Maximum number of iterations
  • nFinal: Number of final iterations

Usage

using StructuredGaussianMixtures

# Basic usage
fitmethod = EM(3)
gmm = fit(fitmethod, data)

# With custom parameters
fitmethod = EM(5; method=:rand, kind=:full, nInit=100, nIter=20)
gmm = fit(fitmethod, data)

Fit Method

StructuredGaussianMixtures.fit (Method)
fit(fitmethod::EM, x::Matrix)

Fit a Gaussian Mixture Model using Expectation Maximization.

Arguments

  • fitmethod: The EM fitting method configuration
  • x: The data matrix (nfeatures, nsamples)

Returns

  • A MixtureModel of MvNormal distributions

Notes

  • Uses GaussianMixtures.jl's GMM implementation
  • Supports different initialization methods and covariance structures
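
Because fit expects a Matrix with one sample per column, data stored as a vector of observations or with one sample per row has to be rearranged first. The following is a minimal sketch using only Base Julia; the variable names are illustrative and not part of the package.

```julia
# Hypothetical observations: 4 samples, each with 3 features.
observations = [[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0],
                [7.0, 8.0, 9.0],
                [10.0, 11.0, 12.0]]

# Stack the samples as columns to get the (nfeatures, nsamples) layout.
data = reduce(hcat, observations)
size(data)  # (3, 4)

# A matrix with one sample per row is converted by transposing and
# materializing the result, since a lazy adjoint is not a plain Matrix.
rows = [1.0 2.0 3.0; 4.0 5.0 6.0]   # (nsamples, nfeatures) = (2, 3)
data_from_rows = Matrix(rows')       # (nfeatures, nsamples) = (3, 2)
```

Either result can then be passed as the x argument described above.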

PCAEM: Mixture of Probabilistic Principal Component Analysis

PCAEM fits a GMM in PCA-reduced space and transforms back to the original space, effectively learning low-rank covariance structures.

Constructor

StructuredGaussianMixtures.PCAEM (Type)
PCAEM

Fits a structured GMM by fitting a GMM in PCA-compressed space.

Fields

  • n_components: Number of mixture components
  • rank: Number of principal components to use
  • gmm_method: Initialization method for GMM (:kmeans, :rand, etc.)
  • gmm_kind: Covariance structure for GMM (:full, :diag, etc.)
  • gmm_nInit: Number of GMM initializations
  • gmm_nIter: Maximum number of GMM iterations
  • gmm_nFinal: Number of final GMM iterations

Usage

# Basic usage
fitmethod = PCAEM(3, 2)
gmm = fit(fitmethod, data)

# With custom parameters
fitmethod = PCAEM(5, 3; gmm_method=:rand, gmm_nInit=100)
gmm = fit(fitmethod, data)

Fit Method

StructuredGaussianMixtures.fit (Method)
fit(fitmethod::PCAEM, x::Matrix)

This method first performs PCA to reduce dimensionality, then fits a GMM in the reduced space, and finally transforms the components back to the original space as LRDMvNormal distributions.

Arguments

  • fitmethod: The PCAEM fitting method configuration
  • x: The data matrix (nfeatures, nsamples)

Returns

  • A MixtureModel of LRDMvNormal distributions

Notes

  • Uses PCA for dimensionality reduction
  • Fits GMM in the reduced space
  • Transforms components back to original space with low-rank plus diagonal structure
  • The diagonal noise term is estimated from PCA reconstruction error
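
The steps in these notes can be illustrated for a single Gaussian with standard-library tools only. This is a sketch under stated assumptions, not the package's implementation: it uses an SVD for the PCA step, treats the noise as isotropic, and averages the reconstruction error over the discarded directions, which is one plausible reading of "estimated from PCA reconstruction error". All names are illustrative.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
m, n, r = 6, 200, 2            # features, samples, retained rank
x = randn(m, n)                # data in (nfeatures, nsamples) layout

# Center the data and take the top-r principal directions from the SVD.
μ = mean(x, dims=2)
xc = x .- μ
U = svd(xc).U[:, 1:r]          # m × r orthonormal basis

# Project into the reduced space and compute the covariance there.
z = U' * xc                    # r × n
S = cov(z, dims=2)             # r × r covariance in PCA space

# Assumed noise estimate: average variance across the m - r discarded directions.
residual = xc - U * z
σ² = sum(abs2, residual) / ((m - r) * (n - 1))

# Lift back to the original space: low-rank part plus diagonal part.
Σ = U * S * U' + σ² * I
```

In the mixture setting the same lifting would apply to each component's covariance from the reduced-space GMM.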

FactorEM: Mixture of Factor Analyzers

FactorEM directly fits GMMs with covariance matrices constrained to the form Σ = FF' + D, where F is a low-rank factor matrix and D is diagonal.
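
One reason this structure suits high-dimensional data is that densities can be evaluated without forming or factorizing the full m × m matrix. The identities below are the standard Woodbury identity and matrix determinant lemma, given as background rather than as a description of the package's internals. They assume F is m × r and D is diagonal with positive entries, and write F' as F^\top.

```latex
\begin{aligned}
\Sigma^{-1} &= D^{-1} - D^{-1} F \left(I_r + F^\top D^{-1} F\right)^{-1} F^\top D^{-1}, \\
\log\det \Sigma &= \log\det\!\left(I_r + F^\top D^{-1} F\right) + \sum_{j=1}^{m} \log D_{jj}.
\end{aligned}
```

Only the r × r matrix I_r + F'D⁻¹F has to be factorized, which costs O(r³) after O(mr²) work to form it.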

Constructor

StructuredGaussianMixtures.FactorEM (Type)
FactorEM

Mixture of Factor Analyzers model. Directly fits a mixture of low-rank plus diagonal Gaussian distributions.

Fields

  • n_components: Number of mixture components
  • rank: Rank of the low-rank factor
  • gmm_method: Initialization method for GMM (:kmeans, :rand, etc.)
  • gmm_nInit: Number of GMM initializations
  • gmm_nIter: Maximum number of GMM iterations
  • gmm_nFinal: Number of final GMM iterations

Usage

# Basic usage
fitmethod = FactorEM(3, 2)
gmm = fit(fitmethod, data)

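# With custom parameters
# Note: these keyword names differ from the gmm_-prefixed field names listed
# above; confirm them against the constructor signature.
fitmethod = FactorEM(5, 3; initialization_method=:rand, nInit=10, nIter=20)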
gmm = fit(fitmethod, data)

# With weighted data
weights = ones(size(data, 2))  # Equal weights
gmm = fit(fitmethod, data, weights)

Fit Methods

StructuredGaussianMixtures.fit (Method)
fit(fitmethod::FactorEM, x::Matrix)

Fit a Mixture of Factor Analyzers model using Expectation Maximization. This method directly fits a mixture of low-rank plus diagonal Gaussian distributions.

Arguments

  • fitmethod: The FactorEM fitting method configuration
  • x: The data matrix (nfeatures, nsamples)

Returns

  • A MixtureModel of LRDMvNormal distributions

Notes

  • Directly fits the low-rank plus diagonal structure
  • More computationally intensive than PCAEM but potentially more accurate
  • Not yet implemented

StructuredGaussianMixtures.fit (Method)
fit(fitmethod::FactorEM, x::Matrix, weights::Vector)

Fit a Mixture of Factor Analyzers model using Expectation Maximization with weighted data points.

Arguments

  • fitmethod: The FactorEM fitting method configuration
  • x: The data matrix (nfeatures, nsamples)
  • weights: Vector of weights for each data point

Returns

  • A MixtureModel of LRDMvNormal distributions

Notes

  • Supports weighted data points for importance sampling or missing data scenarios
  • Weights are automatically normalized to sum to 1
  • Uses the same EM algorithm but with weighted responsibilities
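
As a sketch of what "weighted responsibilities" usually means, the updates below follow the standard weighted-likelihood form of EM. They are not confirmed as the package's exact updates, and the symbols are assumptions introduced here: w_i are the normalized weights, γ_ik the responsibility of component k for sample x_i, and π_k, μ_k, Σ_k the mixture weights, means, and covariances.

```latex
\begin{aligned}
\gamma_{ik} &= \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                    {\sum_{j} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)},
  \qquad \sum_{i} w_i = 1, \\
\pi_k &= \sum_{i} w_i \, \gamma_{ik}, \\
\mu_k &= \frac{\sum_{i} w_i \, \gamma_{ik} \, x_i}{\sum_{i} w_i \, \gamma_{ik}}.
\end{aligned}
```

With equal weights w_i = 1/n these reduce to the usual unweighted EM updates. The covariance update for the low-rank plus diagonal case is omitted here.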

Method Comparison

When to Use Each Method

Method | Best For | Covariance Structure | Computational Cost | Weighted Fitting
FactorEM | High-dimensional data, direct low-rank fitting | Low-rank + diagonal | O(r³ + mr² + nmr) where r < m | Yes
EM | Low-dimensional data, full covariance needed | Full | O(m³ + nm²) | No
PCAEM | High-dimensional data with shared low-dimensional structure | Low-rank + diagonal | O(r³ + mr² + nr² + min(n²m, nm²)) where r < m | No

Here n is the number of samples, m is the number of features, and r is the rank.
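
To make the difference between the covariance structures concrete, the sketch below counts the covariance parameters stored per component. The helper names are hypothetical and not part of the package. The low-rank count is the number of stored entries in F and D; the effective degrees of freedom are slightly lower because FF' is unchanged when F is rotated.

```julia
# Covariance parameters stored per component for m features.
full_cov_params(m) = m * (m + 1) ÷ 2        # symmetric m × m matrix
lowrank_diag_params(m, r) = m * r + m        # F is m × r, D has m diagonal entries

m, r = 100, 10
full_cov_params(m)          # 5050
lowrank_diag_params(m, r)   # 1100
```

With 100 features and only a few dozen samples per component, a full covariance matrix has far more parameters than the data can constrain, which is the main reason to prefer the low-rank methods in that regime.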

Simple Examples

Basic Fitting

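using StructuredGaussianMixtures
using Statistics     # mean
using Distributions  # logpdf (assumed not re-exported by StructuredGaussianMixtures)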

# Generate some data
data = randn(2, 1000)

# Fit with different methods
gmm_em = fit(EM(3), data)
gmm_pca = fit(PCAEM(3, 1), data)
gmm_factor = fit(FactorEM(3, 1), data)

# Evaluate: mean log-likelihood per sample
println("EM mean log-likelihood: ", mean(logpdf(gmm_em, data)))
println("PCAEM mean log-likelihood: ", mean(logpdf(gmm_pca, data)))
println("FactorEM mean log-likelihood: ", mean(logpdf(gmm_factor, data)))

Weighted Fitting

# Create weights based on data values
weights = [data[1, i] > 0 ? 1.0 : 0.5 for i in 1:size(data, 2)]

# Fit with weights (only FactorEM supports this)
gmm_weighted = fit(FactorEM(3, 1), data, weights)

High-Dimensional Data

# High-dimensional data: 100 features, 50 samples
high_dim_data = randn(100, 50)

# Use low-rank methods
gmm_pca = fit(PCAEM(3, 10), high_dim_data)
gmm_factor = fit(FactorEM(3, 10), high_dim_data)

See also Structured Gaussians for the low-rank plus diagonal distribution used by the PCAEM and FactorEM methods.