ExpFamilyPCA.jl Documentation

ExpFamilyPCA.jl is a Julia package for performing exponential principal component analysis (EPCA) (Collins et al., 2001). EPCA generalizes PCA to accommodate data from any exponential family distribution, making it more suitable for fields where these data types are common, such as geochemistry, marketing, genomics, political science, and machine learning (Greenacre, 2021; Hastie et al., 2009).

ExpFamilyPCA.jl the first EPCA package in Julia and the first in any language to support EPCA for multiple distributions and custom objectives.

Installation

To install the package, use the Julia package manager. In the Julia REPL, type:

using Pkg; Pkg.add("ExpFamilyPCA")

Quickstart

using ExpFamilyPCA

indim = 5
X = rand(1:100, (10, indim))  # data matrix to compress
outdim = 3  # target compression dimension

poisson_epca = PoissonEPCA(indim, outdim)

X_compressed = fit!(poisson_epca, X; maxiter=200, verbose=true)

Y = rand(1:100, (3, indim))  # test data
Y_compressed = compress(poisson_epca, Y; maxiter=200, verbose=true)

X_reconstructed = decompress(poisson_epca, X_compressed)
Y_reconstructed = decompress(poisson_epca, Y_compressed)

Supported Distributions

DistributionExpFamilyPCA.jlObjectiveLink Function $g(\theta)$
BernoulliBernoulliEPCA$\log(1 + e^{\theta - 2x\theta})$$\frac{e^\theta}{1 + e^\theta}$
BinomialBinomialEPCA$n \log(1 + e^\theta) - x\theta$$\frac{ne^\theta}{1 + e^\theta}$
Continuous BernoulliContinuousBernoulliEPCA$\log\left(\frac{e^\theta - 1}{\theta}\right) - x\theta$$\frac{\theta - 1}{\theta} + \frac{1}{e^\theta - 1}$
Gamma¹GammaEPCA or ItakuraSaitoEPCA$-\log(-\theta) - x\theta$$-\frac{1}{\theta}$
Gaussian²GaussianEPCA or NormalEPCA$\frac{1}{2}(x - \theta)^2$$\theta$
Negative BinomialNegativeBinomialEPCA$-r \log(1 - e^\theta) - x\theta$$\frac{-re^\theta}{e^\theta - 1}$
ParetoParetoEPCA$-\log(-1 - \theta) + \theta \log m - x\theta$$\log m - \frac{1}{\theta + 1}$
Poisson³PoissonEPCA$e^\theta - x\theta$$e^\theta$
WeibullWeibullEPCA$-\log(-\theta) - x\theta$$-\frac{1}{\theta}$

¹: For the Gamma distribution, the link function is typically based on the inverse relationship.

²: For Gaussian, also known as Normal distribution, the link function is the identity.

³: The Poisson distribution link function is exponential.

Contributing

ExpFamilyPCA.jl was designed through the Stanford Intelligent Systems Laboratory (SISL) which helps maintain open-source projects posted on the SISL organization repository and the JuliaPOMDPs community. We subscribe to the community's contribution guidelines and encourage new users with all levels of software development experience to contribute and ask questions.

If you would like to create a new compressor or improve an existing model, please open a new issue that briefly describes the contribution you would like to make and the problem that it fixes, if there is one.