Gaussian Mixture Model
Gaussian mixture models are defined with:
- Means (means): The means of each of the \(n\) Gaussians that make up a distribution.
- Standard deviations (stds): The standard deviations of each of the \(n\) Gaussians that make up a distribution.
- Weights (weights): The relative weight given to each of the \(n\) Gaussians that make up a distribution.
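In symbols, this definition can be written as follows. The notation \(\mu_k\), \(\sigma_k\) and \(w_k\) (for the entries of means, stds and weights) and \(\mathcal{N}\) (the normal density) is introduced here for illustration and does not appear in the original text. The weights are assumed to be normalized so that they sum to 1.

\[
p(x) = \sum_{k=1}^{n} w_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2), \qquad \sum_{k=1}^{n} w_k = 1
\]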
Use cases
Gaussian mixture models are well suited to fitting real data distributions with multiple distinct modes. They can be represented with only a few parameters, so they store large numbers of distributions more compactly than parameterizations such as histograms or interpolations.
Behaviour
Gaussian mixture model Ensembles operate in the following ways:
- Ensemble.pdf(x) and Ensemble.cdf(x) are computed as a weighted sum of each of the component Gaussians' pdf(x) and cdf(x).
- Ensemble.ppf(x) is calculated from a fixed grid of cdf() values that are interpolated linearly using scipy.interpolate.interp1d. ppf(0) returns negative infinity, and ppf(1) returns positive infinity.
- Ensemble.x_samples() returns a range of \(x\) values that should cover the majority of all the distributions when plotted. It may exclude the tails of some distributions. The minimum of the range is the lowest mean minus the largest standard deviation, and the maximum is the highest mean plus the largest standard deviation.
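The behaviour described above can be sketched for a single distribution using only the Python standard library. This is an illustration, not qp's implementation: all function names, the grid size n_grid and the grid half-width n_sigma are assumptions, and a hand-written linear interpolation stands in for scipy.interpolate.interp1d. The plot_range helper reflects one reading of how the x_samples() limits are computed.

```python
import math


def gauss_pdf(x, mean, std):
    """Density of a single Gaussian component."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mean, std):
    """Cumulative distribution of a single Gaussian component."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def mix_pdf(x, means, stds, weights):
    """Weighted sum of the component densities."""
    return sum(w * gauss_pdf(x, m, s) for m, s, w in zip(means, stds, weights))


def mix_cdf(x, means, stds, weights):
    """Weighted sum of the component cumulative distributions."""
    return sum(w * gauss_cdf(x, m, s) for m, s, w in zip(means, stds, weights))


def mix_ppf(q, means, stds, weights, n_grid=2001, n_sigma=8.0):
    """Invert the CDF by linear interpolation on a fixed grid (assumed grid choice)."""
    if q <= 0.0:
        return -math.inf
    if q >= 1.0:
        return math.inf
    lo = min(means) - n_sigma * max(stds)
    hi = max(means) + n_sigma * max(stds)
    step = (hi - lo) / (n_grid - 1)
    xs = [lo + i * step for i in range(n_grid)]
    cs = [mix_cdf(x, means, stds, weights) for x in xs]
    if q <= cs[0]:
        return xs[0]
    for i in range(1, n_grid):
        if cs[i] >= q:
            # q lies between cs[i - 1] and cs[i]: interpolate linearly in x.
            t = (q - cs[i - 1]) / (cs[i] - cs[i - 1])
            return xs[i - 1] + t * step
    return xs[-1]


def plot_range(means, stds):
    """Assumed reading of the x_samples() limits: extreme means -/+ the largest std."""
    return min(means) - max(stds), max(means) + max(stds)


# Same parameter values as the Ensemble Creation example in this page.
means = [-1.0, -0.5, 0.0, 0.5, 1.0]
stds = [0.1, 0.275, 0.45, 0.625, 0.8]
weights = [0.1, 0.3, 0.2, 0.2, 0.2]
median = mix_ppf(0.5, means, stds, weights)
```

Evaluating mix_cdf at the interpolated median should give a value close to 0.5; how close depends on the grid spacing, which is why a finer grid trades memory and time for accuracy.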
Data structure
See Data Structure for general details on the data structure of Ensembles.
Metadata Dictionary
| Key | Example value | Description |
|---|---|---|
| “pdf_name” | | The parameterization type |
| “pdf_version” | | Version of parameterization type used |
Data Dictionary
| Key | Example value | Description |
|---|---|---|
| “means” | | The means of each Gaussian, of shape (\(n_{pdf}\), \(n\)) |
| “stds” | | The standard deviations of each Gaussian, of shape (\(n_{pdf}\), \(n\)) |
| “weights” | | The weight given to each Gaussian, of shape (\(n_{pdf}\), \(n\)) |
Note
Here \(n_{pdf}\) is the number of distributions, and \(n\) is the number of Gaussians for each distribution.
Ensemble Creation
>>> import qp
>>> import numpy as np
>>> means = np.linspace(-1,1,5)
>>> stds = np.linspace(0.1,0.8,5)
>>> weights = np.array([0.1,0.3,0.2,0.2,0.2])
>>> ens = qp.mixmod.create_ensemble(means=means,stds=stds,weights=weights)
>>> ens
Ensemble(the_class=mixmod,shape=(1,5))
Required parameters:
- means: The array of means of the component Gaussians, with shape (\(n_{pdf}\), \(n\)).
- stds: The array of standard deviations of the component Gaussians, with shape (\(n_{pdf}\), \(n\)).
- weights: The array of weights for each of the component Gaussians, with shape (\(n_{pdf}\), \(n\)). The weights should add to 1; if they do not, they will be normalized.
Optional parameters:
- ancil: The dictionary of arrays of additional data, each containing \(n_{pdf}\) values.
- warn: If True, raises warnings if input is not valid PDF data (i.e. if input is not finite). If False, no warnings are raised. By default True.
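The weights parameter is described as being normalized when it does not sum to 1. A minimal sketch of what that could look like for weights of shape (\(n_{pdf}\), \(n\)), assuming each distribution's row is divided by its own sum. The helper normalize_weights is hypothetical and is not part of qp.

```python
def normalize_weights(weights):
    """Scale each row of a (n_pdf, n) nested list so that it sums to 1."""
    normalized = []
    for row in weights:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row of weights needs a positive sum")
        normalized.append([w / total for w in row])
    return normalized


# Two distributions with two components each; neither row sums to 1.
raw = [[1.0, 3.0], [2.0, 2.0]]
scaled = normalize_weights(raw)
```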
Conversion
The method used to convert an Ensemble to this parameterization is: extract_mixmod_fit_samples().
Example:
>>> ens_m = qp.convert(ens, 'mixmod', ncomps=2)
>>> ens_m
Ensemble(the_class=mixmod,shape=(1,5))
Optional arguments:
- ncomps: The number of component Gaussians to use for all the distributions, by default 3.
- nsamples: The number of samples to generate from each distribution, by default 1000.
- random_state: The random state to provide to qp.Ensemble.rvs(), by default None.
This conversion method uses qp.Ensemble.rvs() to draw nsamples data points from each of the input distributions. It then uses sklearn.mixture.GaussianMixture.fit to estimate the parameters of a Gaussian mixture model for each distribution.
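The sample-then-fit idea can be sketched without qp or scikit-learn. This is a standalone illustration, not the library's code: sample_mixture and fit_mixture_em are hypothetical helpers, and a hand-written expectation-maximization loop stands in for sklearn.mixture.GaussianMixture.fit. The initialization at sample quantiles and the iteration count are assumptions made for the example.

```python
import math
import random


def sample_mixture(means, stds, weights, nsamples, rng):
    """Draw nsamples points from a 1-D Gaussian mixture (stand-in for rvs())."""
    comps = rng.choices(range(len(means)), weights=weights, k=nsamples)
    return [rng.gauss(means[k], stds[k]) for k in comps]


def fit_mixture_em(samples, ncomps, n_iter=100):
    """Fit a 1-D Gaussian mixture by plain EM (stand-in for GaussianMixture.fit)."""
    data = sorted(samples)
    n = len(data)
    # Start the means at evenly spaced sample quantiles, with a shared spread.
    means = [data[int((k + 0.5) * n / ncomps)] for k in range(ncomps)]
    centre = sum(data) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in data) / n)
    stds = [spread] * ncomps
    weights = [1.0 / ncomps] * ncomps
    for _ in range(n_iter):
        r_sum = [0.0] * ncomps   # summed responsibilities per component
        x_sum = [0.0] * ncomps   # responsibility-weighted sum of x
        xx_sum = [0.0] * ncomps  # responsibility-weighted sum of x squared
        # E-step: share each sample between components by relative density.
        for x in data:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                    for m, s, w in zip(means, stds, weights)]
            total = sum(dens)
            if total == 0.0:
                continue
            for k in range(ncomps):
                r = dens[k] / total
                r_sum[k] += r
                x_sum[k] += r * x
                xx_sum[k] += r * x * x
        # M-step: update weights, means and standard deviations.
        for k in range(ncomps):
            nk = max(r_sum[k], 1e-12)
            weights[k] = nk / n
            means[k] = x_sum[k] / nk
            stds[k] = math.sqrt(max(xx_sum[k] / nk - means[k] ** 2, 1e-12))
    return means, stds, weights


# Sample from a known two-component mixture, then recover its parameters.
rng = random.Random(0)
samples = sample_mixture([-2.0, 3.0], [0.5, 0.5], [0.4, 0.6], 1000, rng)
fit_means, fit_stds, fit_weights = fit_mixture_em(samples, ncomps=2)
```

Because the fit works from random samples, the recovered parameters only approximate the input distribution, and the result depends on nsamples and on the random state.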
Known issues
Currently the rvs() method of the Gaussian mixture model parameterization is not functional. This also means that converting Gaussian mixture model Ensembles to other types of Ensembles will not work with conversion methods that rely on sampling (for example, converting to a histogram via the ‘samples’ method).