Gaussian mixture model

A Gaussian mixture model (GMM) is a probabilistic model whose density is a weighted average of multiple Gaussian density functions. It is not uncommon to think of these models as a clustering technique, because a fitted model can be used to backtrack which individual density each sample was generated from. chaospy, however, which first and foremost deals with forward problems, treats GMMs as a very flexible class of distributions.

At the most basic level, a GMM can be constructed in chaospy from a sequence of means and covariances:

[1]:
import chaospy

# three components: two strongly correlated (negatively and positively)
# and one tight, uncorrelated component
means = ([0, 1], [1, 1], [1, 0])
covariances = ([[1.0, -0.9], [-0.9, 1.0]],
               [[1.0,  0.9], [ 0.9, 1.0]],
               [[0.1,  0.0], [ 0.0, 0.1]])
distribution = chaospy.GaussianMixture(means, covariances)
distribution
[1]:
GaussianMixture()
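Since no weights are given, the components are weighted equally, and the mixture density is the plain average of the three component densities. The following minimal sanity check illustrates this; it assumes scipy is installed, and the evaluation point is arbitrary:

import numpy
from scipy.stats import multivariate_normal

# evaluate the hand-averaged component densities at an arbitrary point;
# this should agree with the chaospy mixture density at the same point
point = [0.5, 0.5]
manual = numpy.mean([multivariate_normal(mean, cov).pdf(point)
                     for mean, cov in zip(means, covariances)])
assert numpy.isclose(distribution.pdf(point), manual)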
[2]:
import numpy
from matplotlib import pyplot
pyplot.rc("figure", figsize=[15, 6], dpi=75)

# evaluate the mixture density on a regular grid and plot its contours
xloc, yloc = numpy.mgrid[-2:3:100j, -1:3:100j]
density = distribution.pdf([xloc, yloc])
pyplot.contourf(xloc, yloc, density)

pyplot.show()
../../_images/user_guide_advanced_topics_gaussian_mixture_model_2_0.png

Model fitting

chaospy supports the Gaussian mixture model representation, but does not provide an automatic method for constructing one from data. This is, however, something that for example scikit-learn supports. It is therefore possible to fit a model with scikit-learn and reuse the estimated parameters in the chaospy implementation. As an example, consider the Iris example from scikit-learn’s documentation (covariance type “full”, reduced to a 2-dimensional representation):

[3]:
from sklearn import datasets, mixture

# fit a three-component mixture model to the four-dimensional iris data
model = mixture.GaussianMixture(n_components=3, random_state=1234)
model.fit(datasets.load_iris().data)

# keep only the first two dimensions to allow visualization
means = model.means_[:, :2]
covariances = model.covariances_[:, :2, :2]
print(means.round(4))
print(covariances.round(4))
[[5.006  3.428 ]
 [6.5464 2.9495]
 [5.9171 2.778 ]]
[[[0.1218 0.0972]
  [0.0972 0.1408]]

 [[0.3874 0.0922]
  [0.0922 0.1104]]

 [[0.2755 0.0966]
  [0.0966 0.0926]]]
[4]:
distribution = chaospy.GaussianMixture(means, covariances)

xloc, yloc = numpy.mgrid[4:8:100j, 1.5:4.5:100j]
density = distribution.pdf([xloc, yloc])
pyplot.contourf(xloc, yloc, density)

pyplot.show()
../../_images/user_guide_advanced_topics_gaussian_mixture_model_5_0.png
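In addition to means and covariances, scikit-learn also estimates the mixture weights (model.weights_). Assuming the installed chaospy version accepts a weights argument to GaussianMixture (an assumption, not part of the example above), these can be reused as well:

# hypothetical: carry the fitted (non-uniform) weights over to chaospy;
# the `weights` keyword is assumed, not verified against this example
weighted = chaospy.GaussianMixture(means, covariances, weights=model.weights_)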

Like scikit-learn, chaospy also supports higher dimensions, but that would make the visualization harder.
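For instance, the full four-dimensional iris fit can be passed along unchanged; only the visualization above required slicing down to two dimensions. A sketch:

# the same construction, keeping all four iris dimensions
distribution4d = chaospy.GaussianMixture(model.means_, model.covariances_)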

Low discrepancy sequences

chaospy supports low-discrepancy sequences through inverse mapping: points from a low-discrepancy sequence on the unit hypercube are mapped through the distribution’s inverse transformation. This support extends to mixture models, making the following possible:

[5]:
pseudo_samples = distribution.sample(500, rule="additive_recursion")

pyplot.scatter(*pseudo_samples)
pyplot.show()
../../_images/user_guide_advanced_topics_gaussian_mixture_model_8_0.png
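Written out by hand, the inverse mapping amounts to drawing a low-discrepancy sequence on the unit square and pushing it through the distribution’s inverse transformation. A minimal sketch of the same idea, here using a Halton sequence (chaospy.create_halton_samples and Distribution.inv are the assumed entry points):

# low-discrepancy points on the unit square ...
uniforms = chaospy.create_halton_samples(500, dim=2)
# ... mapped into the mixture's sample space via the inverse transform
samples = distribution.inv(uniforms)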

Chaos expansion

To be able to use the point collocation method, the user needs access to both samples from the input distribution and polynomials orthogonal with respect to that distribution. The former is demonstrated above, while the latter can be generated as follows (here using the "cholesky" rule, which constructs the orthogonal polynomials from the distribution’s statistical moments):

[6]:
expansion = chaospy.generate_expansion(1, distribution, rule="cholesky")

expansion.round(4)
[6]:
polynomial([1.0, q1-3.0518, 0.2121*q1+q0-6.4705])
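With samples and an orthogonal expansion in place, the remaining point collocation steps might look as follows. This is only a sketch: the model function is a made-up placeholder, not part of the original example:

# hypothetical model of interest, evaluated at each of the samples
evaluations = numpy.sin(pseudo_samples[0]) + pseudo_samples[1] ** 2

# regress the evaluations onto the expansion to obtain a polynomial
# surrogate that approximates the model as a function of the inputs
surrogate = chaospy.fit_regression(expansion, pseudo_samples, evaluations)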