Kernel density estimation

Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a data sett. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample. So to make a KDE first we need som data. For example:

[1]:
import numpy
numpy.random.seed(1000)
samples = numpy.random.random((2, 7))

samples.round(4)
[1]:
array([[0.6536, 0.115 , 0.9503, 0.4822, 0.8725, 0.2123, 0.0407],
       [0.3972, 0.2331, 0.8417, 0.2071, 0.7425, 0.3922, 0.1823]])

In chaospy a KDE distribution can be created as follows:

[2]:
import chaospy
distribution = chaospy.GaussianKDE(samples, h_mat=0.05)
distribution
[2]:
GaussianKDE()
[3]:
from matplotlib import pyplot
pyplot.rc("figure", figsize=[15, 6], dpi=75)

xloc, yloc = numpy.mgrid[0:1:50j, 0:1:50j]
density = distribution.pdf([xloc, yloc])
pyplot.contourf(xloc, yloc, density)
pyplot.scatter(*samples)

pyplot.show()
../../_images/user_guide_advanced_topics_kernel_density_estimation_4_0.png

Data driven chaos expansion

With the KDE created, it is possible to create orthogonal polynomial chaos expansions directly:

[4]:
distribution = chaospy.GaussianKDE(samples)
expansion = chaospy.generate_expansion(1, distribution, rule="cholesky")

expansion.round(4)
[4]:
polynomial([1.0, q1-0.428, -0.7841*q1+q0-0.1396])

The distribution also supports generation of samples. For example:

[5]:
pseudo_samples = distribution.sample(500, rule="halton")

pyplot.scatter(*pseudo_samples)
pyplot.show()
../../_images/user_guide_advanced_topics_kernel_density_estimation_8_0.png

This can then be used to create a model approximation using Point Collocation Method.

Smoothing/Bandwidth

In one dimension the kernel density is determined by a bandwidth parameter. As kernel density estimation in chaospy is multivariate, the bandwidth parameter is instead defined through an H-matrix. However, it is possible to provide bandwidth as well, by just squaring the value. For example:

[6]:
samples_1d = chaospy.Normal().sample(100, rule="halton")

pyplot.hist(samples_1d, bins=50, density=True, alpha=0.5)

t = numpy.linspace(-3, 3, 400)
distribution = chaospy.GaussianKDE(samples_1d, h_mat=0.05**2)
pyplot.plot(t, distribution.pdf(t), label="0.05")

distribution = chaospy.GaussianKDE(samples_1d, h_mat=0.2**2)
pyplot.plot(t, distribution.pdf(t), label="0.2")

distribution = chaospy.GaussianKDE(samples_1d, h_mat=2**2)
pyplot.plot(t, distribution.pdf(t), label="2.0")

pyplot.legend()
pyplot.show()
../../_images/user_guide_advanced_topics_kernel_density_estimation_11_0.png

It is possible to provide the bandwidth in the multivariate case as well. If it is provided as a scalar, it will assume that all dimensions are handled equally (as in no anisotropi) and no internal correlation for each sample.

[7]:
distribution = chaospy.GaussianKDE(samples, h_mat=0.1**2)
pyplot.contourf(xloc, yloc, distribution.pdf([xloc, yloc]))
pyplot.scatter(*samples)
pyplot.show()
../../_images/user_guide_advanced_topics_kernel_density_estimation_13_0.png

Anisotropic smoothing

In the multivariate case, it is also possible to provide different smoothing parameters for the different dimensions, creating an anisotropic effect:

[8]:
pyplot.subplot(121)
distribution = chaospy.GaussianKDE(samples, h_mat=[0.05, 0.001])
pyplot.contourf(xloc, yloc, distribution.pdf([xloc, yloc]))
pyplot.scatter(*samples)

pyplot.subplot(122)
distribution = chaospy.GaussianKDE(samples, h_mat=[0.001, 0.05])
pyplot.contourf(xloc, yloc, distribution.pdf([xloc, yloc]))
pyplot.scatter(*samples)

pyplot.show()
../../_images/user_guide_advanced_topics_kernel_density_estimation_15_0.png

Full kernel covariance

It is also possible to define h_mat to be a full correlation matrix applied to each sample. For example:

[9]:
pyplot.subplot(121)
distribution = chaospy.GaussianKDE(
    samples, h_mat=[[0.1, 0.099], [0.099, 0.1]])
pyplot.contourf(xloc, yloc, distribution.pdf([xloc, yloc]))
pyplot.scatter(*samples)

pyplot.subplot(122)
distribution = chaospy.GaussianKDE(
    samples, h_mat=[[0.1, -0.099], [-0.099, 0.1]])
pyplot.contourf(xloc, yloc, distribution.pdf([xloc, yloc]))
pyplot.scatter(*samples)

pyplot.show()
../../_images/user_guide_advanced_topics_kernel_density_estimation_17_0.png