Descriptive statistics

Acknowledging that variables and models are uncertain assumes that we can, directly or indirectly, describe them through probability distributions. However, for most applications the distribution itself is a messy entity that is hard to interpret directly. Instead, we use statistical metrics designed to summarize the distribution and give an intuitive understanding of its statistical properties.

In addition, for each statistical property there almost always exists an empirical counterpart that serves as a best estimate of that property when only data is available. This is important, as Monte Carlo integration relies on these empirical metrics to describe its results.

This section takes a look at some popular statistical metrics and compares them to their empirical counterparts.

Expected value

Take for example the most common metric, the expected value function chaospy.E(). This operator works on any distribution:

[1]:
import chaospy

uniform = chaospy.Uniform(0, 4)
chaospy.E(uniform)
[1]:
array(2.)

Its empirical counterpart is the mean function: \(\bar X = \frac{1}{N}\sum_{i=1}^{N} X_i\). This function is available as numpy.mean and can be applied to samples drawn from said distribution:

[2]:
import numpy

samples = uniform.sample(1e7)
numpy.mean(samples)
[2]:
1.9997276896715797

The operator can also be used on any polynomial, but then requires the distribution of interest as a second argument:

[3]:
q0 = chaospy.variable()

chaospy.E(q0**3-1, uniform)
[3]:
array(15.)

In the multivariate case, the distribution and the polynomial variables need to coincide positionally. E.g.

[4]:
q0, q1, q2 = chaospy.variable(3)
joint3 = chaospy.J(chaospy.Normal(0, 1), chaospy.Uniform(0, 2), chaospy.Normal(2, 2))

chaospy.E([q0, q1*q2], joint3)
[4]:
array([0., 2.])

Here q0, q1 and q2 correspond to chaospy.Normal(0, 1), chaospy.Uniform(0, 2) and chaospy.Normal(2, 2) respectively. It is the position of the variable names and the length of the distribution that matter here, not the shape of the expression the expected value is taken of.

Note also that the model approximations created by e.g. chaospy.fit_regression() and chaospy.fit_quadrature() are valid polynomials as well, as sketched below.
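
For illustration, here is a minimal sketch of that (the toy model, the expansion order and the sample size are assumptions made for this example; joint3 is reused from the cell above):

[ ]:
expansion = chaospy.generate_expansion(2, joint3)   # orthogonal polynomial basis
samples = joint3.sample(100, seed=1234)             # training samples, shape (3, 100)
evaluations = samples[0]*samples[1]+samples[2]      # toy model standing in for a simulation
approximation = chaospy.fit_regression(expansion, samples, evaluations)

chaospy.E(approximation, joint3)                    # usable like any other polynomial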

Higher order moments

In addition to the expected value, there are also higher-order statistics that work in the same way. They are listed here together with their numpy and scipy empirical counterparts:

Name                 chaospy            numpy or scipy
Variance             chaospy.Var()      numpy.var
Standard deviation   chaospy.Std()      numpy.std
Covariance           chaospy.Cov()      numpy.cov
Correlation          chaospy.Corr()     numpy.corrcoef
Skewness             chaospy.Skew()     scipy.stats.skew
Kurtosis             chaospy.Kurt()     scipy.stats.kurtosis

For example (Pearson’s) correlation:

[5]:
chaospy.Corr([q0, q0*q2], joint3)
[5]:
array([[1.        , 0.70710678],
       [0.70710678, 1.        ]])
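
The empirical counterpart from the table can be checked against this result (a sketch; the sample size and seed are illustrative choices):

[ ]:
samples = joint3.sample(10**6, seed=1234)
evaluations = numpy.array([samples[0], samples[0]*samples[2]])  # samples of [q0, q0*q2]
numpy.corrcoef(evaluations)  # expected to lie close to the chaospy.Corr result above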

Conditional mean

The conditional expected value chaospy.E_cond() is similar to the more conventional chaospy.E(), but differs in that it supports partial conditioning. In other words, it is possible to “freeze” some of the variables and take the expectation over the others. For example:

[6]:
chaospy.E_cond([q0, q1*q2], q0, joint3)
[6]:
polynomial([q0, 2.0])
[7]:
chaospy.E_cond([q0, q1*q2], q1, joint3)
[7]:
polynomial([0.0, 2.0*q1])
[8]:
chaospy.E_cond([q0, q1*q2], [q1, q2], joint3)
[8]:
polynomial([0.0, q1*q2])
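
Taking the unconditional expected value of such a conditional expectation recovers chaospy.E(), in line with the law of total expectation:

[ ]:
chaospy.E(chaospy.E_cond([q0, q1*q2], q0, joint3), joint3)  # equals chaospy.E([q0, q1*q2], joint3)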

Sensitivity analysis

Variance-based sensitivity analysis (often referred to as the Sobol method or Sobol indices) is a form of global sensitivity analysis. Working within a probabilistic framework, it decomposes the variance of the output of the model or system into fractions which can be attributed to inputs or sets of inputs. Read more on, for example, Wikipedia.

In chaospy, the following three functions are available:

Name                 chaospy function
First order main     chaospy.Sens_m
Second order main    chaospy.Sens_m2
Total order          chaospy.Sens_t

For example:

[9]:
chaospy.Sens_m(6*q0+3*q1+q2, joint3)
[9]:
array([0.8372093 , 0.06976744, 0.09302326])
[10]:
chaospy.Sens_m2(q0*q1+q1*q2, joint3)
[10]:
array([[0.        , 0.04166667, 0.        ],
       [0.04166667, 0.        , 0.16666667],
       [0.        , 0.16666667, 0.        ]])
[11]:
chaospy.Sens_t(6*q0+3*q1+q2, joint3)
[11]:
array([0.8372093 , 0.06976744, 0.09302326])

There are no direct empirical counterparts to these functions, but it is possible to construct sample-based estimation schemes using, for example, Saltelli’s method.
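
As a sketch of what such a scheme could look like, the following pick-and-freeze estimator targets the same first order indices as chaospy.Sens_m above (the estimator variant and the sample size are illustrative assumptions):

[ ]:
def model(samples):
    s0, s1, s2 = samples
    return 6*s0 + 3*s1 + s2

samples_a = joint3.sample(10**5, seed=1234)  # shape (3, N)
samples_b = joint3.sample(10**5, seed=4321)
evals_a = model(samples_a)
evals_b = model(samples_b)

sens_main = numpy.empty(3)
for idx in range(3):
    # "pick and freeze": copy input `idx` from the A-sample into the B-sample
    samples_c = samples_b.copy()
    samples_c[idx] = samples_a[idx]
    evals_c = model(samples_c)
    sens_main[idx] = (
        numpy.mean(evals_a*evals_c)-numpy.mean(evals_a)*numpy.mean(evals_b)
    )/numpy.var(evals_a)

sens_main  # expected to lie close to chaospy.Sens_m(6*q0+3*q1+q2, joint3)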

Percentile

Calculating a closed-form percentile of a multivariate polynomial is not feasible, so chaospy does not attempt it. However, as a matter of convenience, a simple wrapper function chaospy.Perc() is provided that estimates said values using Monte Carlo integration. For example:

[12]:
chaospy.Perc([q0, q1*q2], [25, 50, 75], joint3, sample=1000, seed=1234)
[12]:
array([[-0.68697501,  0.21743063],
       [ 0.0506567 ,  1.38448851],
       [ 0.75070447,  3.40628725]])

Note that the accuracy of this method depends on the number of samples.
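
An empirical counterpart can be built directly with numpy.percentile applied to samples of the polynomial (a sketch; the sample size is an illustrative choice):

[ ]:
samples = joint3.sample(10**5, seed=1234)
evaluations = numpy.array([samples[0], samples[1]*samples[2]])  # samples of [q0, q1*q2]
numpy.percentile(evaluations, [25, 50, 75], axis=-1)  # comparable to chaospy.Perc above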

Quantity of interest

If you want to interpret the model approximation as a distribution for further second-order analysis, this is possible through chaospy.QoI_Dist. This is a thin wrapper function that generates samples and passes them to the kernel density estimation class chaospy.GaussianKDE(). It works as follows:

[13]:
new_dist = chaospy.QoI_Dist(q0*q1+q2, joint3)
new_dist.sample(6, seed=1234).round(6)
[13]:
array([-0.006853,  2.69112 ,  1.633062,  3.810936,  3.766989,  0.614768])
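
Roughly the same construction can be done by hand, following the description above (a sketch; the sample size is an illustrative choice and not necessarily what chaospy.QoI_Dist uses internally):

[ ]:
samples = joint3.sample(10**4, seed=1234)
evaluations = samples[0]*samples[1]+samples[2]  # samples of q0*q1+q2
kde_dist = chaospy.GaussianKDE(evaluations)
kde_dist.sample(6, seed=1234).round(6)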