Quantile
Quantile distributions are parameterized by values from the CDF. They have:
- quantiles (quants): \(n\) ordered quantiles of the distribution, from 0 to 1. These are evenly spaced cumulative probabilities (i.e. the probability that \(x \leq\) some value).
- locations (locs): The \(n\) locations (\(x\) values) on the distribution’s CDF where the quantiles are reached.
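As a small illustration of how the two arrays relate, the sketch below builds quants and locs for two sets of samples using numpy alone. It does not use qp, and the sample data and variable names are made up for the example.

```python
import numpy as np

# Hypothetical data: two sets of samples, one per distribution.
rng = np.random.default_rng(seed=0)
samples = np.stack([
    rng.normal(loc=2.0, scale=0.5, size=1000),
    rng.normal(loc=6.0, scale=0.5, size=1000),
])

# n evenly spaced cumulative probabilities, shared by all distributions.
quants = np.linspace(0, 1, 5)

# For each distribution, the x value at which each cumulative probability
# is reached. np.quantile returns shape (n, n_pdf) here, so transpose it.
locs = np.quantile(samples, quants, axis=1).T

print(locs.shape)  # (2, 5)
```

Each row of locs is non-decreasing, because a larger cumulative probability can only be reached at the same or a larger \(x\) value.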
Use cases
The quantile parameterization works well for data that has a well-behaved CDF. It also makes it easier to represent distributions that are spread out in the \(x\) coordinate space than the interpolation or histogram parameterizations do, because all distributions fall into the same range of quantiles.
One thing to note when using this parameterization is that it does not constrain the PDF to be non-negative. The interpolated PDF can therefore become negative, particularly in areas where the CDF is flat or close to flat, which may not be desirable in certain use cases.
Behaviour
Quantile parameterized Ensembles behave in the following ways:
- Ensemble.cdf(x) is created by interpolating quadratically between the quantiles using scipy.interpolate.interp1d.
- Ensemble.ppf(0) returns negative infinity and Ensemble.ppf(1) returns positive infinity.
- Ensemble.pdf(x) is calculated in a variety of ways depending on the PDF constructor used (pdf_constructor_name, described below).
  - piecewise_linear (Default): Takes the numerical derivative of the CDF and linearly interpolates between those points. See PiecewiseLinear for more details.
  - piecewise_constant: Calculates the numerical derivative of the CDF. Assumes a constant value between points on the derivative to interpolate. See PiecewiseConstant for more details.
  - cdf_spline_derivative: Uses scipy.interpolate.InterpolatedUnivariateSpline to fit a cubic spline to quantiles and locations, and then gets the derivative of that spline, which provides the PDF values. See CdfSplineDerivative for more details.
  - dual_spline_average: Solves for the PDF with a stepwise algorithm, then uses these values to create an upper bound and lower bound cubic spline of the PDF, which are then averaged to produce the PDF. See DualSplineAverage for more details.
- Ensemble.x_samples() returns a range of \(x\) values that can be used to plot all of the distributions. The range is calculated using numpy.linspace, with a step size that is the median of the existing step sizes between the locations in the distributions, unless this gives more than 10 000 points, in which case the step size that returns 10 000 points is used.
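The step-size rule for x_samples() can be sketched with numpy alone. The helper name sketch_x_grid is hypothetical, and the choice to span the smallest to the largest location across all distributions is an assumption; this is an illustration of the rule as described, not qp's own code.

```python
import numpy as np

def sketch_x_grid(locs, max_points=10_000):
    """Hypothetical helper: build a shared x grid from a (n_pdf, n) locs array."""
    locs = np.asarray(locs, dtype=float)
    # Median spacing between neighbouring locations, over all distributions.
    step = np.median(np.diff(locs, axis=-1))
    lo, hi = locs.min(), locs.max()
    # Number of points that this step size would need to cover [lo, hi].
    n_points = int(np.ceil((hi - lo) / step)) + 1
    # Cap the grid size; linspace then implies a coarser step.
    n_points = min(n_points, max_points)
    return np.linspace(lo, hi, n_points)

locs = np.array([np.linspace(1, 3, 5), np.linspace(5, 7, 5)])
grid = sketch_x_grid(locs)
print(grid.size)  # 13
```

Here every spacing is 0.5, so the grid runs from 1 to 7 in steps of 0.5. The cap only matters when the median spacing is very small compared with the total range.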
Data structure
See Data Structure for general details on the data structure of Ensembles.
Metadata Dictionary
| Key | Example value | Description |
|---|---|---|
| “pdf_name” | | The parameterization type |
| “pdf_version” | | Version of parameterization type used |
| “pdf_constructor_name” | | Name of the PDF constructor algorithm used. |
| “ensure_extent” | | Whether the quantiles were forced to extend from 0 to 1. |
| “quants” | | The \(n\) quantiles shared across all distributions. |
Data Dictionary
| Key | Example value | Description |
|---|---|---|
| “locs” | | The values corresponding to each quantile, of shape (\(n_{pdf}\), \(n\)) |
Note
\(n_{pdf}\) is the number of distributions in an Ensemble.
Ensemble creation
>>> import qp
>>> import numpy as np
>>> quants = np.linspace(0,1,5)
>>> locs = np.array([np.linspace(1,3,5),np.linspace(5,7,5)])
>>> ens = qp.quant.create_ensemble(quants=quants,locs=locs)
>>> ens
Ensemble(the_class=quant,shape=(2,5))
Required parameters:
- quants: The array containing the \(n\) quantiles to use for each distribution.
- locs: The array containing the (\(n_{pdf}\), \(n\)) \(x\) values or coordinates where the quantiles are reached.
Optional parameters:
- ancil: The dictionary of arrays of additional data containing \(n_{pdf}\) values
- pdf_constructor_name: The construction algorithm used to create the PDF, by default “piecewise_linear”. The options are:
  - “piecewise_linear”
  - “piecewise_constant”
  - “cdf_spline_derivative”
  - “dual_spline_average”
- ensure_extent: If True, ensures that the quants start at 0 and end at 1 by linearly interpolating data points at the edges of the given data as necessary until the quants extend from 0 to 1. By default True.
- warn: If True, raises warnings if input is not valid PDF data (i.e. if data is negative). If False, no warnings are raised. By default True.
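One way to read the ensure_extent description is as a linear extrapolation from the two outermost points at each edge. The helper below, extend_to_unit_interval, is a hypothetical numpy-only sketch of that reading. It is an assumption about the behaviour, not qp's implementation.

```python
import numpy as np

def extend_to_unit_interval(quants, locs):
    """Hypothetical helper: pad quants to [0, 1] and extrapolate locs linearly."""
    quants = np.asarray(quants, dtype=float)
    locs = np.atleast_2d(np.asarray(locs, dtype=float))
    if quants[0] > 0:
        # Slope dx/dq of the first segment, one value per distribution.
        slope = (locs[:, 1] - locs[:, 0]) / (quants[1] - quants[0])
        lo = locs[:, 0] - slope * quants[0]
        locs = np.column_stack([lo, locs])
        quants = np.concatenate([[0.0], quants])
    if quants[-1] < 1:
        # Slope dx/dq of the last segment, one value per distribution.
        slope = (locs[:, -1] - locs[:, -2]) / (quants[-1] - quants[-2])
        hi = locs[:, -1] + slope * (1.0 - quants[-1])
        locs = np.column_stack([locs, hi])
        quants = np.concatenate([quants, [1.0]])
    return quants, locs

new_quants, new_locs = extend_to_unit_interval([0.25, 0.5, 0.75], [[1.0, 2.0, 3.0]])
# new_quants covers 0, 0.25, 0.5, 0.75, 1 and new_locs covers 0, 1, 2, 3, 4.
```

Input that already starts at 0 and ends at 1 passes through unchanged.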
For more details on creating an Ensemble, see Creating an Ensemble, and for more details on this function see its API documentation.
Conversion
The method used to convert an Ensemble to this parameterization is: extract_quantiles().
Example:
>>> ens_q = qp.convert(ens, 'quant', quants=np.linspace(0.001,0.999,5))
>>> ens_q
Ensemble(the_class=quant,shape=(2,5))
Required argument: quants, the \(n\) quantiles at which to evaluate each distribution.
The conversion function calls the qp.Ensemble.ppf() method of the input Ensemble at the given quantiles, and then uses the returned values with the given quantiles to create a new quantile parameterized Ensemble. This will use all of the defaults for optional parameters.
Warning
We recommend you do not include 0 and 1 in your input quantiles for conversion from most parameterizations. All of the qp-exclusive parameterizations return infinite values at ppf(0) and ppf(1), and many of the scipy.stats.rv_continuous distributions do as well (e.g. a normal distribution). Instead, do as in the example above and input quantiles that extend from some value close to 0 to a value close to 1. The parameterization will automatically interpolate the data out to 0 and 1 as explained in Ensemble creation.
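The effect is easy to see with a scipy.stats normal distribution, used here as a stand-in for the ppf of an input Ensemble. This is an illustration only and does not call qp.

```python
import numpy as np
from scipy import stats

dist = stats.norm(loc=0.0, scale=1.0)

# Including 0 and 1 gives infinite locations at the two ends.
print(dist.ppf(0.0), dist.ppf(1.0))

# Quantiles just inside (0, 1) give finite locations everywhere.
quants = np.linspace(0.001, 0.999, 5)
locs = dist.ppf(quants)
print(np.isfinite(locs).all())  # True
```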
Known issues
The interpolated PDF is not constrained to have only positive values, so it may contain negative values. This is particularly likely with the “cdf_spline_derivative” and “dual_spline_average” PDF constructors.
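The sketch below shows how a negative PDF can arise. It compares a finite-difference slope of the CDF with the derivative of a cubic spline fitted through the same points, using made-up quantile data with a steep rise between two nearly flat stretches. It mirrors the ideas behind the constructors rather than reproducing qp's implementations.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# Made-up data: the CDF is nearly flat, rises steeply, then is nearly flat again.
quants = np.array([0.0, 0.01, 0.02, 0.98, 0.99, 1.0])
locs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Finite-difference slope of the CDF on each interval. It is never negative,
# because both quants and locs are increasing.
pdf_constant = np.diff(quants) / np.diff(locs)

# Cubic spline through (locs, quants), then its derivative as the PDF.
cdf_spline = InterpolatedUnivariateSpline(locs, quants, k=3)
pdf_spline = cdf_spline.derivative()

x = np.linspace(locs[0], locs[-1], 501)
print(pdf_constant.min())   # smallest finite-difference slope
print(pdf_spline(x).min())  # negative for this data: the spline overshoots
```

The spline has to bend sharply to pass through the steep section, so it dips below a monotonic curve in the flat stretches on either side, and its derivative goes negative there.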