API Documentation for qp¶
qp provides a PDF class object that builds on the scipy.stats distributions to provide various approximate forms. The package also contains utilities and metrics for quantifying the quality of these approximations.
Ensemble and Factory¶
Implementation of an ensemble of distributions
- class qp.ensemble.Ensemble(gen_func, data, ancil=None)[source]¶
An object comprised of many qp.PDF objects to efficiently perform operations on all of them
- property gen_func¶
Return the function used to create the distribution object for this ensemble
- property gen_class¶
Return the class used to generate distributions for this ensemble
- property dist¶
Return the scipy.stats.rv_continuous object that generates distributions for this ensemble
- property kwds¶
Return the kwds associated to the frozen object
- property gen_obj¶
Return the scipy.stats.rv_continuous object that generates distributions for this ensemble
- property frozen¶
Return the scipy.stats.rv_frozen object that encapsulates the distributions for this ensemble
- property ndim¶
Return the number of dimensions of PDFs in this ensemble
- property shape¶
Return the number of PDFs in this ensemble
- property npdf¶
Return the number of PDFs in this ensemble
- property ancil¶
Return the ancillary data dictionary
- convert_to(to_class, **kwargs)[source]¶
Convert a distribution or ensemble
- Parameters:
to_class (class) – Class to convert to
**kwargs – keyword arguments are passed to the output class constructor
method (str) – Optional argument to specify a non-default conversion algorithm
- Returns:
ens – Ensemble of PDFs of type to_class using the data from this object
- Return type:
qp.Ensemble
- update(data, ancil=None)[source]¶
Update the frozen object
- Parameters:
data (dict) – Dictionary with data used to construct the ensemble
- update_objdata(data, ancil=None)[source]¶
Update the object data in the distribution
- Parameters:
data (dict) – Dictionary with data used to construct the ensemble
- metadata()[source]¶
Return the metadata for this ensemble
- Returns:
metadata – The metadata
- Return type:
dict
Notes
Metadata are elements that are the same for all the PDFs in the ensemble. These include the name and version of the PDF generation class.
- objdata()[source]¶
Return the object data for this ensemble
- Returns:
objdata – The object data
- Return type:
dict
Notes
Object data are elements that differ for each PDF in the ensemble
- set_ancil(ancil)[source]¶
Set the ancillary data dict
- Parameters:
ancil (dict) – The ancillary data dictionary
Notes
Raises IndexError if the length of the arrays in ancil does not match the number of PDFs in the Ensemble
- add_to_ancil(to_add)[source]¶
Add additional columns to the ancillary data dict
- Parameters:
to_add (dict) – The columns to add to the ancillary data dict
Notes
Raises IndexError if the length of the arrays in to_add does not match the number of PDFs in the Ensemble
This calls dict.update() so it will overwrite existing columns
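The two notes above (length check, then overwrite via dict.update()) can be illustrated with a small standalone sketch. It does not use qp itself; the helper name add_columns and its npdf argument are hypothetical stand-ins for the behaviour described.

```python
def add_columns(ancil, to_add, npdf):
    """Validate column lengths, then merge `to_add` into `ancil` in place.

    Hypothetical helper mirroring the documented behaviour, not qp's code.
    """
    for name, column in to_add.items():
        if len(column) != npdf:
            raise IndexError(
                f"column {name!r} has {len(column)} entries, expected {npdf}"
            )
    # dict.update() replaces any column that already exists under the same key
    ancil.update(to_add)
    return ancil


ancil = {"id": [0, 1, 2], "flag": [1, 1, 0]}
add_columns(ancil, {"flag": [0, 0, 0], "mag": [21.5, 22.0, 23.1]}, npdf=3)
print(ancil)
```

Note that the existing "flag" column is silently replaced, which is the overwrite behaviour the note warns about.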
- append(other_ens)[source]¶
Append another ensemble, other_ens, to this one
- Parameters:
other_ens (qp.Ensemble) – The other Ensemble
- build_tables()[source]¶
Return dicts of numpy arrays for the meta data and object data for this ensemble
- Returns:
meta (dict) – Table with the meta data
data (dict) – Table with the object data
- mode(grid)[source]¶
Return the mode of each ensemble PDF, evaluated on grid
- Parameters:
grid (array-like) – Grid on which to evaluate PDF
- Returns:
mode – The modes of the PDFs evaluated on grid
- Return type:
array-like
Notes
The result is passed through expand_dims so that an (N, 1) array is returned, consistent with mean, median, and other point estimates
- gridded(grid)[source]¶
Build, cache, and return the PDF values at grid points
- Parameters:
grid (array-like) – The grid points
- Returns:
gridded
- Return type:
(grid, pdf_values)
Notes
This first compares grid to the cached value; if they match, it returns the cached value
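The compare-then-reuse pattern in this note can be sketched as follows. This is a minimal standalone illustration under assumed names (GriddedCache, n_builds), not the qp implementation; it evaluates a single callable rather than a whole ensemble.

```python
class GriddedCache:
    """Cache PDF values on a grid and rebuild only when the grid changes."""

    def __init__(self, pdf):
        self._pdf = pdf          # callable x -> pdf value (assumed interface)
        self._grid = None
        self._values = None
        self.n_builds = 0        # counts how often the values were recomputed

    def gridded(self, grid):
        grid = tuple(grid)
        if self._grid != grid:   # cache miss: grid differs from the stored one
            self._values = tuple(self._pdf(x) for x in grid)
            self._grid = grid
            self.n_builds += 1
        return self._grid, self._values


cache = GriddedCache(lambda x: 2.0 * x)
first = cache.gridded([0.0, 0.5, 1.0])
second = cache.gridded([0.0, 0.5, 1.0])   # served from the cache
third = cache.gridded([0.0, 1.0])         # new grid, values rebuilt
```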
- write_to(filename)[source]¶
Save this ensemble to a file
- Parameters:
filename (str) –
Notes
This will actually write two files, one for the metadata and one for the object data
This uses tables_io to write the data, so any file suffix that works for tables_io will work here.
- pdf(x)[source]¶
Evaluates the probability density function for the whole ensemble
- Parameters:
x (float or ndarray, float) – location(s) at which to do the evaluations
- logpdf(x)[source]¶
Evaluates the log of the probability density function for the whole ensemble
- Parameters:
x (float or ndarray, float) – location(s) at which to do the evaluations
- cdf(x)[source]¶
Evaluates the cumulative distribution function for the whole ensemble
- Parameters:
x (float or ndarray, float) – location(s) at which to do the evaluations
- logcdf(x)[source]¶
Evaluates the log of the cumulative distribution function for the whole ensemble
- Parameters:
x (float or ndarray, float) – location(s) at which to do the evaluations
- ppf(q)[source]¶
Evaluates the percent point function (PPF) for the whole ensemble
- Parameters:
q (float or ndarray, float) – location(s) at which to do the evaluations
- sf(q)[source]¶
Evaluates the survival function of the distribution
- Parameters:
q (float or ndarray, float) – location(s) at which to evaluate the pdfs
- logsf(q)[source]¶
Evaluates the log of the survival function of the distribution
- Parameters:
q (float or ndarray, float) – location(s) at which to evaluate the pdfs
- Returns:
Log of the survival function
- Return type:
float or ndarray
- isf(q)[source]¶
Evaluates the inverse of the survival function of the distribution
- Parameters:
q (float or ndarray, float) – location(s) at which to evaluate the pdfs
- rvs(size=None, random_state=None)[source]¶
Generate samples from this ensemble
- Parameters:
size (int) – number of samples to return
- stats(moments='mv')[source]¶
Return the stats for this ensemble
- Parameters:
moments (str) – Which moments to include
- interval(alpha)[source]¶
Return the intervals corresponding to a confidence level of alpha for this ensemble
- histogramize(bins)[source]¶
Computes integrated histogram bin values for all PDFs
- Parameters:
bins (ndarray, float, optional) – Array of N+1 endpoints of N bins
- Returns:
self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins
- Return type:
ndarray, tuple, ndarray, floats
- integrate(limits)[source]¶
Computes the integral under the ensemble of PDFs between the given limits.
- Parameters:
limits (numpy.ndarray, tuple, float) – limits of integration, may be different for all PDFs in the ensemble
using (string) – parametrization over which to approximate the integral
dx (float, optional) – granularity of integral
- Returns:
integral – value of the integral
- Return type:
numpy.ndarray, float
- mix_mod_fit(comps=5)[source]¶
Fits the parameters of a given functional form to an approximation
- Parameters:
comps (int, optional) – number of components to consider
using (string, optional) – which existing approximation to use, defaults to first approximation
vb (boolean) – Report progress
- Returns:
self.mix_mod – list of qp.Composite objects approximating the PDFs
- Return type:
list, qp.Composite objects
Notes
Currently only supports mixture of Gaussians
- plot(key=0, **kwargs)[source]¶
Plot the pdf as a curve
- Parameters:
key (int or slice) – Which PDF or PDFs from this ensemble to plot
- plot_native(key=0, **kwargs)[source]¶
Plot the pdf as a curve
- Parameters:
key (int or slice) – Which PDF or PDFs from this ensemble to plot
- initializeHdf5Write(filename, npdf, comm=None)[source]¶
Set up the output writing for an ensemble, but set the size to npdf rather than the size of the ensemble, as the “initial chunk” will not contain the full data
- Parameters:
filename (str) – Name of the file to create
npdf (int) – Total number of PDFs that the file will contain, usually larger than the size of the current ensemble
comm (MPI communicator) – Optional MPI communicator to allow parallel writing
This module implements a factory that manages different types of PDFs
- class qp.factory.Factory[source]¶
Factory that creates and manages PDFs
- add_class(the_class)[source]¶
Add a class to the factory
- Parameters:
the_class (class) – The class we are adding, must inherit from Pdf_Gen
- create(class_name, data, method=None)[source]¶
Make an ensemble of a particular type of distribution
- Parameters:
class_name (str) – The name of the class to make
data (dict) – Values passed to class create function
method (str [None]) – Used to select which creation method to invoke
- Returns:
ens – The newly created ensemble
- Return type:
qp.Ensemble
- from_tables(tables)[source]¶
Build an ensemble from a set of tables
- Parameters:
tables (dict) –
Notes
This will use information in the meta data table to figure out how to construct the data needed to build the ensemble.
- read_metadata(filename)[source]¶
Read an ensemble’s metadata from a file, without loading the full data.
- Parameters:
filename (str) –
- is_qp_file(filename)[source]¶
Test if a file is a qp file
- Parameters:
filename (str) – File to test
- Returns:
value – True if the file is a qp file
- Return type:
bool
- read(filename)[source]¶
Read this ensemble from a file
- Parameters:
filename (str) –
Notes
This will use information in the meta data to figure out how to construct the data needed to build the ensemble.
- data_length(filename)[source]¶
Get the size of data
- Parameters:
filename (str) –
- Returns:
nrows
- Return type:
int
- iterator(filename, chunk_size=100000, rank=0, parallel_size=1)[source]¶
Return an iterator for chunked read
- Parameters:
filename (str) –
chunk_size (int) –
- convert(in_dist, class_name, **kwds)[source]¶
Convert an ensemble to a different representation
- Parameters:
in_dist (qp.Ensemble) – Input distributions
class_name (str) – Representation to convert to
- Returns:
ens – The ensemble we converted to
- Return type:
qp.Ensemble
- pretty_print(stream=sys.stdout)[source]¶
Print a level of the conversion dictionary in a human-readable format
- Parameters:
stream (stream) – The stream to print to
- qp.factory.add_class(the_class)¶
Add a class to the factory
- Parameters:
the_class (class) – The class we are adding, must inherit from Pdf_Gen
- qp.factory.create(class_name, data, method=None)¶
Make an ensemble of a particular type of distribution
- Parameters:
class_name (str) – The name of the class to make
data (dict) – Values passed to class create function
method (str [None]) – Used to select which creation method to invoke
- Returns:
ens – The newly created ensemble
- Return type:
qp.Ensemble
- qp.factory.read(filename)¶
Read this ensemble from a file
- Parameters:
filename (str) –
Notes
This will use information in the meta data to figure out how to construct the data needed to build the ensemble.
- qp.factory.read_metadata(filename)¶
Read an ensemble’s metadata from a file, without loading the full data.
- Parameters:
filename (str) –
- qp.factory.iterator(filename, chunk_size=100000, rank=0, parallel_size=1)¶
Return an iterator for chunked read
- Parameters:
filename (str) –
chunk_size (int) –
- qp.factory.convert(in_dist, class_name, **kwds)¶
Convert an ensemble to a different representation
- Parameters:
in_dist (qp.Ensemble) – Input distributions
class_name (str) – Representation to convert to
- Returns:
ens – The ensemble we converted to
- Return type:
qp.Ensemble
- qp.factory.concatenate(ensembles)¶
Concatenate a list of ensembles
- Parameters:
ensembles (list) – The ensembles we are concatenating
- Returns:
ens – The output
- Return type:
qp.Ensemble
- qp.factory.data_length(filename)¶
Get the size of data
- Parameters:
filename (str) –
- Returns:
nrows
- Return type:
int
- qp.factory.from_tables(tables)¶
Build an ensemble from a set of tables
- Parameters:
tables (dict) –
Notes
This will use information in the meta data table to figure out how to construct the data needed to build the ensemble.
- qp.factory.is_qp_file(filename)¶
Test if a file is a qp file
- Parameters:
filename (str) – File to test
- Returns:
value – True if the file is a qp file
- Return type:
bool
- qp.factory.write_dict(filename, ensemble_dict, **kwargs)¶
- qp.factory.read_dict(filename)¶
Assume that filename is an HDF5 file containing multiple qp.Ensembles that have been stored as numpy arrays.
Distribution types¶
Histogram based¶
- class qp.hist_gen(bins, pdfs, *args, **kwargs)[source]¶
Bases:
Pdf_rows_gen
Histogram based distribution
Notes
This implements a PDF using a set of histogrammed values.
The relevant data members are:
bins: n+1 bin edges (shared for all PDFs)
pdfs: (npdf, n) bin values
Inside a given bin the pdf() will return the pdf value. Outside the range bins[0], bins[-1] the pdf() will return 0.
Inside a given bin the cdf() will use a linear interpolation across the bin. Outside the range bins[0], bins[-1] the cdf() will return 0 or 1, respectively.
The ppf() is computed by inverting the cdf(). ppf(0) will return bins[0] and ppf(1) will return bins[-1].
- name = 'hist'¶
- version = 0¶
- property bins¶
Return the histogram bin edges
- property pdfs¶
Return the histogram bin values
- classmethod get_allocation_kwds(npdf, **kwargs)[source]¶
Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout
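The pdf(), cdf() and ppf() behaviour described in the Notes for the histogram representation can be sketched for a single PDF as below. This is a standalone illustration, not qp's implementation: the function names are hypothetical, the bin values are assumed to be already normalized, and treating x == bins[-1] as belonging to the last bin is an assumption.

```python
from bisect import bisect_right


def hist_pdf(x, bins, vals):
    """Piecewise-constant pdf: the bin value inside the range, 0 outside."""
    if x < bins[0] or x > bins[-1]:
        return 0.0
    i = min(bisect_right(bins, x) - 1, len(vals) - 1)
    return vals[i]


def hist_cdf(x, bins, vals):
    """Linear interpolation across the bin; 0 below and 1 above the range."""
    if x <= bins[0]:
        return 0.0
    if x >= bins[-1]:
        return 1.0
    i = bisect_right(bins, x) - 1
    below = sum(v * (bins[j + 1] - bins[j]) for j, v in enumerate(vals[:i]))
    return below + vals[i] * (x - bins[i])


def hist_ppf(q, bins, vals):
    """Invert the cdf; ppf(0) -> bins[0], ppf(1) -> bins[-1]."""
    if q <= 0.0:
        return bins[0]
    if q >= 1.0:
        return bins[-1]
    acc = 0.0
    for i, v in enumerate(vals):
        mass = v * (bins[i + 1] - bins[i])
        if acc + mass >= q:
            return bins[i] + (q - acc) / v
        acc += mass
    return bins[-1]


bins = [0.0, 1.0, 2.0, 4.0]      # n+1 edges
vals = [0.5, 0.25, 0.125]        # n bin values, integrating to 1
print(hist_pdf(1.5, bins, vals), hist_cdf(1.5, bins, vals), hist_ppf(0.625, bins, vals))
```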
Interpolation of a fixed grid¶
- class qp.interp_gen(xvals, yvals, *args, **kwargs)[source]¶
Bases:
Pdf_rows_gen
Interpolator based distribution
Notes
This implements a PDF using a set of interpolated values.
This version uses the same xvals for all the PDFs, which allows for much faster evaluation, and reduces the memory usage by a factor of 2.
The relevant data members are:
xvals: (n) x values
yvals: (npdf, n) y values
Inside the range xvals[0], xvals[-1] it simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the PDF. Outside the range xvals[0], xvals[-1] the pdf() will return 0.
The cdf() is constructed by computing the cumulative sum at the xvals grid points and interpolating between them. This will give a slight discrepancy with the true integral of the pdf(), but is much, much faster to evaluate. Outside the range xvals[0], xvals[-1] the cdf() will return 0 or 1, respectively.
The ppf() is computed by inverting the cdf(). ppf(0) will return xvals[0] and ppf(1) will return xvals[-1].
- name = 'interp'¶
- version = 0¶
- property xvals¶
Return the x-values used to do the interpolation
- property yvals¶
Return the y-values used to do the interpolation
- classmethod get_allocation_kwds(npdf, **kwargs)[source]¶
Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.
- Parameters:
npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble
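The cdf() construction described in the Notes for the fixed-grid interpolator (cumulative sum at the grid points, then interpolation between them) can be sketched for one PDF as follows. This is a standalone illustration with hypothetical function names; the use of the trapezoid rule for the cumulative sum is an assumption, since the text above does not state which rule qp uses.

```python
from bisect import bisect_right


def build_cdf_table(xvals, yvals):
    """Cumulative trapezoid sum at the grid points, normalized to end at 1."""
    cumul = [0.0]
    for i in range(1, len(xvals)):
        dx = xvals[i] - xvals[i - 1]
        cumul.append(cumul[-1] + 0.5 * (yvals[i] + yvals[i - 1]) * dx)
    total = cumul[-1]
    return [c / total for c in cumul]


def interp_cdf(x, xvals, cdf_table):
    """Linear interpolation of the tabulated cdf; 0 below and 1 above the grid."""
    if x <= xvals[0]:
        return 0.0
    if x >= xvals[-1]:
        return 1.0
    i = bisect_right(xvals, x) - 1
    frac = (x - xvals[i]) / (xvals[i + 1] - xvals[i])
    return cdf_table[i] + frac * (cdf_table[i + 1] - cdf_table[i])


xvals = [0.0, 1.0, 2.0]
yvals = [0.0, 1.0, 0.0]          # triangular pdf with unit area
table = build_cdf_table(xvals, yvals)
approx = interp_cdf(0.5, xvals, table)
exact = 0.5 * 0.5 * 0.5          # true integral of the triangle up to x = 0.5
print(approx, exact)
```

On this deliberately coarse grid the interpolated value (0.25) differs from the exact integral (0.125), which illustrates the discrepancy the Notes mention; the gap shrinks as the grid is refined.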
Interpolation of a non-fixed grid¶
- class qp.interp_irregular_gen(xvals, yvals, *args, **kwargs)[source]¶
Bases:
Pdf_rows_gen
Interpolator based distribution
Notes
This implements a PDF using a set of interpolated values.
This version uses different xvals for each of the PDFs, which allows for more precision.
The relevant data members are:
xvals: (npdf, n) x values
yvals: (npdf, n) y values
Inside the range xvals[:,0], xvals[:,-1] it simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the PDF. Outside the range xvals[:,0], xvals[:,-1] the pdf() will return 0.
The cdf() is constructed by computing the cumulative sum at the xvals grid points and interpolating between them. This will give a slight discrepancy with the true integral of the pdf(), but is much, much faster to evaluate. Outside the range xvals[:,0], xvals[:,-1] the cdf() will return 0 or 1, respectively.
The ppf() is computed by inverting the cdf(). ppf(0) will return min(xvals) and ppf(1) will return max(xvals).
- name = 'interp_irregular'¶
- version = 0¶
- property xvals¶
Return the x-values used to do the interpolation
- property yvals¶
Return the y-values used to do the interpolation
- classmethod get_allocation_kwds(npdf, **kwargs)[source]¶
Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.
- Parameters:
npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble
Spline based¶
- class qp.spline_gen(*args, **kwargs)[source]¶
Bases:
Pdf_rows_gen
Spline based distribution
Notes
This implements PDFs using a set of splines
The relevant data members are:
splx: (npdf, n) spline-knot x-values
sply: (npdf, n) spline-knot y-values
spln: (npdf) spline-knot order parameters
The pdf() for the ith pdf will return the result of scipy.interpolate.splev(x, splx[i], sply[i], spln[i])
The cdf() for the ith pdf will return the result of scipy.interpolate.splint(x, splx[i], sply[i], spln[i])
The ppf() will use the default scipy implementation, which inverts the cdf() as evaluated on an adaptive grid.
- name = 'spline'¶
- version = 0¶
- static build_normed_splines(xvals, yvals, **kwargs)[source]¶
Build a set of normalized splines using the x and y values
- Parameters:
xvals (array_like) – The x-values used to do the interpolation
yvals (array_like) – The y-values used to do the interpolation
- Returns:
splx (array_like) – The x-values of the spline knots
sply (array_like) – The y-values of the spline knots
spln (array_like) – The order of the spline knots
- classmethod create_from_xy_vals(xvals, yvals, **kwargs)[source]¶
Create a new distribution using the given x and y values
- Parameters:
xvals (array_like) – The x-values used to do the interpolation
yvals (array_like) – The y-values used to do the interpolation
- Returns:
pdf_obj – The requested PDF
- Return type:
spline_gen
- classmethod create_from_samples(xvals, samples, **kwargs)[source]¶
Create a new distribution using the given x values and samples
- Parameters:
xvals (array_like) – The x-values used to do the interpolation
samples (array_like) – The sample values used to build the KDE
- Returns:
pdf_obj – The requested PDF
- Return type:
spline_gen
- property splx¶
Return x-values of the spline knots
- property sply¶
Return y-values of the spline knots
- property spln¶
Return order of the spline knots
- classmethod get_allocation_kwds(npdf, **kwargs)[source]¶
Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.
- Parameters:
npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble
Quantile based¶
- class qp.quant_gen(quants, locs, *args, **kwargs)[source]¶
Bases:
Pdf_rows_gen
Quantile based distribution, where the PDF is defined piecewise from the quantiles
Notes
This implements a CDF by interpolating a set of quantile values
It simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the CDF
- name = 'quant'¶
- version = 0¶
- property quants¶
Return quantiles used to build the CDF
- property locs¶
Return the locations at which those quantiles are reached
- property pdf_constructor_name¶
Returns the name of the current pdf constructor. Matches a key in the PDF_CONSTRUCTORS dictionary.
- property pdf_constructor: AbstractQuantilePdfConstructor¶
Returns the current PDF constructor, and allows the user to interact with its methods.
- Returns:
Abstract base class of the active concrete PDF constructor.
- Return type:
AbstractQuantilePdfConstructor
- classmethod get_allocation_kwds(npdf, **kwargs)[source]¶
Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.
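The idea in the Notes for the quantile representation, building a CDF by interpolating (locs, quants) pairs, can be sketched for one PDF as below. This is a standalone illustration using plain linear interpolation; the helper names are hypothetical, and the various PDF constructors that qp offers for this representation are not modelled here.

```python
from bisect import bisect_right


def lin_interp(x, xp, fp):
    """Linear interpolation of (xp, fp), clamped to the end values."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x) - 1
    frac = (x - xp[i]) / (xp[i + 1] - xp[i])
    return fp[i] + frac * (fp[i + 1] - fp[i])


quants = [0.0, 0.25, 0.5, 0.75, 1.0]   # cumulative probabilities
locs = [0.0, 1.0, 1.5, 2.0, 4.0]       # x values where they are reached


def quant_cdf(x):
    """CDF: interpolate the quantiles as a function of location."""
    return lin_interp(x, locs, quants)


def quant_ppf(q):
    """PPF: the same table read in the other direction."""
    return lin_interp(q, quants, locs)


print(quant_cdf(1.25), quant_ppf(0.375))
```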
Gaussian mixture model based¶
- class qp.mixmod_gen(means, stds, weights, *args, **kwargs)[source]¶
Bases:
Pdf_rows_gen
Mixture model based distribution
Notes
This implements a PDF using a Gaussian Mixture model
The relevant data members are:
means: (npdf, ncomp) means of the Gaussians
stds: (npdf, ncomp) standard deviations of the Gaussians
weights: (npdf, ncomp) weights for the Gaussians
The pdf() and cdf() are exact, and are computed as a weighted sum of the pdf() and cdf() of the component Gaussians.
The ppf() is computed by computing the cdf() values on a fixed grid and interpolating the inverse function.
- name = 'mixmod'¶
- version = 0¶
- property weights¶
Return weights to attach to the Gaussians
- property means¶
Return means of the Gaussians
- property stds¶
Return standard deviations of the Gaussians
- classmethod get_allocation_kwds(npdf, **kwargs)[source]¶
Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.
- Parameters:
npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble
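The Notes for the mixture model representation describe an exact pdf() and cdf() as weighted sums over the component Gaussians, and a ppf() obtained by tabulating the cdf() on a fixed grid and interpolating the inverse. A standalone sketch for a single PDF follows; the function names and the choice of grid are assumptions for illustration, not qp's implementation.

```python
import math
from bisect import bisect_right


def norm_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def mix_pdf(x, means, stds, weights):
    """Weighted sum of the component pdfs."""
    return sum(w * norm_pdf(x, m, s) for m, s, w in zip(means, stds, weights))


def mix_cdf(x, means, stds, weights):
    """Weighted sum of the component cdfs."""
    return sum(w * norm_cdf(x, m, s) for m, s, w in zip(means, stds, weights))


def mix_ppf(q, means, stds, weights, grid):
    """Tabulate the cdf on `grid`, then linearly interpolate its inverse."""
    cdf_vals = [mix_cdf(x, means, stds, weights) for x in grid]
    if q <= cdf_vals[0]:
        return grid[0]
    if q >= cdf_vals[-1]:
        return grid[-1]
    i = bisect_right(cdf_vals, q) - 1
    frac = (q - cdf_vals[i]) / (cdf_vals[i + 1] - cdf_vals[i])
    return grid[i] + frac * (grid[i + 1] - grid[i])


means, stds, weights = [-1.0, 1.0], [0.5, 0.5], [0.5, 0.5]
grid = [-5.0 + 0.01 * i for i in range(1001)]   # assumed evaluation grid
median = mix_ppf(0.5, means, stds, weights, grid)
print(mix_pdf(0.0, means, stds, weights), mix_cdf(0.0, means, stds, weights), median)
```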
scipy distributions¶
Module to define qp distributions that inherit from scipy distributions
Notes
In the qp distributions the last axis in the input array shapes is reserved for pdf parameters.
This is because qp deals with numerical representations of distributions, where some of the input parameters consist of arrays of values for each pdf.
scipy.stats assumes that all input parameters are scalars for each pdf.
To ensure that scipy.stats based distributions behave the same as qp distributions, we ensure that all input variables have shape either (npdf, 1) or (1)
Quantification Metrics¶
This module implements some performance metrics for distribution parameterization
- class qp.metrics.metrics.Grid(grid_values, cardinality, resolution, hist_bin_edges, limits)¶
- cardinality¶
Alias for field number 1
- grid_values¶
Alias for field number 0
- hist_bin_edges¶
Alias for field number 3
- limits¶
Alias for field number 4
- resolution¶
Alias for field number 2
- qp.metrics.metrics.calculate_moment(p, N, limits, dx=0.01)[source]¶
Calculates a moment of a qp.Ensemble object
- Parameters:
p (qp.Ensemble object) – the collection of PDFs whose moment will be calculated
N (int) – order of the moment to be calculated
limits (tuple of floats) – endpoints of integration interval over which to calculate moments
dx (float) – resolution of integration grid
- Returns:
M – value of the moment
- Return type:
float
- qp.metrics.metrics.calculate_kld(p, q, limits, dx=0.01)[source]¶
Calculates the Kullback-Leibler Divergence between two qp.Ensemble objects.
- Parameters:
p (Ensemble object) – probability distribution closer to the truth
q (Ensemble object) – probability distribution that approximates p
limits (tuple of floats) – endpoints of integration interval in which to calculate KLD
dx (float) – resolution of integration grid
- Returns:
Dpq – the value of the Kullback-Leibler Divergence from q to p
- Return type:
float
Notes
TO DO: have this take number of points not dx!
- qp.metrics.metrics.calculate_rmse(p, q, limits, dx=0.01)[source]¶
Calculates the Root Mean Square Error between two qp.Ensemble objects.
- Parameters:
p (qp.Ensemble object) – probability distribution function whose distance between its truth and the approximation of q will be calculated.
q (qp.Ensemble object) – probability distribution function whose distance between its approximation and the truth of p will be calculated.
limits (tuple of floats) – endpoints of integration interval in which to calculate RMS
dx (float) – resolution of integration grid
- Returns:
rms – the value of the RMS error between q and p
- Return type:
float
Notes
TO DO: change dx to N
- qp.metrics.metrics.calculate_rbpe(p, limits=(inf, inf))[source]¶
Calculates the risk based point estimates of a qp.Ensemble object. Algorithm as defined in 4.2 of ‘Photometric redshifts for Hyper Suprime-Cam Subaru Strategic Program Data Release 1’ (Tanaka et al. 2018).
- Parameters:
p (qp.Ensemble object) – Ensemble of PDFs to be evaluated
limits (tuple) – The limits at which to evaluate possible z_best estimates. If custom limits are not provided then all potential z values will be considered using the scipy.optimize.minimize_scalar function.
- Returns:
rbpes – The risk based point estimates of the provided ensemble.
- Return type:
array of floats
- qp.metrics.metrics.calculate_brier(p, truth, limits, dx=0.01)[source]¶
This function will do the following:
Generate a Mx1 sized grid based on limits and dx.
Produce an NxM array by evaluating the pdf for each of the N distribution objects in the Ensemble p on the grid.
Produce an NxM truth_array using the input truth and the generated grid. All values will be 0 or 1.
Create a Brier metric evaluation object
Return the result of the Brier metric calculation.
- Parameters:
p (qp.Ensemble object) – Ensemble of N probability distribution functions that will be gridded and compared against truth.
truth (Nx1 sequence) – the list of true values, 1 per distribution in p.
limits (2-tuple of floats) – endpoints of the grid on which to evaluate the PDFs for the distributions in p
dx (float) – resolution of the grid. Defaults to 0.01.
- Returns:
Brier_metric
- Return type:
float
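The steps listed for calculate_brier can be sketched without qp as follows. Several details are assumptions made for illustration only: the PDFs are passed as plain callables, each row of gridded values is normalized to sum to 1 (as the Brier class parameters require), the true value is assigned to its nearest grid point, and the score is the mean over distributions of the summed squared differences.

```python
def brier_score(pdfs, truth, limits, dx=0.01):
    """Grid the PDFs, one-hot encode the truth, and average the squared error."""
    n_pts = int(round((limits[1] - limits[0]) / dx)) + 1
    grid = [limits[0] + i * dx for i in range(n_pts)]           # M grid points
    total = 0.0
    for pdf, true_val in zip(pdfs, truth):                      # N distributions
        vals = [pdf(x) for x in grid]
        norm = sum(vals)
        pred = [v / norm for v in vals]                         # row sums to 1
        # index of the grid point closest to the true value (assumed binning)
        hot = min(range(n_pts), key=lambda i: abs(grid[i] - true_val))
        total += sum(
            (p - (1.0 if i == hot else 0.0)) ** 2 for i, p in enumerate(pred)
        )
    return total / len(truth)


def flat(x):
    return 1.0                                  # uninformative PDF


def sharp(x):
    return 1.0 if x == 0.5 else 0.0             # all mass at 0.5


score_flat = brier_score([flat], [0.3], limits=(0.0, 1.0), dx=0.25)
score_sharp = brier_score([sharp], [0.5], limits=(0.0, 1.0), dx=0.25)
print(score_flat, score_sharp)
```

With five grid points the flat prediction scores (M-1)/M = 0.8, while the prediction concentrated on the true value scores 0.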
- qp.metrics.metrics.calculate_anderson_darling(p, scipy_distribution='norm', num_samples=100, _random_state=None)[source]¶
This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.
- Return type:
logger.warning
- qp.metrics.metrics.calculate_cramer_von_mises(p, q, num_samples=100, _random_state=None, **kwargs)[source]¶
This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.
- Return type:
logger.warning
- qp.metrics.metrics.calculate_kolmogorov_smirnov(p, q, num_samples=100, _random_state=None)[source]¶
This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.
- Return type:
logger.warning
- qp.metrics.metrics.calculate_outlier_rate(p, lower_limit=0.0001, upper_limit=0.9999)[source]¶
Fraction of outliers in each distribution
- Parameters:
p (qp.Ensemble) – A collection of N distributions. This implementation expects that Ensembles are not nested.
lower_limit (float, optional) – Lower bound CDF for outliers, by default 0.0001
upper_limit (float, optional) – Upper bound CDF for outliers, by default 0.9999
- Returns:
1xN array where each element is the percent of outliers for a distribution in the Ensemble.
- Return type:
[float]
- qp.metrics.metrics.calculate_goodness_of_fit(estimate, reference, fit_metric='ks', num_samples=100, _random_state=None)[source]¶
This method calculates goodness of fit between the distributions in the estimate and reference Ensembles using the specified fit_metric.
- Parameters:
estimate (Ensemble containing N distributions) – Random variate samples will be drawn from this Ensemble
reference (Ensemble containing N or 1 distributions) – The CDF of the distributions in this Ensemble are used in the goodness of fit calculation.
fit_metric (string, optional) – The goodness of fit metric to use. One of [‘ad’, ‘cvm’, ‘ks’]. For clarity, ‘ad’ = Anderson-Darling, ‘cvm’ = Cramer-von Mises, and ‘ks’ = Kolmogorov-Smirnov, by default ‘ks’
num_samples (int, optional) – Number of random variates to draw from each distribution in estimate, by default 100
_random_state (_type_, optional) – Used for testing to create reproducible sets of random variates, by default None
- Returns:
output – An array of floats where each element is the result of the statistic calculation.
- Return type:
[float]
- Raises:
KeyError – If the requested fit_metric is not contained in goodness_of_fit_metrics dictionary, raise a KeyError.
Notes
The calculation of the goodness of fit metrics is not symmetric. i.e. calculate_goodness_of_fit(p, q, …) != calculate_goodness_of_fit(q, p, …)
In the future, we should be able to do this directly from the PDFs without needing to take random variates from the estimate Ensemble.
The vectorized implementations of fit metrics are copied over (unmodified) from the developer branch of Scipy 1.10.0dev. When Scipy 1.10 is released, we can replace the copied implementation with the ones in Scipy.
This module implements metric calculations that are independent of qp.Ensembles
- qp.metrics.array_metrics.quick_anderson_ksamp(p_random_variables, q_random_variables, **kwargs)[source]¶
Calculate the k-sample Anderson-Darling statistic using scipy.stats.anderson_ksamp for two CDFs. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html
- Parameters:
p_random_variables (np.array) – An array of random variables from the given distribution
q_random_variables (np.array) – An array of random variables from the given distribution
- Returns:
An array of objects with attributes statistic, critical_values, and significance_level.
- Return type:
[Result objects]
- qp.metrics.array_metrics.quick_kld(p_eval, q_eval, dx=0.01)[source]¶
Calculates the Kullback-Leibler Divergence between two evaluations of PDFs.
- Parameters:
p_eval (numpy.ndarray, float) – evaluations of probability distribution closer to the truth
q_eval (numpy.ndarray, float) – evaluations of probability distribution that approximates p
dx (float) – resolution of integration grid
- Returns:
Dpq – the value of the Kullback-Leibler Divergence from q to p
- Return type:
float
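A discretized Kullback-Leibler divergence of the kind quick_kld describes can be sketched as a Riemann sum of p·log(p/q) over the grid. This is a standalone illustration, not qp's code: the function name is hypothetical, the inputs are assumed to be normalized on the same regular grid, and the small floor used to avoid log(0) is an assumption.

```python
import math


def kld_on_grid(p_eval, q_eval, dx=0.01, floor=1e-12):
    """Approximate the integral of p * log(p / q) with a Riemann sum."""
    total = 0.0
    for p, q in zip(p_eval, q_eval):
        p = max(p, floor)        # guard against log(0); assumed safeguard
        q = max(q, floor)
        total += p * math.log(p / q)
    return total * dx


p_eval = [1.0, 1.0]              # "truth" evaluated on a two-point grid
q_eval = [0.5, 1.5]              # approximation on the same grid
print(kld_on_grid(p_eval, q_eval, dx=0.5))
print(kld_on_grid(p_eval, p_eval, dx=0.5))
```

The divergence is zero when the two evaluations coincide and is not symmetric in its arguments.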
- qp.metrics.array_metrics.quick_moment(p_eval, grid_to_N, dx)[source]¶
Calculates a moment of an evaluated PDF
- Parameters:
p_eval (numpy.ndarray, float) – the values of a probability distribution
grid (numpy.ndarray, float) – the grid upon which p_eval was evaluated
dx (float) – the difference between regular grid points
N (int) – order of the moment to be calculated
- Returns:
M – value of the moment
- Return type:
float
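The moment of an evaluated PDF can be sketched as a Riemann sum of p(x)·x^N over a regular grid. In this standalone illustration the second argument is taken to be the grid already raised to the power N, which is an interpretation of the name grid_to_N in the signature above rather than something the parameter list confirms.

```python
def moment_on_grid(p_eval, grid_to_n, dx):
    """Riemann-sum estimate of the N-th moment from gridded pdf values."""
    return sum(p * g for p, g in zip(p_eval, grid_to_n)) * dx


dx = 0.1
grid = [0.05 + dx * k for k in range(10)]   # bin midpoints on [0, 1]
p_eval = [1.0] * len(grid)                  # uniform pdf on [0, 1]

first = moment_on_grid(p_eval, [x ** 1 for x in grid], dx)
second = moment_on_grid(p_eval, [x ** 2 for x in grid], dx)
print(first, second)
```

For the uniform density the first moment comes out as 0.5, and the second moment as 0.3325, close to the exact value of 1/3 for this coarse grid.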
- qp.metrics.array_metrics.quick_rmse(p_eval, q_eval, N)[source]¶
Calculates the Root Mean Square Error between two evaluations of PDFs.
- Parameters:
p_eval (numpy.ndarray, float) – evaluation of probability distribution function whose distance between its truth and the approximation of q will be calculated.
q_eval (numpy.ndarray, float) – evaluation of probability distribution function whose distance between its approximation and the truth of p will be calculated.
N (int) – number of points at which PDFs were evaluated
- Returns:
rms – the value of the RMS error between q and p
- Return type:
float
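The root mean square error between two gridded evaluations can be sketched in a few lines. This is a standalone illustration with a hypothetical function name; it assumes both PDFs were evaluated at the same N points.

```python
import math


def rmse_on_grid(p_eval, q_eval, n_points):
    """Square root of the mean squared difference over the evaluation points."""
    squared = sum((p - q) ** 2 for p, q in zip(p_eval, q_eval))
    return math.sqrt(squared / n_points)


p_eval = [1.0, 2.0, 3.0]
q_eval = [1.0, 2.0, 5.0]
print(rmse_on_grid(p_eval, q_eval, len(p_eval)))
```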
- qp.metrics.array_metrics.quick_rbpe(pdf_function, integration_bounds, limits=(inf, inf))[source]¶
Calculates the risk based point estimate of a qp.Ensemble object with npdf == 1.
- Parameters:
pdf_function (python function) – The function should calculate the value of a pdf at a given x value
integration_bounds (tuple of floats) – The integration bounds - typically (ppf(0.01), ppf(0.99)) for the given distribution
limits (tuple of floats) – The limits at which to evaluate possible z_best estimates. If custom limits are not provided then all potential z values will be considered using the scipy.optimize.minimize_scalar function.
- Returns:
rbpe – The risk based point estimate of the provided ensemble.
- Return type:
float
- class qp.metrics.brier.Brier(prediction, truth)[source]¶
Brier score based on https://en.wikipedia.org/wiki/Brier_score#Original_definition_by_Brier
- Parameters:
prediction (NxM array, float) – Predicted probability for N distributions to have a true value in one of M bins. The sum of values along each row N should be 1.
truth (NxM array, int) – True values for N distributions, where Mth bin for the true value will have value 1, all other bins will have a value of 0.
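A minimal sketch of the original Brier definition for the NxM inputs described above: the squared differences are summed over the M bins and averaged over the N distributions. The function name brier_score is hypothetical and does not reproduce the class interface.

```python
import numpy as np


def brier_score(prediction, truth):
    """Hypothetical helper: original (multi-category) Brier score."""
    prediction = np.asarray(prediction, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # sum over the M bins, then average over the N distributions
    return np.mean(np.sum((prediction - truth) ** 2, axis=1))


prediction = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1]])
truth = np.array([[1, 0, 0],
                  [0, 0, 1]])
score = brier_score(prediction, truth)
```

A perfect prediction scores 0, and the worst possible score under this definition is 2.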
- class qp.metrics.pit.PIT(qp_ens, true_vals, eval_grid=DEFAULT_QUANTS)[source]¶
Probability Integral Transform
- Parameters:
qp_ens (Ensemble) – A collection of N distribution objects
true_vals ([float]) – An array-like sequence of N float values representing the known true value for each distribution
eval_grid ([float], optional) – A strictly increasing array-like sequence in the range [0,1], by default DEFAULT_QUANTS
- Returns:
An object with an Ensemble containing the PIT distribution, and a full set of PIT samples.
- Return type:
PIT object
- property pit_samps¶
Returns the PIT samples, i.e. CDF(true_vals) for each distribution in the Ensemble used to initialize the PIT object.
- Returns:
An array of floats
- Return type:
np.array
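To make the definition concrete, the sketch below computes PIT values by hand for three Gaussian distributions, using only the standard library rather than a qp.Ensemble. The means, standard deviations and true values are made up for illustration.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


# one (mean, std) pair per distribution, and one true value for each
means = [0.0, 1.0, 2.0]
stds = [1.0, 0.5, 2.0]
true_vals = [0.0, 1.5, 0.0]

# each PIT sample is the CDF of a distribution evaluated at its true value
pit_samps = [normal_cdf(t, m, s) for t, m, s in zip(true_vals, means, stds)]
```

If the distributions are well calibrated, values such as these should be approximately uniformly distributed between 0 and 1.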
- property pit¶
Return the PIT Ensemble object
- Returns:
An Ensemble containing 1 qp.quant distribution.
- Return type:
qp.Ensemble
- calculate_pit_meta_metrics()[source]¶
Convenience method that will calculate all of the PIT meta metrics and return them as a dictionary.
- Returns:
The collection of PIT statistics
- Return type:
dictionary
- evaluate_PIT_anderson_ksamp(pit_min=0.0, pit_max=1.0)[source]¶
Use scipy.stats.anderson_ksamp to compute the Anderson-Darling statistic for the cdf(truth) values by comparing with a uniform distribution between 0 and 1. Up to the current version (1.9.3), scipy.stats.anderson does not support uniform distributions as reference for 1-sample test, therefore we create a uniform “distribution” and pass it as the second value in the list of parameters to the scipy implementation of k-sample Anderson-Darling. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html
- Parameters:
pit_min (float, optional) – Minimum PIT value to accept, by default 0.
pit_max (float, optional) – Maximum PIT value to accept, by default 1.
- Returns:
An array of objects with attributes statistic, critical_values, and significance_level. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html
- Return type:
array
- evaluate_PIT_CvM()[source]¶
Calculate the Cramer von Mises statistic using scipy.stats.cramervonmises using self._pit_samps compared to a uniform distribution. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises.html
- Returns:
An array of objects with attributes statistic and pvalue. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises.html
- Return type:
array
- evaluate_PIT_KS()[source]¶
Calculate the Kolmogorov-Smirnov statistic using scipy.stats.kstest. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html
- Returns:
An array of objects with attributes statistic and pvalue. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html
- Return type:
array
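For intuition about what the Kolmogorov-Smirnov test measures, the sketch below computes the one-sample KS statistic of a set of PIT values against a uniform distribution on [0, 1] by hand. It is a NumPy-only illustration with a hypothetical function name; the method documented above delegates to scipy.stats.kstest, which also returns a p-value.

```python
import numpy as np


def ks_statistic_uniform(pit_samps):
    """Hypothetical helper: KS distance between the samples and U(0, 1)."""
    x = np.sort(np.asarray(pit_samps, dtype=float))
    n = x.size
    # empirical CDF just after each sample minus the uniform CDF
    upper = np.arange(1, n + 1) / n - x
    # uniform CDF minus the empirical CDF just before each sample
    lower = x - np.arange(0, n) / n
    return max(upper.max(), lower.max())


pit_samps = np.array([0.1, 0.35, 0.5, 0.8])
ks_stat = ks_statistic_uniform(pit_samps)
```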
- evaluate_PIT_outlier_rate(pit_min=0.0001, pit_max=0.9999)[source]¶
Compute fraction of PIT outliers by evaluating the CDF of the distribution in the PIT Ensemble at pit_min and pit_max.
- Parameters:
pit_min (float, optional) – Lower bound for outliers, by default 0.0001
pit_max (float, optional) – Upper bound for outliers, by default 0.9999
- Returns:
The percentage of outliers in this distribution given the min and max bounds.
- Return type:
float
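The sketch below estimates an outlier rate directly from PIT samples, as the share of values falling below pit_min or above pit_max. This is an assumption-laden stand-in: the documented method evaluates the CDF of the PIT Ensemble at the two bounds rather than counting samples, and the function name is hypothetical.

```python
import numpy as np


def outlier_fraction(pit_samps, pit_min=0.0001, pit_max=0.9999):
    """Hypothetical helper: share of PIT values outside [pit_min, pit_max]."""
    pit_samps = np.asarray(pit_samps, dtype=float)
    return np.mean((pit_samps < pit_min) | (pit_samps > pit_max))


frac = outlier_fraction([0.0, 0.5, 0.3, 1.0])
```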
Utility functions¶
qp.conversion_funcs¶
This module implements functions to convert distributions between various representations. These functions should then be registered with the qp.ConversionDict using qp_add_mapping. That will allow the automated conversion mechanisms to work.
- qp.conversion_funcs.extract_vals_at_x(in_dist, **kwargs)[source]¶
Convert using a set of x and y values.
- Parameters:
in_dist (qp.Ensemble) – Input distributions
xvals (np.array) – Locations at which the pdf is evaluated
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_xy_vals(in_dist, **kwargs)[source]¶
Convert using a set of x and y values.
- Parameters:
in_dist (qp.Ensemble) – Input distributions
xvals (np.array) – Locations at which the pdf is evaluated
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_samples(in_dist, **kwargs)[source]¶
Convert using a set of values sampled from the PDF
- Parameters:
in_dist (qp.Ensemble) – Input distributions
size (int) – Number of samples to generate
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_hist_values(in_dist, **kwargs)[source]¶
Convert using a set of values sampled from the PDF
- Parameters:
in_dist (qp.Ensemble) – Input distributions
bins (np.array) – Histogram bin edges
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_hist_samples(in_dist, **kwargs)[source]¶
Convert using a set of sampled values that are then histogrammed
- Parameters:
in_dist (qp.Ensemble) – Input distributions
bins (np.array) – Histogram bin edges
size (int) – Number of samples to generate
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_quantiles(in_dist, **kwargs)[source]¶
Convert using a set of quantiles and the locations at which they are reached
- Parameters:
in_dist (qp.Ensemble) – Input distributions
quantiles (np.array) – Quantile values to use
- Returns:
data – The extracted data
- Return type:
dict
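The idea of "quantiles and the locations at which they are reached" can be sketched by inverting a gridded CDF with linear interpolation. This example uses a hand-built CDF rather than a qp.Ensemble, and is only an illustration of the concept, not the conversion function itself.

```python
import numpy as np

# CDF of the density p(x) = 2x on [0, 1], tabulated on a grid
grid = np.linspace(0.0, 1.0, 101)
cdf = grid**2

# quantile values to use, and the x locations where the CDF reaches them
quantiles = np.array([0.25, 0.5, 0.75])
locs = np.interp(quantiles, cdf, grid)
```

For this density the exact locations are the square roots of the quantiles, so locs should match them to within the interpolation error.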
- qp.conversion_funcs.extract_fit(in_dist, **kwargs)[source]¶
Convert to a functional distribution by fitting it to a set of x and y values
- Parameters:
in_dist (qp.Ensemble) – Input distributions
xvals (np.array) – Locations at which the pdf is evaluated
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_mixmod_fit_samples(in_dist, **kwargs)[source]¶
Convert to a mixture model using a set of values sampled from the pdf
- Parameters:
in_dist (qp.Ensemble) – Input distributions
ncomps (int) – Number of components in mixture model to use
nsamples (int) – Number of samples to generate
random_state (int) – Used to reproducibly generate random variates from in_dist
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_voigt_mixmod(in_dist, **kwargs)[source]¶
Convert to a voigt mixture model starting with a gaussian mixture model, trivially by setting gammas to 0
- Parameters:
in_dist (qp.Ensemble) – Input distributions
- Returns:
data – The extracted data
- Return type:
dict
- qp.conversion_funcs.extract_voigt_xy(in_dist, **kwargs)[source]¶
Build a voigt function basis and run a match-pursuit algorithm to fit gridded data
- Parameters:
in_dist (qp.Ensemble) – Input distributions
- Returns:
data – The extracted data as sparse indices, basis, and metadata to rebuild the basis
- Return type:
dict
- qp.conversion_funcs.extract_voigt_xy_sparse(in_dist, **kwargs)[source]¶
Build a voigt function basis and run a match-pursuit algorithm to fit gridded data
- Parameters:
in_dist (qp.Ensemble) – Input distributions
- Returns:
data – The extracted data as shaped parameters means, stds, weights, gammas
- Return type:
dict
- qp.conversion_funcs.extract_sparse_from_xy(in_dist, **kwargs)[source]¶
Extract sparse representation from an xy interpolated representation
- Parameters:
in_dist (qp.Ensemble) – Input distributions
yvals (array-like) – Used to override the y-values
xvals (array-like) – Used to override the x-values
nvals (int) – Used to override the number of bins
- Returns:
metadata – Dictionary with data for sparse representation
- Return type:
dict
Notes
This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0
- qp.conversion_funcs.extract_xy_sparse(in_dist, **kwargs)[source]¶
Extract xy-interpolated representation from a sparse representation
- Parameters:
in_dist (qp.Ensemble) – Input distributions
yvals (array-like) – Used to override the y-values
xvals (array-like) – Used to override the x-values
nvals (int) – Used to override the number of bins
- Returns:
metadata – Dictionary with data for interpolated representation
- Return type:
dict
Notes
This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0
qp.utils: PDF evaluation and construction utility functions¶
Utility functions for the qp package
- qp.utils.safelog(arr, threshold=2.220446049250313e-16)[source]¶
Takes the natural logarithm of an array of potentially non-positive numbers
- Parameters:
arr (numpy.ndarray, float) – values to be logged
threshold (float) – small, positive value to replace zeros and negative numbers
- Returns:
logged – logarithms, with approximation in place of zeros and negative numbers
- Return type:
numpy.ndarray
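A minimal sketch of the thresholded logarithm, assuming that every value at or below the threshold is replaced by the threshold before the logarithm is taken. The name safelog_sketch is hypothetical; the default shown in the signature above equals the double-precision machine epsilon.

```python
import numpy as np


def safelog_sketch(arr, threshold=np.finfo(float).eps):
    """Hypothetical helper: natural log with a floor on the input."""
    arr = np.asarray(arr, dtype=float)
    # replace zeros, negatives and tiny values before taking the log
    floored = np.where(arr > threshold, arr, threshold)
    return np.log(floored)


logged = safelog_sketch([1.0, 0.0, -3.0])
```

The result contains no NaN or infinite entries, which is the point of applying the floor.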
- qp.utils.get_bin_indices(bins, x)[source]¶
Return the bin indexes for a set of values
If the bins are equal width, this will use arithmetic. If the bins are not equal width, this will use a binary search.
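The two strategies can be sketched as follows, with np.searchsorted standing in for the binary search. The function name is hypothetical, and handling of values outside the bin range (which the real function may report separately) is left out.

```python
import numpy as np


def bin_indices_sketch(bins, x):
    """Hypothetical helper: bin index for each x, for in-range values only."""
    bins = np.asarray(bins, dtype=float)
    x = np.asarray(x, dtype=float)
    widths = np.diff(bins)
    if np.allclose(widths, widths[0]):
        # equal-width bins: the index follows from simple arithmetic
        return np.floor((x - bins[0]) / widths[0]).astype(int)
    # unequal bins: binary search over the sorted edges
    return np.searchsorted(bins, x, side="right") - 1


idx_equal = bin_indices_sketch([0.0, 1.0, 2.0, 3.0], [0.5, 2.9])
idx_unequal = bin_indices_sketch([0.0, 1.0, 3.0, 7.0], [0.5, 2.0, 6.9])
```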
- qp.utils.normalize_interp1d(xvals, yvals)[source]¶
Normalize a set of 1D interpolators
- Parameters:
xvals (array-like) – X-values used for the interpolation
yvals (array-like) – Y-values used for the interpolation
- Returns:
ynorm – Normalized y-vals
- Return type:
array-like
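One way to normalize gridded y-values is to divide each row by its trapezoid-rule integral over xvals. The sketch below does that with plain NumPy; the integration rule and the function name are assumptions, not a description of the library's internals.

```python
import numpy as np


def normalize_rows(xvals, yvals):
    """Hypothetical helper: scale each row of yvals to unit integral."""
    xvals = np.asarray(xvals, dtype=float)
    yvals = np.asarray(yvals, dtype=float)
    # trapezoid-rule integral of each row along the last axis
    integrals = 0.5 * np.sum(
        (yvals[..., 1:] + yvals[..., :-1]) * np.diff(xvals),
        axis=-1,
        keepdims=True,
    )
    return yvals / integrals


xvals = np.linspace(0.0, 1.0, 11)
yvals = np.vstack([np.ones(11), 3.0 * xvals])
ynorm = normalize_rows(xvals, yvals)
```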
- qp.utils.build_kdes(samples, **kwargs)[source]¶
Build a set of Gaussian Kernel Density Estimates
- Parameters:
samples (array-like) – X-values used for the spline
**kwargs – Keywords passed to the scipy.stats.gaussian_kde constructor
- Returns:
kdes
- Return type:
list of scipy.stats.gaussian_kde objects
- qp.utils.evaluate_kdes(xvals, kdes)[source]¶
Evaluate a set of kdes
- Parameters:
xvals (array_like) – X-values used for the spline
kdes (list of sps.gaussian_kde) – The kernel density estimates
- Returns:
yvals – The kdes evaluated at the xvals
- Return type:
array_like
- qp.utils.get_eval_case(x, row)[source]¶
Figure out which of the various input formats scipy.stats has passed us
- Parameters:
x (array_like) – Pdf x-vals
row (array_like) – Pdf row indices
- Returns:
case (int) – The case code
xx (array_like) – The x-values properly shaped
rr (array_like) – The y-values, properly shaped
Notes
The cases are:
CASE_FLAT : x, row have shapes (n), (n) and do not factor
CASE_FACTOR : x, row have shapes (n), (n) but can be factored to shapes (1, nx) and (npdf, 1) (i.e., they were flattened by scipy)
CASE_PRODUCT : x, row have shapes (1, nx) and (npdf, 1)
CASE_2D : x, row have shapes (npdf, nx) and (npdf, nx)
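The relationship between these shapes can be illustrated with NumPy broadcasting: product-shaped inputs expand to the 2D case, and flattening that grid gives arrays of shape (n) that can still be factored back. This sketch only demonstrates the shapes; it does not reproduce the detection logic of get_eval_case.

```python
import numpy as np

npdf, nx = 3, 4

# product-style inputs: x is (1, nx) and row is (npdf, 1)
x = np.linspace(0.0, 1.0, nx).reshape(1, nx)
row = np.arange(npdf).reshape(npdf, 1)

# broadcasting them against each other gives the 2D case, (npdf, nx) each
x_2d, row_2d = np.broadcast_arrays(x, row)

# flattening gives the shape-(n) form, with n = npdf * nx
x_flat, row_flat = x_2d.ravel(), row_2d.ravel()

# the flat arrays "factor" if every row of x repeats and row is constant per row
x_back = x_flat.reshape(npdf, nx)
row_back = row_flat.reshape(npdf, nx)
can_factor = bool(
    np.all(x_back == x_back[0]) and np.all(row_back == row_back[:, :1])
)
```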
- qp.utils.evaluate_hist_x_multi_y_flat(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like (n)
- qp.utils.evaluate_hist_x_multi_y_product(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like (npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like (npdf, npts)
- qp.utils.evaluate_hist_x_multi_y_2d(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like (npdf, npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like (npdf, npts)
- qp.utils.evaluate_hist_x_multi_y(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like) – X values to interpolate at
row (array_like) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like
Notes
Depending on the shape of ‘x’ and ‘row’ this will use one of the three specific implementations.
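For the flat case with shared bin edges, a histogram lookup can be sketched as a bin search followed by fancy indexing into vals. The function name is hypothetical, returning 0 outside the bin range is an assumption, and the derivs argument is ignored.

```python
import numpy as np


def hist_lookup(x, row, bins, vals):
    """Hypothetical helper: value of histogram `row[i]` at position `x[i]`."""
    x = np.asarray(x, dtype=float)
    row = np.asarray(row, dtype=int)
    vals = np.asarray(vals, dtype=float)
    nbins = vals.shape[1]
    # index of the bin containing each x (may fall outside 0..nbins-1)
    idx = np.searchsorted(bins, x, side="right") - 1
    inside = (idx >= 0) & (idx < nbins)
    safe_idx = np.clip(idx, 0, nbins - 1)
    # assumed convention: zero outside the histogram support
    return np.where(inside, vals[row, safe_idx], 0.0)


bins = np.array([0.0, 1.0, 2.0])
vals = np.array([[0.25, 0.75],
                 [0.60, 0.40]])
out = hist_lookup([0.5, 1.5, 2.5], [0, 1, 0], bins, vals)
```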
- qp.utils.evaluate_hist_multi_x_multi_y_flat(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like (n)
- qp.utils.evaluate_hist_multi_x_multi_y_product(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like (npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like (npdf, npts)
- qp.utils.evaluate_hist_multi_x_multi_y_2d(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like (npdf, npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like (npdf, npts)
- qp.utils.evaluate_hist_multi_x_multi_y(x, row, bins, vals, derivs=None)[source]¶
Evaluate a set of values from histograms
- Parameters:
x (array_like) – X values to interpolate at
row (array_like) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents
- Returns:
out – The histogram values
- Return type:
array_like
- qp.utils.interpolate_x_multi_y_flat(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_x_multi_y_product(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_x_multi_y_2d(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_x_multi_y(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like
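For a shared x grid and one row of y-values per PDF, the interpolation can be sketched with np.interp applied row by row. Linear interpolation and the function name are assumptions made for this illustration; the documented functions accept extra keyword arguments that are not modelled here.

```python
import numpy as np


def interpolate_rows(x, xvals, yvals):
    """Hypothetical helper: linearly interpolate every row of yvals at x."""
    return np.vstack([np.interp(x, xvals, y_row) for y_row in yvals])


xvals = np.array([0.0, 1.0, 2.0])
yvals = np.array([[0.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0]])
vals = interpolate_rows(np.array([0.5, 1.5]), xvals, yvals)
```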
- qp.utils.interpolate_multi_x_multi_y_flat(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_multi_x_multi_y_product(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_multi_x_multi_y_2d(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_multi_x_multi_y(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like
- qp.utils.interpolate_multi_x_y_flat(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_multi_x_y_product(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_multi_x_y_2d(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like (npdf, n)
- qp.utils.interpolate_multi_x_y(x, row, xvals, yvals, **kwargs)[source]¶
Interpolate a set of values
- Parameters:
x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation
- Returns:
vals – The interpolated values
- Return type:
array_like
- qp.utils.profile(x_data, y_data, x_bins, std=True)[source]¶
Make a ‘profile’ plot
- Parameters:
x_data (array_like (n)) – The x-values
y_data (array_like (n)) – The y-values
x_bins (array_like (nbins+1)) – The values of the bin edges
std (bool) – If true, return the standard deviations, if false return the errors on the means
- Returns:
vals (array_like (nbins)) – The means
errs (array_like (nbins)) – The standard deviations or errors on the means
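A profile of this kind can be sketched by grouping the y-values by x bin and taking the mean and spread in each group. The function name, the use of the population standard deviation, and NaN for empty bins are all assumptions of this sketch.

```python
import numpy as np


def profile_sketch(x_data, y_data, x_bins, std=True):
    """Hypothetical helper: per-bin mean of y and its spread or error."""
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    nbins = len(x_bins) - 1
    idx = np.digitize(x_data, x_bins) - 1  # bin index of every point
    vals = np.full(nbins, np.nan)
    errs = np.full(nbins, np.nan)
    for i in range(nbins):
        y_in_bin = y_data[idx == i]
        if y_in_bin.size == 0:
            continue  # leave NaN for empty bins
        vals[i] = y_in_bin.mean()
        errs[i] = y_in_bin.std()
        if not std:
            # error on the mean instead of the standard deviation
            errs[i] /= np.sqrt(y_in_bin.size)
    return vals, errs


x_data = [0.1, 0.2, 1.5, 1.7]
y_data = [1.0, 3.0, 10.0, 14.0]
vals, errs = profile_sketch(x_data, y_data, [0.0, 1.0, 2.0])
_, errs_on_mean = profile_sketch(x_data, y_data, [0.0, 1.0, 2.0], std=False)
```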
- qp.utils.reshape_to_pdf_size(vals, split_dim)[source]¶
Reshape an array to match the number of PDFs in a distribution
- Parameters:
vals (array) – The input array
split_dim (int) – The dimension at which to split between pdf indices and per_pdf indices
- Returns:
out – The reshaped array
- Return type:
array
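One reading of this reshape is that all dimensions before split_dim index the PDFs and are collapsed into a single axis, while the remaining dimensions hold the per-PDF data. The sketch below follows that interpretation under a hypothetical name.

```python
import numpy as np


def reshape_to_pdf_size_sketch(vals, split_dim):
    """Hypothetical helper: collapse the leading PDF axes into one."""
    vals = np.asarray(vals)
    per_pdf = vals.shape[split_dim:]  # shape of the data for one PDF
    return vals.reshape((-1,) + per_pdf)


vals = np.zeros((2, 3, 5))  # a 2 x 3 set of PDFs with 5 values each
out = reshape_to_pdf_size_sketch(vals, 2)
```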
- qp.utils.reshape_to_pdf_shape(vals, pdf_shape, per_pdf)[source]¶
Reshape an array to match the shape of PDFs in a distribution
- Parameters:
vals (array) – The input array
pdf_shape (int) – The shape for the pdfs
per_pdf (int or array_like) – The shape per pdf
- Returns:
out – The reshaped array
- Return type:
array
Infrastructure and Core functionality¶
qp.pdf_gen: scipy.stats interface¶
This module implements continuous distribution generators that inherit from the scipy.stats.rv_continuous class.
If you would like to add a sub-class, please read the instructions on subclassing here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html
Open questions:
1) At this time the normalization is not enforced for many of the PDF types. It is assumed that the user values give correct normalization. We should think about this more.
2) At this time for most of the distributions, only the _pdf function is overridden. This is all that is required to inherit from scipy.stats.rv_continuous; however, providing implementations of some of _logpdf, _cdf, _logcdf, _ppf, _rvs, _isf, _sf, _logsf could speed the code up a lot in some cases.
- class qp.pdf_gen.Pdf_gen(*args, **kwargs)[source]¶
Interface class to extend scipy.stats.rv_continuous with information needed for qp
Notes
Metadata are elements that are the same for all the PDFs. These include the name and version of the PDF generation class, and possible data such as the bin edges used for histogram representations.
Object data are elements that differ for each PDF.
- property metadata¶
Return the metadata for this set of PDFs
- property objdata¶
Return the object data for this set of PDFs
- classmethod creation_method(method=None)[source]¶
Return the method used to create a PDF of this type
- classmethod extraction_method(method=None)[source]¶
Return the method used to extract data to create a PDF of this type
- classmethod reader_method(version=None)[source]¶
Return the method used to convert data read from a file into a PDF of this type
- classmethod print_method_maps(stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Print the maps showing the methods
- classmethod create_gen(**kwds)[source]¶
Create and return a scipy.stats.rv_continuous object using the keyword arguments provided
- classmethod create(**kwds)[source]¶
Create and return a scipy.stats.rv_frozen object using the keyword arguments provided
- class qp.pdf_gen.rv_frozen_func(dist, *args, **kwds)[source]¶
Trivial extension of scipy.stats.rv_frozen that includes the number of PDFs it represents
- property ndim¶
Return the number of dimensions of PDFs in this ensemble
- property shape¶
Return the shape of the set of PDFs this object represents
- property npdf¶
Return the number of PDFs this object represents
- histogramize(bins)[source]¶
Computes integrated histogram bin values for all PDFs
- Parameters:
bins (ndarray, float, optional) – Array of N+1 endpoints of N bins
- Returns:
self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins
- Return type:
ndarray, tuple, ndarray, floats
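"Integrated histogram bin values" can be read as the probability mass of the PDF inside each bin, which is the difference of the CDF at consecutive bin edges. The sketch below shows this for a standard normal using only the standard library; whether the library additionally divides by bin width is not stated here and is not assumed.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


bins = [-1.0, 0.0, 1.0, 2.0]  # N+1 endpoints of N bins
cdf_at_edges = [normal_cdf(b) for b in bins]

# integral of the PDF over each bin = CDF(upper edge) - CDF(lower edge)
bin_values = [hi - lo for lo, hi in zip(cdf_at_edges[:-1], cdf_at_edges[1:])]
```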
- class qp.pdf_gen.rv_frozen_rows(dist, shape, *args, **kwds)[source]¶
Trivial extension of scipy.stats.rv_frozen to use when we want to have a collection of distribution objects such as histograms or splines, where each object represents a single distribution
- property ndim¶
Return the number of dimensions of PDFs in this ensemble
- property shape¶
Return the shape of the set of PDFs this object represents
- property npdf¶
Return the number of PDFs this object represents
- histogramize(bins)[source]¶
Computes integrated histogram bin values for all PDFs
- Parameters:
bins (ndarray, float, optional) – Array of N+1 endpoints of N bins
- Returns:
self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins
- Return type:
ndarray, tuple, ndarray, floats
- class qp.pdf_gen.Pdf_rows_gen(*args, **kwargs)[source]¶
Class to extend scipy.stats.rv_continuous with information needed for qp when we want to have a collection of distribution objects such as histograms or splines, where each object represents a single distribution
- property shape¶
Return the shape of the set of PDFs this object represents
- property npdf¶
Return the number of PDFs this object represents
- freeze(*args, **kwds)[source]¶
Freeze the distribution for the given arguments.
- Parameters:
arg1, arg2, arg3, ... (array_like) – The shape parameter(s) for the distribution. Should include all the non-optional arguments, may include loc and scale.
- Returns:
rv_frozen – The frozen distribution.
- Return type:
rv_frozen instance
- classmethod create_gen(**kwds)[source]¶
Create and return a scipy.stats.rv_continuous object using the keyword arguments provided
- moment(n, *args, **kwds)[source]¶
Returns the requested moments for all the PDFs.
This used to call a hacked version, Pdf_gen._moment_fix, which can handle cases of multiple PDFs. Now it prints a deprecation warning for scipy < 1.8.
- Parameters:
n (int) – Order of the moment
- Returns:
moments – The requested moments
- Return type:
array_like
- class qp.pdf_gen.Pdf_gen_wrap(*args, **kwargs)[source]¶
Mixin class to extend scipy.stats.rv_continuous with information needed for qp for analytic distributions.
qp.dict_utils: tools for multi-level dictionary manipulation¶
This module implements tools to convert between distributions
- qp.dict_utils.get_val_or_default(in_dict, key)[source]¶
Helper function to return either an item in a dictionary or the default value of the dictionary
- Parameters:
in_dict (dict) – input dictionary
key (str) – key to search for
- Returns:
out – The requested item
- Return type:
dict or function
Notes
- This will first try to return:
in_dict[key] : i.e., the requested item.
- If that fails it will try
in_dict[None] : i.e., the default for that dictionary.
- If that fails it will return
None
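The lookup order in the notes above can be sketched in a few lines of plain Python. The function name is hypothetical, and the None key plays the role of the dictionary's default entry.

```python
def get_val_or_default_sketch(in_dict, key):
    """Hypothetical helper: in_dict[key], else in_dict[None], else None."""
    if key in in_dict:
        return in_dict[key]
    # dict.get returns None when the default entry is missing as well
    return in_dict.get(None)


with_default = {"hist": "hist_reader", None: "generic_reader"}
without_default = {"hist": "hist_reader"}

found = get_val_or_default_sketch(with_default, "hist")
fallback = get_val_or_default_sketch(with_default, "interp")
missing = get_val_or_default_sketch(without_default, "interp")
```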
- qp.dict_utils.set_val_or_default(in_dict, key, val)[source]¶
Helper function to either get an item from or add an item to a dictionary and return that item
- Parameters:
in_dict (dict) – input dictionary
key (str) – key to search for
val (dict or function) – item to add to the dictionary
- Returns:
out – The requested item
- Return type:
dict or function
Notes
- This will first try to return:
in_dict[key] : i.e., the requested item.
- If that fails it will try
in_dict[None] : i.e., the default for that dictionary.
- If that fails it will return
None
- qp.dict_utils.pretty_print(in_dict, prefixes, idx=0, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Print a level of the conversion dictionary in a human-readable format
- Parameters:
in_dict (dict) – input dictionary
prefixes (list) – The prefixes to use at each level of the printing
idx (int) – The level of the input dictionary we are currently printing
stream (stream) – The stream to print to
- qp.dict_utils.print_dict_shape(in_dict)[source]¶
Print the shape of arrays in a dictionary. This is useful for debugging table creation.
- Parameters:
in_dict (dict) – The dictionary to print
- qp.dict_utils.slice_dict(in_dict, subslice)[source]¶
Create a new dict by taking a slice of every array in a dict
- Parameters:
in_dict (dict) – The dictionary to convert
subslice (int or slice) – Used to slice the arrays
- Returns:
out_dict – The converted dictionary
- Return type:
dict
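A minimal sketch of slicing every array in a dictionary, assuming a flat dictionary whose values all support NumPy-style indexing. The function name is hypothetical and nested dictionaries are not handled.

```python
import numpy as np


def slice_dict_sketch(in_dict, subslice):
    """Hypothetical helper: apply the same index or slice to every array."""
    return {key: val[subslice] for key, val in in_dict.items()}


data = {"means": np.arange(5), "stds": np.ones((5, 2))}
some = slice_dict_sketch(data, slice(1, 3))  # rows 1 and 2 of every array
first = slice_dict_sketch(data, 0)           # a single row of every array
```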
- qp.dict_utils.check_keys(in_dicts)[source]¶
Check that the keys in all the in_dicts match
Raises KeyError if one does not match.
- qp.dict_utils.concatenate_dicts(in_dicts)[source]¶
Create a new dict by concatenating each array in in_dicts
- Parameters:
in_dicts (list) – The dictionaries to stack
- Returns:
out_dict – The stacked dictionary
- Return type:
dict
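The key check and the concatenation can be sketched together: every dictionary must have the same keys, and the arrays under each key are joined along the first axis. The function name and the choice of axis are assumptions of this sketch.

```python
import numpy as np


def concatenate_dicts_sketch(in_dicts):
    """Hypothetical helper: join the arrays of several dicts key by key."""
    keys = in_dicts[0].keys()
    for other in in_dicts[1:]:
        if other.keys() != keys:
            raise KeyError("dictionaries do not share the same keys")
    # concatenate along the first axis, assumed to be the PDF axis
    return {key: np.concatenate([d[key] for d in in_dicts]) for key in keys}


part_a = {"yvals": np.zeros((2, 3))}
part_b = {"yvals": np.ones((4, 3))}
stacked = concatenate_dicts_sketch([part_a, part_b])
```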
- qp.dict_utils.check_array_shapes(in_dict, npdf)[source]¶
Check that all the arrays in in_dict match the number of pdfs
Raises ValueError if one does not match.
qp.plotting: Tools for PDF plotting¶
Functions to plot PDFs
- qp.plotting.make_figure_axes(xlim, **kwargs)[source]¶
Build a figure and a set of figure axes to plot data on
- Parameters:
xlim ((float, float)) – The x-axis limits of the plot
**kwargs – passed directly to the matplotlib plot function
- Returns:
fig, axes
- Return type:
The figure and axes
- qp.plotting.plot_pdf_on_axes(axes, pdf, xvals, **kwargs)[source]¶
Plot a PDF on a set of axes, by evaluating it at a set of points
- Parameters:
axes (matplotlib.axes or None) – The axes we want to plot the data on
pdf (scipy.stats.rv_frozen) – The distribution we want to plot
xvals (np.array) – The locations we evaluate the PDF at for plotting
**kwargs – Keywords are passed to matplotlib
- Returns:
axes
- Return type:
The axes the data are plotted on
- qp.plotting.plot_dist_pdf(pdf, **kwargs)[source]¶
Plot a PDF on a set of axes, using the axes limits
- Parameters:
pdf (scipy.stats.rv_frozen) – The distribution we want to plot
axes (matplotlib.axes) – The axes to plot on
xlim ((float, float)) – The x-axis limits
npts (int) – The number of x-axis points
kwargs (remaining) – passed directly to the plot_pdf_on_axes plot function
- Returns:
axes
- Return type:
The axes the data are plotted on
- qp.plotting.plot_pdf_quantiles_on_axes(axes, xvals, yvals, quantiles, **kwargs)[source]¶
Plot a PDF on a set of axes, by evaluating at the quantiles provided
- Parameters:
axes – The axes we want to plot the data on
xvals (array_like) – Pdf xvalues
yvals (array_like) – Pdf yvalues
quantiles ((np.array, np.array)) – The quantiles that define the distribution pdf
**kwargs – passed directly to the matplotlib plot function
npoints (int) – Number of points to use in the plotting. Evenly spaced along the axis provided.
- Returns:
axes
- Return type:
The axes the data are plotted on
- qp.plotting.plot_pdf_histogram_on_axes(axes, hist, **kwargs)[source]¶
Plot a PDF on a set of axes, by plotting the histogrammed data
- Parameters:
axes – The axes we want to plot the data on
**kwargs – passed directly to the matplotlib plot function
npoints (int) – Number of points to use in the plotting. Evenly spaced along the axis provided.
- Returns:
The axes the data are plotted on
- Return type:
axes
- qp.plotting.plot_pdf_samples_on_axes(axes, pdf, samples, **kwargs)[source]¶
Plot a PDF on a set of axes, by displaying a set of samples from the PDF
- Parameters:
axes – The axes we want to plot the data on
pdf (scipy.stats.rv_frozen) – The distribution we want to plot
samples (np.array) – Points sampled from the PDF
**kwargs – passed directly to the matplotlib plot function
- Returns:
axes
- Return type:
The axes the data are plotted on