API Documentation for qp¶

qp provides a PDF class object that builds on the scipy.stats distributions to provide various approximate forms. The package also contains utilities and metrics for quantifying the quality of these approximations.

Ensemble and Factory¶

Implementation of an ensemble of distributions

class qp.ensemble.Ensemble(gen_func, data, ancil=None)[source]¶

An object comprised of many qp.PDF objects to efficiently perform operations on all of them

property gen_func¶: Return the function used to create the distribution object for this ensemble

property gen_class¶: Return the class used to generate distributions for this ensemble

property dist¶: Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property kwds¶: Return the kwds associated to the frozen object

property gen_obj¶: Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property frozen¶: Return the scipy.stats.rv_frozen object that encapsulates the distributions for this ensemble

property ndim¶: Return the number of dimensions of PDFs in this ensemble

property shape¶: Return the number of PDFs in this ensemble

property npdf¶: Return the number of PDFs in this ensemble

property ancil¶: Return the ancillary data dictionary

convert_to(to_class, **kwargs)[source]¶

Convert a distribution or ensemble

Parameters:

to_class (class) – Class to convert to
**kwargs – keyword arguments are passed to the output class constructor
method (str) – Optional argument to specify a non-default conversion algorithm

Returns:

ens – Ensemble of PDFs of type to_class using the data from this object

Return type:

qp.Ensemble

update(data, ancil=None)[source]¶

Update the frozen object

Parameters:: data (dict) – Dictionary with data used to construct the ensemble

update_objdata(data, ancil=None)[source]¶

Update the object data in the distribution

Parameters:: data (dict) – Dictionary with data used to construct the ensemble

metadata()[source]¶

Return the metadata for this ensemble

Returns:: metadata – The metadata
Return type:: dict

Notes

Metadata are elements that are the same for all the PDFs in the ensemble. These include the name and version of the PDF generation class.

objdata()[source]¶

Return the object data for this ensemble

Returns:: objdata – The object data
Return type:: dict

Notes

Object data are elements that differ for each PDF in the ensemble

set_ancil(ancil)[source]¶

Set the ancillary data dict

Parameters:: ancil (dict) – The ancillary data dictionary

Notes

Raises IndexError if the length of the arrays in ancil does not match the number of PDFs in the Ensemble

add_to_ancil(to_add)[source]¶

Add additional columns to the ancillary data dict

Parameters:: to_add (dict) – The columns to add to the ancillary data dict

Notes

Raises IndexError if the length of the arrays in to_add does not match the number of PDFs in the Ensemble

This calls dict.update() so it will overwrite existing columns
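The length check and the overwrite behaviour described for set_ancil and add_to_ancil can be pictured with a small stand-alone sketch. It is not qp's implementation: the helper name check_and_merge_ancil is hypothetical, and plain lists stand in for numpy arrays.

```python
def check_and_merge_ancil(ancil, to_add, npdf):
    """Hypothetical helper: validate column lengths, then merge with dict.update()."""
    for key, column in to_add.items():
        if len(column) != npdf:
            raise IndexError(
                f"column {key!r} has length {len(column)}, expected {npdf}"
            )
    merged = dict(ancil)      # copy so the caller's dict is left untouched
    merged.update(to_add)     # columns with an existing name are overwritten
    return merged


ancil = {"id": [0, 1, 2], "flag": [True, False, True]}
to_add = {"flag": [False, False, False], "mag": [21.0, 22.5, 23.1]}
merged = check_and_merge_ancil(ancil, to_add, npdf=3)

# A column whose length differs from npdf is rejected.
try:
    check_and_merge_ancil(ancil, {"bad": [1, 2]}, npdf=3)
    rejected = False
except IndexError:
    rejected = True
```

In this sketch the "flag" column is replaced and "mag" is added, which mirrors the note that existing columns are overwritten.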

append(other_ens)[source]¶

Append another ensemble, other_ens, to this one

Parameters:: other_ens (qp.Ensemble) – The other Ensemble

build_tables()[source]¶

Return dicts of numpy arrays for the meta data and object data for this ensemble

Returns:

meta (dict) – Table with the meta data
data (dict) – Table with the object data

mode(grid)[source]¶

Return the mode of each ensemble PDF, evaluated on grid

Parameters:: grid (array-like) – Grid on which to evaluate PDF
Returns:: mode – The modes of the PDFs evaluated on grid
Return type:: array-like

Notes

expand_dims is used so that an (N, 1) array is returned, consistent with mean, median, and other point estimates

gridded(grid)[source]¶

Build, cache, and return the PDF values at grid points

Parameters:: grid (array-like) – The grid points
Returns:: gridded
Return type:: (grid, pdf_values)

Notes

This first compares grid to the cached grid; if they match, it returns the cached values
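The compare-then-reuse behaviour in the note above might look roughly like the following. This is a minimal sketch under stated assumptions: the GridCache class and its n_evaluations counter are hypothetical, and a single callable stands in for the ensemble's pdf().

```python
class GridCache:
    """Hypothetical stand-in for the caching described for gridded()."""

    def __init__(self, pdf_func):
        self._pdf_func = pdf_func   # callable: x -> pdf value
        self._grid = None
        self._values = None
        self.n_evaluations = 0      # counts how often the PDF is re-evaluated

    def gridded(self, grid):
        grid = list(grid)
        if self._grid is not None and grid == self._grid:
            return self._grid, self._values     # cache hit: reuse stored values
        self._grid = grid
        self._values = [self._pdf_func(x) for x in grid]
        self.n_evaluations += 1
        return self._grid, self._values


cache = GridCache(lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0)  # uniform PDF on [0, 1]
cache.gridded([0.0, 0.5, 1.0])
cache.gridded([0.0, 0.5, 1.0])   # same grid: served from the cache
cache.gridded([0.0, 0.25])       # new grid: values are recomputed
```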

write_to(filename)[source]¶

Save this ensemble to a file

Parameters:: filename (str) –

Notes

This will actually write two files, one for the metadata and one for the object data

This uses tables_io to write the data, so any file suffix that works for tables_io will work here.

pdf(x)[source]¶

Evaluates the probability density function for the whole ensemble

Parameters:: x (float or ndarray, float) – location(s) at which to do the evaluations

logpdf(x)[source]¶

Evaluates the log of the probability density function for the whole ensemble

Parameters:: x (float or ndarray, float) – location(s) at which to do the evaluations

cdf(x)[source]¶

Evaluates the cumulative distribution function for the whole ensemble

Parameters:: x (float or ndarray, float) – location(s) at which to do the evaluations

logcdf(x)[source]¶

Evaluates the log of the cumulative distribution function for the whole ensemble

Parameters:: x (float or ndarray, float) – location(s) at which to do the evaluations

ppf(q)[source]¶

Evaluates the percent point function (PPF) for the whole ensemble

Parameters:: q (float or ndarray, float) – location(s) at which to do the evaluations

sf(q)[source]¶

Evaluates the survival function of the distribution

Parameters:: q (float or ndarray, float) – location(s) at which to evaluate the pdfs

logsf(q)[source]¶

Evaluates the log of the survival function of the distribution

Parameters:: q (float or ndarray, float) – location(s) at which to evaluate the pdfs
Returns:: Log of the survival function
Return type:: float or ndarray

isf(q)[source]¶

Evaluates the inverse of the survival function of the distribution

Parameters:: q (float or ndarray, float) – location(s) at which to evaluate the pdfs

rvs(size=None, random_state=None)[source]¶

Generate samples from this ensemble

Parameters:: size (int) – number of samples to return

stats(moments='mv')[source]¶

Return the stats for this ensemble

Parameters:: moments (str) – Which moments to include

median()[source]¶: Return the medians for this ensemble

mean()[source]¶: Return the means for this ensemble

var()[source]¶: Return the variances for this ensemble

std()[source]¶: Return the standard deviations for this ensemble

moment(n)[source]¶: Return the nth moments for this ensemble

entropy()[source]¶: Return the entropy for this ensemble

interval(alpha)[source]¶: Return the intervals corresponding to a confidence level of alpha for this ensemble

histogramize(bins)[source]¶

Computes integrated histogram bin values for all PDFs

Parameters:: bins (ndarray, float, optional) – Array of N+1 endpoints of N bins
Returns:: self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins
Return type:: ndarray, tuple, ndarray, floats

integrate(limits)[source]¶

Computes the integral under the ensemble of PDFs between the given limits.

Parameters:

limits (numpy.ndarray, tuple, float) – limits of integration, may be different for all PDFs in the ensemble
using (string) – parametrization over which to approximate the integral
dx (float, optional) – granularity of integral

Returns:

integral – value of the integral

Return type:

numpy.ndarray, float

mix_mod_fit(comps=5)[source]¶

Fits the parameters of a given functional form to an approximation

Parameters:

comps (int, optional) – number of components to consider
using (string, optional) – which existing approximation to use, defaults to first approximation
vb (boolean) – Report progress

Returns:

self.mix_mod – list of qp.Composite objects approximating the PDFs

Return type:

list, qp.Composite objects

Notes

Currently only supports mixture of Gaussians

moment_partial(n, limits, dx=0.01)[source]¶: Return the nth moments for this ensemble over a particular range

plot(key=0, **kwargs)[source]¶

Plot the pdf as a curve

Parameters:: key (int or slice) – Which PDF or PDFs from this ensemble to plot

plot_native(key=0, **kwargs)[source]¶

Plot the pdf as a curve

Parameters:: key (int or slice) – Which PDF or PDFs from this ensemble to plot

initializeHdf5Write(filename, npdf, comm=None)[source]¶

Set up the output writer for an ensemble, but set the size to npdf rather than the size of the ensemble, as the “initial chunk” will not contain the full data

Parameters:

filename (str) – Name of the file to create
npdf (int) – Total number of PDFs that the file will contain, usually larger than the size of the current ensemble
comm (MPI communicator) – Optional MPI communicator to allow parallel writing

writeHdf5Chunk(fname, start, end)[source]¶

write ensemble data chunk to file

Parameters:

fname (h5py File object) – file or group
start (int) – starting index in h5py file
end (int) – ending index in h5py file

finalizeHdf5Write(filename)[source]¶

write ensemble metadata to the output file

Parameters:: filename (h5py File object) – file or group
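The three methods above suggest an allocate, write-by-chunk, finalize pattern. The sketch below shows only the index bookkeeping behind that pattern and does not call qp or h5py: the chunk_ranges helper is hypothetical, a plain list stands in for the pre-sized output file, and the metadata keys are illustrative.

```python
def chunk_ranges(npdf, chunk_size):
    """Yield (start, end) index pairs that cover range(npdf) in order."""
    for start in range(0, npdf, chunk_size):
        yield start, min(start + chunk_size, npdf)


npdf = 10
output = [None] * npdf                 # "initialize": allocate room for all npdf rows up front

for start, end in chunk_ranges(npdf, chunk_size=4):
    chunk = [f"pdf_{i}" for i in range(start, end)]   # stand-in for one chunk of object data
    output[start:end] = chunk          # "write chunk": fill rows start..end-1

metadata = {"name": "hist", "version": 0}   # "finalize": shared metadata written once at the end
```

The point of sizing the output to npdf at the start is that each chunk can then be written into its own slice without resizing.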

This module implements a factory that manages different types of PDFs

class qp.factory.Factory[source]¶

Factory that creates and manages PDFs

add_class(the_class)[source]¶

Add a class to the factory

Parameters:: the_class (class) – The class we are adding, must inherit from Pdf_Gen

create(class_name, data, method=None)[source]¶

Make an ensemble of a particular type of distribution

Parameters:

class_name (str) – The name of the class to make
data (dict) – Values passed to class create function
method (str [None]) – Used to select which creation method to invoke

Returns:

ens – The newly created ensemble

Return type:

qp.Ensemble

from_tables(tables)[source]¶

Build an ensemble from tables

Parameters:: tables (dict) –

Notes

This will use information in the meta data table to figure out how to construct the data needed to build the ensemble.

read_metadata(filename)[source]¶

Read an ensemble’s metadata from a file, without loading the full data.

Parameters:: filename (str) –

is_qp_file(filename)[source]¶

Test if a file is a qp file

Parameters:: filename (str) – File to test
Returns:: value – True if the file is a qp file
Return type:: bool

read(filename)[source]¶

Read this ensemble from a file

Parameters:: filename (str) –

Notes

This will use information in the meta data to figure out how to construct the data needed to build the ensemble.

data_length(filename)[source]¶

Get the size of data

Parameters:: filename (str) –
Returns:: nrows
Return type:: int

iterator(filename, chunk_size=100000, rank=0, parallel_size=1)[source]¶

Return an iterator for chunked read

Parameters:

filename (str) –
chunk_size (int) –

convert(in_dist, class_name, **kwds)[source]¶

Convert an ensemble to a different representation

Parameters:

in_dist (qp.Ensemble) – Input distributions
class_name (str) – Representation to convert to

Returns:

ens – The ensemble we converted to

Return type:

qp.Ensemble

pretty_print(stream=sys.stdout)[source]¶

Print a level of the conversion dictionary in a human-readable format

Parameters:: stream (stream) – The stream to print to

static concatenate(ensembles)[source]¶

Concatenate a list of ensembles

Parameters:: ensembles (list) – The ensembles we are concatenating
Returns:: ens – The output
Return type:: qp.Ensemble

static write_dict(filename, ensemble_dict, **kwargs)[source]¶

static read_dict(filename)[source]¶: Assume that filename is an HDF5 file, containing multiple qp.Ensembles that have been stored as numpy arrays.

qp.factory.instance()[source]¶: Return the factory instance

qp.factory.add_class(the_class)¶

Add a class to the factory

Parameters:: the_class (class) – The class we are adding, must inherit from Pdf_Gen

qp.factory.create(class_name, data, method=None)¶

Make an ensemble of a particular type of distribution

Parameters:

class_name (str) – The name of the class to make
data (dict) – Values passed to class create function
method (str [None]) – Used to select which creation method to invoke

Returns:

ens – The newly created ensemble

Return type:

qp.Ensemble

qp.factory.read(filename)¶

Read this ensemble from a file

Parameters:: filename (str) –

Notes

This will use information in the meta data to figure out how to construct the data needed to build the ensemble.

qp.factory.read_metadata(filename)¶

Read an ensemble’s metadata from a file, without loading the full data.

Parameters:: filename (str) –

qp.factory.iterator(filename, chunk_size=100000, rank=0, parallel_size=1)¶

Return an iterator for chunked read

Parameters:

filename (str) –
chunk_size (int) –

qp.factory.convert(in_dist, class_name, **kwds)¶

Convert an ensemble to a different representation

Parameters:

in_dist (qp.Ensemble) – Input distributions
class_name (str) – Representation to convert to

Returns:

ens – The ensemble we converted to

Return type:

qp.Ensemble

qp.factory.concatenate(ensembles)¶

Concatenate a list of ensembles

Parameters:: ensembles (list) – The ensembles we are concatenating
Returns:: ens – The output
Return type:: qp.Ensemble

qp.factory.data_length(filename)¶

Get the size of data

Parameters:: filename (str) –
Returns:: nrows
Return type:: int

qp.factory.from_tables(tables)¶

Build an ensemble from tables

Parameters:: tables (dict) –

Notes

This will use information in the meta data table to figure out how to construct the data needed to build the ensemble.

qp.factory.is_qp_file(filename)¶

Test if a file is a qp file

Parameters:: filename (str) – File to test
Returns:: value – True if the file is a qp file
Return type:: bool

qp.factory.write_dict(filename, ensemble_dict, **kwargs)¶

qp.factory.read_dict(filename)¶: Assume that filename is an HDF5 file, containing multiple qp.Ensembles that have been stored as numpy arrays.

Distribution types¶

Histogram based¶

class qp.hist_gen(bins, pdfs, *args, **kwargs)[source]¶

Bases: Pdf_rows_gen

Histogram based distribution

Notes

This implements a PDF using a set of histogrammed values.

The relevant data members are:

bins: n+1 bin edges (shared for all PDFs)

pdfs: (npdf, n) bin values

Inside a given bin the pdf() will return the pdf value. Outside the range bins[0], bins[-1] the pdf() will return 0.

Inside a given bin the cdf() will use a linear interpolation across the bin. Outside the range bins[0], bins[-1] the cdf() will return (0 or 1), respectively.

The ppf() is computed by inverting the cdf(). ppf(0) will return bins[0]; ppf(1) will return bins[-1].
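The rules in these notes can be written out for a single PDF in a few lines. This is a minimal sketch, not qp's implementation: the function names are hypothetical, plain lists replace numpy arrays, and the bin values are assumed to be already normalized so that they integrate to 1.

```python
from bisect import bisect_right


def hist_pdf(x, bins, vals):
    """Piecewise-constant PDF: the bin value inside [bins[0], bins[-1]], 0 outside."""
    if x < bins[0] or x > bins[-1]:
        return 0.0
    i = min(bisect_right(bins, x) - 1, len(vals) - 1)
    return vals[i]


def hist_cdf(x, bins, vals):
    """CDF that rises linearly across each bin; 0 below bins[0], 1 above bins[-1]."""
    if x <= bins[0]:
        return 0.0
    if x >= bins[-1]:
        return 1.0
    i = bisect_right(bins, x) - 1
    below = sum(vals[j] * (bins[j + 1] - bins[j]) for j in range(i))
    return below + vals[i] * (x - bins[i])


def hist_ppf(q, bins, vals):
    """Invert the piecewise-linear CDF; ppf(0) gives bins[0], ppf(1) gives bins[-1]."""
    if q <= 0.0:
        return bins[0]
    if q >= 1.0:
        return bins[-1]
    cum = 0.0
    for i, v in enumerate(vals):
        mass = v * (bins[i + 1] - bins[i])
        if v > 0.0 and cum + mass >= q:
            return bins[i] + (q - cum) / v
        cum += mass
    return bins[-1]


bins = [0.0, 1.0, 2.0, 4.0]      # n + 1 = 4 edges
vals = [0.5, 0.25, 0.125]        # n = 3 bin heights, integrating to 1
```

With these inputs, hist_cdf(1.5, bins, vals) and hist_ppf(0.625, bins, vals) are inverses of each other, which is the relationship the notes describe.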

name = 'hist'¶

version = 0¶

property bins¶: Return the histogram bin edges

property pdfs¶: Return the histogram bin values

custom_generic_moment(m)[source]¶: Compute the mth moment

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶: Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout

classmethod plot_native(pdf, **kwargs)[source]¶

Plot the PDF in a way that is particular to this type of distribution

For a histogram this shows the bin edges

classmethod add_mappings()[source]¶: Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]¶: Make data for unit tests

Interpolation of a fixed grid¶

class qp.interp_gen(xvals, yvals, *args, **kwargs)[source]¶

Bases: Pdf_rows_gen

Interpolator based distribution

Notes

This implements a PDF using a set of interpolated values.

This version uses the same xvals for all the PDFs, which allows for much faster evaluation, and reduces the memory usage by a factor of 2.

The relevant data members are:

xvals: (n) x values

yvals: (npdf, n) y values

Inside the range xvals[0], xvals[-1] it simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the PDF. Outside the range xvals[0], xvals[-1] the pdf() will return 0.

The cdf() is constructed by computing the cumulative sum at the xvals grid points and interpolating between them. This will give a slight discrepancy with the true integral of the pdf(), but is much, much faster to evaluate. Outside the range xvals[0], xvals[-1] the cdf() will return (0 or 1), respectively.

The ppf() is computed by inverting the cdf(). ppf(0) will return xvals[0]; ppf(1) will return xvals[-1].
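The trade-off in the cdf() note (a cumulative sum at the grid points, interpolated in between, versus the true integral) is easiest to see on a deliberately coarse grid. The sketch below is not qp's code: the helper names are hypothetical, and using a trapezoidal cumulative sum is an assumption made for illustration.

```python
def cumulative_cdf_on_grid(xvals, yvals):
    """Cumulative trapezoidal sum of yvals at the grid points, normalized to end at 1."""
    cum = [0.0]
    for i in range(1, len(xvals)):
        area = 0.5 * (yvals[i] + yvals[i - 1]) * (xvals[i] - xvals[i - 1])
        cum.append(cum[-1] + area)
    total = cum[-1]
    return [c / total for c in cum]


def interp_linear(x, xs, ys):
    """Linear interpolation of (xs, ys), clamped to ys[0] and ys[-1] outside the grid."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


xvals = [0.0, 1.0, 2.0]
yvals = [0.0, 1.0, 0.0]                       # triangular PDF with unit area
cdf_grid = cumulative_cdf_on_grid(xvals, yvals)
approx = interp_linear(0.5, xvals, cdf_grid)  # interpolated CDF at x = 0.5
exact = 0.5 * 0.5 * 0.5                       # true integral of the triangle up to x = 0.5
```

On this three-point grid the interpolated CDF and the true integral differ noticeably between grid points, while they agree at the grid points themselves; the gap shrinks as the grid gets finer.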

name = 'interp'¶

version = 0¶

property xvals¶: Return the x-values used to do the interpolation

property yvals¶: Return the y-values used to do the interpolation

custom_generic_moment(m)[source]¶: Compute the mth moment

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:

npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod plot_native(pdf, **kwargs)[source]¶

Plot the PDF in a way that is particular to this type of distribution

For an interpolated PDF this uses the interpolation points

classmethod add_mappings()[source]¶: Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]¶: Make data for unit tests

Interpolation of a non-fixed grid¶

class qp.interp_irregular_gen(xvals, yvals, *args, **kwargs)[source]¶

Bases: Pdf_rows_gen

Interpolator based distribution

Notes

This implements a PDF using a set of interpolated values.

This version uses different xvals for each of the PDFs, which allows for more precision.

The relevant data members are:

xvals: (npdf, n) x values

yvals: (npdf, n) y values

Inside the range xvals[:,0], xvals[:,-1] it simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the PDF. Outside the range xvals[:,0], xvals[:,-1] the pdf() will return 0.

The cdf() is constructed by computing the cumulative sum at the xvals grid points and interpolating between them. This will give a slight discrepancy with the true integral of the pdf(), but is much, much faster to evaluate. Outside the range xvals[:,0], xvals[:,-1] the cdf() will return (0 or 1), respectively.

The ppf() is computed by inverting the cdf(). ppf(0) will return min(xvals); ppf(1) will return max(xvals).

name = 'interp_irregular'¶

version = 0¶

property xvals¶: Return the x-values used to do the interpolation

property yvals¶: Return the y-values used to do the interpolation

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:

npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod plot_native(pdf, **kwargs)[source]¶

Plot the PDF in a way that is particular to this type of distribution

For an interpolated PDF this uses the interpolation points

classmethod add_mappings()[source]¶: Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]¶: Make data for unit tests

Spline based¶

class qp.spline_gen(*args, **kwargs)[source]¶

Bases: Pdf_rows_gen

Spline based distribution

Notes

This implements PDFs using a set of splines

The relevant data members are:

splx: (npdf, n) spline-knot x-values

sply: (npdf, n) spline-knot y-values

spln: (npdf) spline-knot order parameters

The pdf() for the ith pdf will return the result of scipy.interpolate.splev(x, splx[i], sply[i], spln[i])

The cdf() for the ith pdf will return the result of scipy.interpolate.splint(x, splx[i], sply[i], spln[i])

The ppf() will use the default scipy implementation, which inverts the cdf() as evaluated on an adaptive grid.

name = 'spline'¶

version = 0¶

static build_normed_splines(xvals, yvals, **kwargs)[source]¶

Build a set of normalized splines using the x and y values

Parameters:

xvals (array_like) – The x-values used to do the interpolation
yvals (array_like) – The y-values used to do the interpolation

Returns:

splx (array_like) – The x-values of the spline knots
sply (array_like) – The y-values of the spline knots
spln (array_like) – The order of the spline knots

classmethod create_from_xy_vals(xvals, yvals, **kwargs)[source]¶

Create a new distribution using the given x and y values

Parameters:

xvals (array_like) – The x-values used to do the interpolation
yvals (array_like) – The y-values used to do the interpolation

Returns:

pdf_obj – The requested PDF

Return type:

spline_gen

classmethod create_from_samples(xvals, samples, **kwargs)[source]¶

Create a new distribution using the given x values and samples

Parameters:

xvals (array_like) – The x-values used to do the interpolation
samples (array_like) – The sample values used to build the KDE

Returns:

pdf_obj – The requested PDF

Return type:

spline_gen

property splx¶: Return x-values of the spline knots

property sply¶: Return y-values of the spline knots

property spln¶: Return order of the spline knots

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:

npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod plot_native(pdf, **kwargs)[source]¶

Plot the PDF in a way that is particular to this type of distribution

For a spline this shows the spline knots

classmethod add_mappings()[source]¶: Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]¶: Make data for unit tests

Quantile based¶

class qp.quant_gen(quants, locs, *args, **kwargs)[source]¶

Bases: Pdf_rows_gen

Quantile based distribution, where the PDF is defined piecewise from the quantiles

Notes

This implements a CDF by interpolating a set of quantile values

It simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the CDF
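As an illustration of building a CDF from quantiles, the sketch below interpolates linearly between (locs, quants) pairs and inverts the same table for the ppf. It is a stand-alone approximation, not qp's code: the lerp helper is hypothetical and replaces scipy.interpolate.interp1d, and the quantile values are made up for the example.

```python
from bisect import bisect_left


def lerp(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)                       # xs[i - 1] < x <= xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


quants = [0.0, 0.25, 0.5, 0.75, 1.0]   # cumulative probabilities
locs = [0.0, 0.4, 1.0, 1.8, 3.0]       # locations at which those quantiles are reached


def quant_cdf(x):
    """CDF: interpolate quantile values as a function of location."""
    return lerp(x, locs, quants)


def quant_ppf(q):
    """PPF: the same table read in the other direction."""
    return lerp(q, quants, locs)
```

Because both functions read the same table, quant_ppf(quant_cdf(x)) returns x for any x inside the range of locs, up to floating-point rounding.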

name = 'quant'¶

version = 0¶

property quants¶: Return quantiles used to build the CDF

property locs¶: Return the locations at which those quantiles are reached

property pdf_constructor_name¶: Returns the name of the current pdf constructor. Matches a key in the PDF_CONSTRUCTORS dictionary.

property pdf_constructor: AbstractQuantilePdfConstructor¶

Returns the current PDF constructor, and allows the user to interact with its methods.

Returns:: Abstract base class of the active concrete PDF constructor.
Return type:: AbstractQuantilePdfConstructor

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶: Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

classmethod plot_native(pdf, **kwargs)[source]¶

Plot the PDF in a way that is particular to this type of distribution

For a quantile-based PDF this shows the quantile points

classmethod add_mappings()[source]¶: Add this class's mappings to the conversion dictionary

Gaussian mixture model based¶

class qp.mixmod_gen(means, stds, weights, *args, **kwargs)[source]¶

Bases: Pdf_rows_gen

Mixture model based distribution

Notes

This implements a PDF using a Gaussian Mixture model

The relevant data members are:

means: (npdf, ncomp) means of the Gaussians

stds: (npdf, ncomp) standard deviations of the Gaussians

weights: (npdf, ncomp) weights for the Gaussians

The pdf() and cdf() are exact, and are computed as a weighted sum of the pdf() and cdf() of the component Gaussians.

The ppf() is computed by computing the cdf() values on a fixed grid and interpolating the inverse function.
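For a single PDF, the weighted sums and the grid-based ppf() described above can be sketched with the standard library alone. This is an illustration under assumptions rather than qp's implementation: the function names are hypothetical, the weights are assumed to sum to 1, and the grid range and spacing are arbitrary choices.

```python
import math


def norm_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def mix_pdf(x, means, stds, weights):
    """Weighted sum of the component Gaussian PDFs."""
    return sum(w * norm_pdf(x, m, s) for m, s, w in zip(means, stds, weights))


def mix_cdf(x, means, stds, weights):
    """Weighted sum of the component Gaussian CDFs."""
    return sum(w * norm_cdf(x, m, s) for m, s, w in zip(means, stds, weights))


def mix_ppf(q, means, stds, weights, grid):
    """Tabulate the CDF on a fixed grid, then interpolate the inverse linearly."""
    cdf_vals = [mix_cdf(x, means, stds, weights) for x in grid]
    if q <= cdf_vals[0]:
        return grid[0]
    if q >= cdf_vals[-1]:
        return grid[-1]
    for i in range(1, len(grid)):
        if cdf_vals[i] >= q:
            t = (q - cdf_vals[i - 1]) / (cdf_vals[i] - cdf_vals[i - 1])
            return grid[i - 1] + t * (grid[i] - grid[i - 1])
    return grid[-1]


means, stds, weights = [-1.0, 1.0], [0.5, 0.5], [0.5, 0.5]   # one PDF, ncomp = 2
grid = [-5.0 + 0.01 * i for i in range(1001)]                 # fixed grid on [-5, 5]
median = mix_ppf(0.5, means, stds, weights, grid)
```

The mixture here is symmetric about zero, so the interpolated median should land very close to 0; the pdf and cdf are exact, while the ppf is only as accurate as the grid allows.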

name = 'mixmod'¶

version = 0¶

property weights¶: Return weights to attach to the Gaussians

property means¶: Return means of the Gaussians

property stds¶: Return standard deviations of the Gaussians

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:

npdf (int) – number of total PDFs that will be written out
kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod add_mappings()[source]¶: Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]¶: Make data for unit tests

scipy distributions¶

Module to define qp distributions that inherit from scipy distributions

Notes

In the qp distributions the last axis in the input array shapes is reserved for pdf parameters.

This is because qp deals with numerical representations of distributions, where some of the input parameters consist of arrays of values for each pdf.

scipy.stats assumes that all input parameters are scalars for each pdf.

To ensure that scipy.stats based distributions behave the same as qp distributions we are going to ensure that all input variables have shape either (npdf, 1) or (1)

Quantification Metrics¶

This module implements some performance metrics for distribution parameterization

class qp.metrics.metrics.Grid(grid_values, cardinality, resolution, hist_bin_edges, limits)¶

cardinality¶: Alias for field number 1

grid_values¶: Alias for field number 0

hist_bin_edges¶: Alias for field number 3

limits¶: Alias for field number 4

resolution¶: Alias for field number 2

qp.metrics.metrics.calculate_moment(p, N, limits, dx=0.01)[source]¶

Calculates a moment of a qp.Ensemble object

Parameters:

p (qp.Ensemble object) – the collection of PDFs whose moment will be calculated
N (int) – order of the moment to be calculated
limits (tuple of floats) – endpoints of integration interval over which to calculate moments
dx (float) – resolution of integration grid

Returns:

M – value of the moment

Return type:

float
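As a rough picture of how limits and dx enter a moment calculation, the sketch below approximates the N-th moment of one PDF with a Riemann sum. It is not qp's implementation: moment_on_grid is a hypothetical name, a plain callable stands in for the ensemble, and the midpoint placement of the grid points is an assumption.

```python
import math


def moment_on_grid(pdf, n, limits, dx=0.01):
    """Approximate the n-th moment of pdf over limits with a midpoint Riemann sum."""
    lo, hi = limits
    npts = int(round((hi - lo) / dx))
    total = 0.0
    for i in range(npts):
        x = lo + (i + 0.5) * dx          # midpoint of the i-th grid cell
        total += (x ** n) * pdf(x) * dx
    return total


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


m0 = moment_on_grid(std_normal_pdf, 0, (-8.0, 8.0))   # normalization, close to 1
m1 = moment_on_grid(std_normal_pdf, 1, (-8.0, 8.0))   # mean, close to 0
m2 = moment_on_grid(std_normal_pdf, 2, (-8.0, 8.0))   # variance, close to 1
```

Narrow limits truncate the integral, so the limits should cover the region where the PDF has appreciable mass.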

qp.metrics.metrics.calculate_kld(p, q, limits, dx=0.01)[source]¶

Calculates the Kullback-Leibler Divergence between two qp.Ensemble objects.

Parameters:

p (Ensemble object) – probability distribution closer to the truth
q (Ensemble object) – probability distribution that approximates p
limits (tuple of floats) – endpoints of integration interval in which to calculate KLD
dx (float) – resolution of integration grid

Returns:

Dpq – the value of the Kullback-Leibler Divergence from q to p

Return type:

float

Notes

TO DO: have this take number of points not dx!

qp.metrics.metrics.calculate_rmse(p, q, limits, dx=0.01)[source]¶

Calculates the Root Mean Square Error between two qp.Ensemble objects.

Parameters:

p (qp.Ensemble object) – probability distribution function whose distance between its truth and the approximation of q will be calculated.
q (qp.Ensemble object) – probability distribution function whose distance between its approximation and the truth of p will be calculated.
limits (tuple of floats) – endpoints of integration interval in which to calculate RMS
dx (float) – resolution of integration grid

Returns:

rms – the value of the RMS error between q and p

Return type:

float

Notes

TO DO: change dx to N
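For two PDFs already evaluated on a common grid, one common definition of the RMS error is the square root of the mean squared difference over the grid points. The sketch below uses that definition; whether qp applies exactly this normalization is an assumption, and rmse_on_grid is a hypothetical name.

```python
import math


def rmse_on_grid(p_eval, q_eval):
    """Root mean square difference between two PDFs evaluated on the same grid."""
    if len(p_eval) != len(q_eval):
        raise ValueError("p_eval and q_eval must be evaluated on the same grid")
    n = len(p_eval)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(p_eval, q_eval)) / n)


p_eval = [0.0, 1.0, 2.0, 1.0, 0.0]   # "true" PDF values on a 5-point grid
q_eval = [0.0, 1.0, 1.0, 1.0, 0.0]   # approximation on the same grid
rms = rmse_on_grid(p_eval, q_eval)
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric in p and q.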

qp.metrics.metrics.calculate_rbpe(p, limits=(inf, inf))[source]¶

Calculates the risk based point estimates of a qp.Ensemble object. Algorithm as defined in 4.2 of ‘Photometric redshifts for Hyper Suprime-Cam Subaru Strategic Program Data Release 1’ (Tanaka et al. 2018).

Parameters:

p (qp.Ensemble object) – Ensemble of PDFs to be evaluated
limits (tuple) – The limits at which to evaluate possible z_best estimates. If custom limits are not provided then all potential z values will be considered using the scipy.optimize.minimize_scalar function.

Returns:

rbpes – The risk based point estimates of the provided ensemble.

Return type:

array of floats

qp.metrics.metrics.calculate_brier(p, truth, limits, dx=0.01)[source]¶

This function will do the following:

Generate an Mx1 sized grid based on limits and dx.
Produce an NxM array by evaluating the pdf for each of the N distribution objects in the Ensemble p on the grid.
Produce an NxM truth_array using the input truth and the generated grid. All values will be 0 or 1.
Create a Brier metric evaluation object
Return the result of the Brier metric calculation.

Parameters:

p (qp.Ensemble object) – Ensemble of N probability distribution functions that will be gridded and compared against truth.
truth (Nx1 sequence) – the list of true values, 1 per distribution in p.
limits (2-tuple of floats) – endpoints of the grid on which to evaluate the PDFs for the distributions in p
dx (float) – resolution of the grid. Defaults to 0.01.

Returns:

Brier_metric

Return type:

float
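The steps listed for calculate_brier can be followed in a stand-alone sketch. Several details here are assumptions rather than qp's behaviour: each gridded row is rescaled to sum to 1, the truth row is a one-hot vector at the grid point nearest the true value, and the score is the mean over distributions of the summed squared differences. The names make_grid, brier_sketch and gaussian are hypothetical.

```python
import math


def make_grid(limits, dx):
    """Evenly spaced grid points from limits[0] to limits[1], inclusive."""
    lo, hi = limits
    m = int(round((hi - lo) / dx)) + 1
    return [lo + i * dx for i in range(m)]


def brier_sketch(pdfs, truths, limits, dx=0.01):
    """Mean over N distributions of sum_j (prediction_j - truth_j)^2 on an M-point grid."""
    grid = make_grid(limits, dx)
    total = 0.0
    for pdf, truth in zip(pdfs, truths):
        row = [pdf(x) for x in grid]
        norm = sum(row)                          # assumes the PDF is nonzero somewhere on the grid
        pred = [v / norm for v in row]           # assumption: rows rescaled to sum to 1
        hit = min(range(len(grid)), key=lambda i: abs(grid[i] - truth))
        total += sum((p - (1.0 if i == hit else 0.0)) ** 2 for i, p in enumerate(pred))
    return total / len(pdfs)


def gaussian(mean, std):
    """Return a callable Gaussian PDF with the given mean and standard deviation."""
    return lambda x: math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


sharp = brier_sketch([gaussian(1.0, 0.1)], truths=[1.0], limits=(0.0, 2.0), dx=0.1)
broad = brier_sketch([gaussian(1.0, 0.5)], truths=[1.0], limits=(0.0, 2.0), dx=0.1)
```

Both PDFs are centred on the true value, but the sharper one concentrates more probability in the correct grid cell and so receives the lower (better) score.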

qp.metrics.metrics.calculate_brier_for_accumulation(p, truth, limits, dx=0.01)[source]¶

qp.metrics.metrics.calculate_anderson_darling(p, scipy_distribution='norm', num_samples=100, _random_state=None)[source]¶

This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.

Return type:: logger.warning

qp.metrics.metrics.calculate_cramer_von_mises(p, q, num_samples=100, _random_state=None, **kwargs)[source]¶

This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.

Return type:: logger.warning

qp.metrics.metrics.calculate_kolmogorov_smirnov(p, q, num_samples=100, _random_state=None)[source]¶

This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.

Return type:: logger.warning

qp.metrics.metrics.calculate_outlier_rate(p, lower_limit=0.0001, upper_limit=0.9999)[source]¶

Fraction of outliers in each distribution

Parameters:

p (qp.Ensemble) – A collection of N distributions. This implementation expects that Ensembles are not nested.
lower_limit (float, optional) – Lower bound CDF for outliers, by default 0.0001
upper_limit (float, optional) – Upper bound CDF for outliers, by default 0.9999

Returns:

1xN array where each element is the percent of outliers for a distribution in the Ensemble.

Return type:

[float]

qp.metrics.metrics.calculate_goodness_of_fit(estimate, reference, fit_metric='ks', num_samples=100, _random_state=None)[source]¶

This method calculates goodness of fit between the distributions in the estimate and reference Ensembles using the specified fit_metric.

Parameters:

estimate (Ensemble containing N distributions) – Random variate samples will be drawn from this Ensemble
reference (Ensemble containing N or 1 distributions) – The CDF of the distributions in this Ensemble are used in the goodness of fit calculation.
fit_metric (string, optional) – The goodness of fit metric to use. One of [‘ad’, ‘cvm’, ‘ks’]. For clarity, ‘ad’ = Anderson-Darling, ‘cvm’ = Cramer-von Mises, and ‘ks’ = Kolmogorov-Smirnov, by default ‘ks’
num_samples (int, optional) – Number of random variates to draw from each distribution in estimate, by default 100
_random_state (_type_, optional) – Used for testing to create reproducible sets of random variates, by default None

Returns:

output – An array of floats where each element is the result of the statistic calculation.

Return type:

[float]

Raises:

KeyError – If the requested fit_metric is not contained in goodness_of_fit_metrics dictionary, raise a KeyError.

Notes

The calculation of the goodness of fit metrics is not symmetric. i.e. calculate_goodness_of_fit(p, q, …) != calculate_goodness_of_fit(q, p, …)

In the future, we should be able to do this directly from the PDFs without needing to take random variates from the estimate Ensemble.

The vectorized implementations of fit metrics are copied over (unmodified) from the developer branch of Scipy 1.10.0dev. When Scipy 1.10 is released, we can replace the copied implementation with the ones in Scipy.
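To make the roles of estimate and reference concrete, here is a stand-alone sketch of the 'ks' case for a single pair of distributions: random variates are drawn from the estimate and compared against the CDF of the reference. It does not use qp or scipy; ks_statistic and normal_cdf are hypothetical helpers, and random.Random(42) plays the part that _random_state plays in the signature above.

```python
import math
import random


def ks_statistic(samples, reference_cdf):
    """One-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDF and reference_cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = reference_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def normal_cdf(x, mean=0.0, std=1.0):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


rng = random.Random(42)                                  # fixed seed for reproducible variates
draws = [rng.gauss(0.0, 1.0) for _ in range(100)]        # 100 variates drawn from the "estimate"
d_same = ks_statistic(draws, normal_cdf)                             # reference matches the estimate
d_shifted = ks_statistic(draws, lambda x: normal_cdf(x, mean=2.0))   # reference offset by two sigma
```

Only the estimate is sampled and only the reference contributes a CDF, which is why swapping the two arguments generally changes the result.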

This module implements metric calculations that are independent of qp.Ensembles

qp.metrics.array_metrics.quick_anderson_ksamp(p_random_variables, q_random_variables, **kwargs)[source]¶

Calculate the k-sample Anderson-Darling statistic using scipy.stats.anderson_ksamp for two CDFs. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

Parameters:

p_random_variables (np.array) – An array of random variables from the given distribution
q_random_variables (np.array) – An array of random variables from the given distribution

Returns:

An array of objects with attributes statistic, critical_values, and significance_level.

Return type:

[Result objects]

qp.metrics.array_metrics.quick_kld(p_eval, q_eval, dx=0.01)[source]¶

Calculates the Kullback-Leibler Divergence between two evaluations of PDFs.

Parameters:

p_eval (numpy.ndarray, float) – evaluations of probability distribution closer to the truth
q_eval (numpy.ndarray, float) – evaluations of probability distribution that approximates p
dx (float) – resolution of integration grid

Returns:

Dpq – the value of the Kullback-Leibler Divergence from q to p

Return type:

float
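
As a rough illustration of the calculation, the sketch below estimates the divergence as a Riemann sum over a regular grid. It is written under the assumption that the metric is dx * sum(p * (log p - log q)) with values clipped before the logarithm; the names kld_on_grid and gauss_pdf are hypothetical and not part of qp.

```python
import numpy as np


def kld_on_grid(p_eval, q_eval, dx=0.01, threshold=np.finfo(float).eps):
    """Riemann-sum estimate of D(p || q) from PDF evaluations on a regular grid."""
    # Clip before taking logs so zeros do not produce -inf (mirrors the safelog idea)
    log_p = np.log(np.clip(p_eval, threshold, None))
    log_q = np.log(np.clip(q_eval, threshold, None))
    return float(dx * np.sum(p_eval * (log_p - log_q)))


def gauss_pdf(x, mu, sigma):
    """Gaussian PDF, used here only to build example inputs."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


dx = 0.01
grid = np.arange(-10.0, 10.0 + dx / 2, dx)
p_eval = gauss_pdf(grid, 0.0, 1.0)  # "truth"
q_eval = gauss_pdf(grid, 1.0, 1.0)  # approximation

d_pq = kld_on_grid(p_eval, q_eval, dx=dx)
```

For two unit-variance Gaussians whose means differ by 1, the analytic divergence is 0.5, so d_pq should land very close to that value.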

qp.metrics.array_metrics.quick_moment(p_eval, grid_to_N, dx)[source]¶

Calculates a moment of an evaluated PDF

Parameters:

p_eval (numpy.ndarray, float) – the values of a probability distribution
grid_to_N (numpy.ndarray, float) – the grid upon which p_eval was evaluated, with each point raised to the power N, the order of the moment to be calculated
dx (float) – the difference between regular grid points

Returns:

M – value of the moment

Return type:

float
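
A minimal sketch of the moment calculation, assuming (as the argument name grid_to_N suggests) that the caller passes the grid already raised to the N-th power and that the moment is the dot product of the PDF values with that array, scaled by dx. The standard normal input is only an example.

```python
import numpy as np

N = 2  # order of the moment
grid = np.linspace(-8.0, 8.0, 1601)
dx = grid[1] - grid[0]
p_eval = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)  # standard normal PDF

# Precomputed once; reusable for many PDFs evaluated on the same grid
grid_to_N = grid**N
moment = float(np.dot(p_eval, grid_to_N) * dx)
```

The second moment of a standard normal is 1, which gives a quick sanity check on the grid resolution and range.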

qp.metrics.array_metrics.quick_rmse(p_eval, q_eval, N)[source]¶

Calculates the Root Mean Square Error between two evaluations of PDFs.

Parameters:

p_eval (numpy.ndarray, float) – evaluation of probability distribution function whose distance between its truth and the approximation of q will be calculated.
q_eval (numpy.ndarray, float) – evaluation of probability distribution function whose distance between its approximation and the truth of p will be calculated.
N (int) – number of points at which PDFs were evaluated

Returns:

rms – the value of the RMS error between q and p

Return type:

float
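
The sketch below shows the usual root-mean-square formula applied to two sets of PDF evaluations, assuming N is simply the number of evaluation points. The input arrays are made up for illustration.

```python
import numpy as np

p_eval = np.array([0.1, 0.4, 0.4, 0.1])
q_eval = np.array([0.2, 0.3, 0.3, 0.2])
N = len(p_eval)

# Square root of the mean squared difference over the N evaluation points
rms = float(np.sqrt(np.sum((p_eval - q_eval) ** 2) / N))
```

Every point differs by 0.1 here, so the RMS error is 0.1 up to floating-point rounding.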

qp.metrics.array_metrics.quick_rbpe(pdf_function, integration_bounds, limits=(inf, inf))[source]¶

Calculates the risk based point estimate of a qp.Ensemble object with npdf == 1.

Parameters:

pdf_function (python function) – The function should calculate the value of a pdf at a given x value
integration_bounds (tuple of floats) – The integration bounds, typically (ppf(0.01), ppf(0.99)) for the given distribution
limits (tuple of floats) – The limits at which to evaluate possible z_best estimates. If custom limits are not provided, then all potential z values will be considered using the scipy.optimize.minimize_scalar function.

Returns:

rbpe – The risk based point estimate of the provided ensemble.

Return type:

float

class qp.metrics.brier.Brier(prediction, truth)[source]¶

Brier score based on https://en.wikipedia.org/wiki/Brier_score#Original_definition_by_Brier

Parameters:

prediction (NxM array, float) – Predicted probability for N distributions to have a true value in one of M bins. The sum of values along each row N should be 1.
truth (NxM array, int) – True values for N distributions, where Mth bin for the true value will have value 1, all other bins will have a value of 0.

evaluate()[source]¶

Evaluate the Brier score.

Returns:: The result of calculating the Brier metric, a value in the interval [0,2]
Return type:: float
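
To make the [0,2] range concrete, here is a small sketch of Brier's original definition: the squared error summed over the M bins and averaged over the N distributions. The function name brier_score is hypothetical and the code is an illustration, not the class's implementation.

```python
import numpy as np


def brier_score(prediction, truth):
    """Mean over N distributions of the squared error summed over M bins."""
    prediction = np.asarray(prediction, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.sum((prediction - truth) ** 2, axis=1)))


truth = np.array([[1, 0, 0], [0, 1, 0]])

# Every row matches the truth exactly
perfect = brier_score(truth, truth)
# All probability placed in a wrong bin
worst = brier_score([[0, 1, 0], [1, 0, 0]], truth)
# Probability split evenly between the right bin and one wrong bin
hedged = brier_score([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]], truth)
```

A perfect prediction scores 0, a fully confident wrong prediction scores 2, and the evenly split prediction scores 0.5.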

class qp.metrics.pit.PIT(qp_ens, true_vals, eval_grid=DEFAULT_QUANTS)[source]¶

Probability Integral Transform

Parameters:

qp_ens (Ensemble) – A collection of N distribution objects
true_vals ([float]) – An array-like sequence of N float values representing the known true value for each distribution
eval_grid ([float], optional) – A strictly increasing array-like sequence in the range [0,1], by default DEFAULT_QUANTS

Returns:

An object with an Ensemble containing the PIT distribution, and a full set of PIT samples.

Return type:

PIT object

property pit_samps¶

Returns the PIT samples, i.e., CDF(true_vals) for each distribution in the Ensemble used to initialize the PIT object.

Returns:: An array of floats
Return type:: np.array

property pit¶

Return the PIT Ensemble object

Returns:: An Ensemble containing 1 qp.quant distribution.
Return type:: qp.Ensemble

calculate_pit_meta_metrics()[source]¶

Convenience method that will calculate all of the PIT meta metrics and return them as a dictionary.

Returns:: The collection of PIT statistics
Return type:: dictionary

evaluate_PIT_anderson_ksamp(pit_min=0.0, pit_max=1.0)[source]¶

Use scipy.stats.anderson_ksamp to compute the Anderson-Darling statistic for the cdf(truth) values by comparing with a uniform distribution between 0 and 1. Up to the current version (1.9.3), scipy.stats.anderson does not support uniform distributions as reference for 1-sample test, therefore we create a uniform “distribution” and pass it as the second value in the list of parameters to the scipy implementation of k-sample Anderson-Darling. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

Parameters:

pit_min (float, optional) – Minimum PIT value to accept, by default 0.
pit_max (float, optional) – Maximum PIT value to accept, by default 1.

Returns:

An array of objects with attributes statistic, critical_values, and significance_level. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

Return type:

array

evaluate_PIT_CvM()[source]¶

Calculate the Cramer von Mises statistic using scipy.stats.cramervonmises using self._pit_samps compared to a uniform distribution. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises.html

Returns:: An array of objects with attributes statistic and pvalue. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises.html
Return type:: array

evaluate_PIT_KS()[source]¶

Calculate the Kolmogorov-Smirnov statistic using scipy.stats.kstest. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html

Returns:: An array of objects with attributes statistic and pvalue. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html
Return type:: array

evaluate_PIT_outlier_rate(pit_min=0.0001, pit_max=0.9999)[source]¶

Compute fraction of PIT outliers by evaluating the CDF of the distribution in the PIT Ensemble at pit_min and pit_max.

Parameters:

pit_min (float, optional) – Lower bound for outliers, by default 0.0001
pit_max (float, optional) – Upper bound for outliers, by default 0.9999

Returns:

The percentage of outliers in this distribution given the min and max bounds.

Return type:

float
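
The idea behind the PIT samples and the outlier rate can be sketched with plain NumPy. This is an illustration under stated assumptions, not the class's code: four hypothetical Gaussian PDFs stand in for an Ensemble, and the outlier rate is obtained by counting samples directly, whereas the method above evaluates the CDF of the PIT distribution at pit_min and pit_max.

```python
import math

import numpy as np


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, evaluated element-wise."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


# Four hypothetical Gaussian PDFs (one per entry) and their known true values
means = np.array([0.0, 0.0, 0.0, 0.0])
sigmas = np.array([1.0, 1.0, 1.0, 1.0])
true_vals = np.array([0.0, -10.0, 10.0, 1.0])

# PIT sample i is CDF_i(true_vals[i])
pit_samps = normal_cdf(true_vals, means, sigmas)

# Fraction of PIT samples that fall in the extreme tails
pit_min, pit_max = 0.0001, 0.9999
outlier_rate = float(np.mean((pit_samps < pit_min) | (pit_samps > pit_max)))
```

Two of the four true values lie ten standard deviations from the mean, so half of the PIT samples count as outliers in this example.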

Utility functions¶

qp.conversion_funcs¶

This module implements functions to convert distributions between various representations. These functions should then be registered with the qp.ConversionDict using qp_add_mapping. That will allow the automated conversion mechanisms to work.

qp.conversion_funcs.extract_vals_at_x(in_dist, **kwargs)[source]¶

Convert using a set of x and y values.

Parameters:

in_dist (qp.Ensemble) – Input distributions
xvals (np.array) – Locations at which the pdf is evaluated

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_xy_vals(in_dist, **kwargs)[source]¶

Convert using a set of x and y values.

Parameters:

in_dist (qp.Ensemble) – Input distributions
xvals (np.array) – Locations at which the pdf is evaluated

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_samples(in_dist, **kwargs)[source]¶

Convert using a set of values sampled from the PDF

Parameters:

in_dist (qp.Ensemble) – Input distributions
size (int) – Number of samples to generate

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_hist_values(in_dist, **kwargs)[source]¶

Convert using a set of values sampled from the PDF

Parameters:

in_dist (qp.Ensemble) – Input distributions
bins (np.array) – Histogram bin edges

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_hist_samples(in_dist, **kwargs)[source]¶

Convert using a set of sampled values that are then histogrammed

Parameters:

in_dist (qp.Ensemble) – Input distributions
bins (np.array) – Histogram bin edges
size (int) – Number of samples to generate

Returns:

data – The extracted data

Return type:

dict
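
The sample-then-histogram conversion can be sketched as follows. The Gaussian inputs and the dictionary keys bins and pdfs are assumptions made for this example; the sketch uses numpy.histogram rather than any qp call.

```python
import numpy as np

rng = np.random.default_rng(0)
npdf, size = 3, 1000
bins = np.linspace(-5.0, 5.0, 21)  # 21 edges -> 20 bins

# Hypothetical input: three Gaussians with different means, sampled `size` times each
samples = rng.normal(loc=[[-1.0], [0.0], [1.0]], scale=0.5, size=(npdf, size))

# Histogram each row; density=True normalizes each histogram to integrate to 1
pdfs = np.array([np.histogram(row, bins=bins, density=True)[0] for row in samples])

# Key names are illustrative, not necessarily those qp uses
data = dict(bins=bins, pdfs=pdfs)
```

The result holds one row of bin heights per input distribution, all sharing the same bin edges.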

qp.conversion_funcs.extract_quantiles(in_dist, **kwargs)[source]¶

Convert using a set of quantiles and the locations at which they are reached

Parameters:

in_dist (qp.Ensemble) – Input distributions
quantiles (np.array) – Quantile values to use

Returns:

data – The extracted data

Return type:

dict
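
A quantile representation pairs a fixed set of quantile values with the locations at which each distribution reaches them. The sketch below finds those locations by numerically inverting a gridded CDF with numpy.interp; the exponential inputs and the variable names are assumptions for illustration, and qp itself may obtain the locations differently.

```python
import numpy as np

quants = np.linspace(0.1, 0.9, 9)

# Hypothetical input: exponential distributions with different scales, one per row
scales = np.array([[0.5], [1.0], [2.0]])
grid = np.linspace(0.0, 40.0, 4001)
cdf = 1.0 - np.exp(-grid / scales)  # shape (npdf, ngrid), non-decreasing along rows

# Invert each CDF by interpolating x as a function of CDF(x)
locs = np.array([np.interp(quants, cdf_row, grid) for cdf_row in cdf])
```

For an exponential distribution the exact answer is -scale * log(1 - q), which the interpolated locations should match closely.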

qp.conversion_funcs.extract_fit(in_dist, **kwargs)[source]¶

Convert to a functional distribution by fitting it to a set of x and y values

Parameters:

in_dist (qp.Ensemble) – Input distributions
xvals (np.array) – Locations at which the pdf is evaluated

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_mixmod_fit_samples(in_dist, **kwargs)[source]¶

Convert to a mixture model using a set of values sampled from the pdf

Parameters:

in_dist (qp.Ensemble) – Input distributions
ncomps (int) – Number of components in mixture model to use
nsamples (int) – Number of samples to generate
random_state (int) – Used to reproducibly generate random variates from in_dist

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_voigt_mixmod(in_dist, **kwargs)[source]¶

Convert to a voigt mixture model starting with a gaussian mixture model, trivially by setting gammas to 0

Parameters:: in_dist (qp.Ensemble) – Input distributions
Returns:: data – The extracted data
Return type:: dict

qp.conversion_funcs.extract_voigt_xy(in_dist, **kwargs)[source]¶

Build a voigt function basis and run a match-pursuit algorithm to fit gridded data

Parameters:: in_dist (qp.Ensemble) – Input distributions
Returns:: data – The extracted data as sparse indices, basis, and metadata to rebuild the basis
Return type:: dict

qp.conversion_funcs.extract_voigt_xy_sparse(in_dist, **kwargs)[source]¶

Build a voigt function basis and run a match-pursuit algorithm to fit gridded data

Parameters:: in_dist (qp.Ensemble) – Input distributions
Returns:: data – The extracted data as shaped parameters means, stds, weights, gammas
Return type:: dict

qp.conversion_funcs.extract_sparse_from_xy(in_dist, **kwargs)[source]¶

Extract sparse representation from an xy interpolated representation

Parameters:

in_dist (qp.Ensemble) – Input distributions
yvals (array-like) – Used to override the y-values
xvals (array-like) – Used to override the x-values
nvals (int) – Used to override the number of bins

Returns:

metadata – Dictionary with data for sparse representation

Return type:

dict

Notes

This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0

qp.conversion_funcs.extract_xy_sparse(in_dist, **kwargs)[source]¶

Extract xy-interpolated representation from a sparse representation

Parameters:

in_dist (qp.Ensemble) – Input distributions
yvals (array-like) – Used to override the y-values
xvals (array-like) – Used to override the x-values
nvals (int) – Used to override the number of bins

Returns:

metadata – Dictionary with data for interpolated representation

Return type:

dict

Notes

This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0

qp.utils: PDF evaluation and construction utility functions¶

Utility functions for the qp package

qp.utils.safelog(arr, threshold=2.220446049250313e-16)[source]¶

Takes the natural logarithm of an array of potentially non-positive numbers

Parameters:

arr (numpy.ndarray, float) – values to be logged
threshold (float) – small, positive value to replace zeros and negative numbers

Returns:

logged – logarithms, with approximation in place of zeros and negative numbers

Return type:

numpy.ndarray
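
One way to get this behaviour is to clip the input from below before taking the logarithm. The sketch below assumes that approach; safelog_sketch is a hypothetical name and the code is not taken from qp.

```python
import numpy as np


def safelog_sketch(arr, threshold=np.finfo(float).eps):
    """Natural log after raising every value below `threshold` up to `threshold`."""
    return np.log(np.clip(np.asarray(arr, dtype=float), threshold, None))


logged = safelog_sketch([1.0, 0.0, -3.0])
```

Zeros and negative numbers both map to log(threshold), so the output stays finite.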

qp.utils.edge_to_center(edges)[source]¶: Return the centers of a set of bins given the edges

qp.utils.bin_widths(edges)[source]¶: Return the widths of a set of bins given the edges
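
Both quantities follow directly from adjacent pairs of edges. A minimal sketch with made-up, unequal bin edges:

```python
import numpy as np

edges = np.array([0.0, 1.0, 3.0, 6.0])

# Midpoint of each bin
centers = 0.5 * (edges[1:] + edges[:-1])
# Width of each bin (equivalent to np.diff(edges))
widths = edges[1:] - edges[:-1]
```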

qp.utils.get_bin_indices(bins, x)[source]¶

Return the bin indexes for a set of values

If the bins are equal width this will use arithmetic; if the bins are not equal width this will use a binary search
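
The two strategies can be sketched as follows. This is an illustration only: the name bin_indices_sketch and the choice to return a validity mask alongside the indices are assumptions of this example.

```python
import numpy as np


def bin_indices_sketch(bins, x):
    """Return (indices, in_range_mask) for values x given sorted bin edges."""
    bins = np.asarray(bins, dtype=float)
    x = np.asarray(x, dtype=float)
    widths = np.diff(bins)
    if np.allclose(widths, widths[0]):
        # Equal-width bins: the index follows from arithmetic alone
        idx = np.floor((x - bins[0]) / widths[0]).astype(int)
    else:
        # Unequal bins: binary search for the edge just left of each value
        idx = np.searchsorted(bins, x, side="right") - 1
    mask = (idx >= 0) & (idx < bins.size - 1)
    return idx, mask


idx_uniform, ok_uniform = bin_indices_sketch([0.0, 1.0, 2.0, 3.0], [0.5, 2.5, 3.5])
idx_varied, ok_varied = bin_indices_sketch([0.0, 1.0, 4.0, 10.0], [0.5, 2.5, 11.0])
```

The arithmetic path costs constant time per value, while the binary search scales with the logarithm of the number of bins.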

qp.utils.normalize_interp1d(xvals, yvals)[source]¶

Normalize a set of 1D interpolators

Parameters:

xvals (array-like) – X-values used for the interpolation
yvals (array-like) – Y-values used for the interpolation

Returns:

ynorm – Normalized y-vals

Return type:

array-like
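
A sketch of the normalization, assuming linear interpolation between grid points and a single grid shared by all rows (the library function may accept per-row grids and may integrate differently). Each row is divided by its own trapezoid-rule integral.

```python
import numpy as np

xvals = np.linspace(0.0, 2.0, 5)  # shared grid, shape (npts,)
yvals = np.array([[0.0, 1.0, 2.0, 1.0, 0.0],
                  [2.0, 2.0, 2.0, 2.0, 2.0]])  # shape (npdf, npts)

# Trapezoid-rule integral of each row over the grid
dx = np.diff(xvals)
integrals = np.sum(0.5 * (yvals[:, 1:] + yvals[:, :-1]) * dx, axis=1)

# Divide each row by its own integral
ynorm = yvals / integrals[:, None]
```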

qp.utils.build_kdes(samples, **kwargs)[source]¶

Build a set of Gaussian Kernel Density Estimates

Parameters:

samples (array-like) – X-values used for the spline
**kwargs – Keywords passed to the scipy.stats.gaussian_kde constructor

Returns:

kdes

Return type:

list of scipy.stats.gaussian_kde objects

qp.utils.evaluate_kdes(xvals, kdes)[source]¶

Evaluate a set of kdes

Parameters:

xvals (array_like) – X-values used for the spline
kdes (list of sps.gaussian_kde) – The kernel density estimates

Returns:

yvals – The kdes evaluated at the xvals

Return type:

array_like

qp.utils.get_eval_case(x, row)[source]¶

Figure out which of the various input formats scipy.stats has passed us

Parameters:

x (array_like) – Pdf x-vals
row (array_like) – Pdf row indices

Returns:

case (int) – The case code
xx (array_like) – The x-values properly shaped
rr (array_like) – The y-values, properly shaped

Notes

The cases are:

CASE_FLAT : x, row have shapes (n), (n) and do not factor
CASE_FACTOR : x, row have shapes (n), (n) but can be factored to shapes (1, nx) and (npdf, 1) (i.e., they were flattened by scipy)
CASE_PRODUCT : x, row have shapes (1, nx) and (npdf, 1)
CASE_2D : x, row have shapes (npdf, nx) and (npdf, nx)

qp.utils.evaluate_hist_x_multi_y_flat(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (n)

qp.utils.evaluate_hist_x_multi_y_product(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like (npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_x_multi_y_2d(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like (npdf, npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_x_multi_y(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like) – X values to interpolate at
row (array_like) – Which rows to interpolate at
bins (array_like (N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like

Notes

Depending on the shape of ‘x’ and ‘row’ this will use one of the three specific implementations.
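
For the product-shaped case, where every PDF shares the same bin edges, the lookup can be sketched with numpy.searchsorted and fancy indexing. The arrays below are made up, and returning 0 outside the binning is an assumption of this sketch.

```python
import numpy as np

bins = np.array([0.0, 1.0, 2.0, 4.0])  # N+1 = 4 shared bin edges
vals = np.array([[0.5, 0.3, 0.1],
                 [0.2, 0.2, 0.3]])  # (npdf, N) bin contents

x = np.array([0.5, 1.5, 3.0, 5.0])  # (npts,) evaluation points
row = np.arange(vals.shape[0]).reshape(-1, 1)  # (npdf, 1) row indices

# Locate the bin of each x once, since every PDF shares the same edges
idx = np.searchsorted(bins, x, side="right") - 1
inside = (idx >= 0) & (idx < bins.size - 1)

# Look up vals[row, idx]; points outside the binning evaluate to 0
out = np.where(inside, vals[row, np.clip(idx, 0, bins.size - 2)], 0.0)
```

Broadcasting the (npdf, 1) row indices against the (npts,) bin indices yields the (npdf, npts) output in a single indexing step.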

qp.utils.evaluate_hist_multi_x_multi_y_flat(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (n)

qp.utils.evaluate_hist_multi_x_multi_y_product(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like (npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_multi_x_multi_y_2d(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like (npdf, npts)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_multi_x_multi_y(x, row, bins, vals, derivs=None)[source]¶

Evaluate a set of values from histograms

Parameters:

x (array_like) – X values to interpolate at
row (array_like) – Which rows to interpolate at
bins (array_like (npdf, N+1)) – ‘x’ bin edges
vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like

qp.utils.interpolate_x_multi_y_flat(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_x_multi_y_product(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_x_multi_y_2d(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_x_multi_y(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like
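
With a single grid shared by all PDFs, the interpolation amounts to interpolating every row of yvals at the same points. The sketch below uses numpy.interp for linear interpolation; the library function accepts **kwargs and may rely on a different interpolator, so treat this as an illustration of the shapes involved.

```python
import numpy as np

xvals = np.array([0.0, 1.0, 2.0])  # (npts,) shared grid
yvals = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0]])  # (npdf, npts)
x = np.array([0.5, 1.0, 1.5])  # (n,) evaluation points

# Linear interpolation of every row on the shared grid; values outside
# the grid are set to 0 here, which is a choice made for this sketch
vals = np.array([np.interp(x, xvals, y, left=0.0, right=0.0) for y in yvals])
```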

qp.utils.interpolate_multi_x_multi_y_flat(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_multi_y_product(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_multi_y_2d(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_multi_y(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like

qp.utils.interpolate_multi_x_y_flat(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (n)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_y_product(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_y_2d(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_y(x, row, xvals, yvals, **kwargs)[source]¶

Interpolate a set of values

Parameters:

x (array_like (npdf, n)) – X values to interpolate at
row (array_like (npdf, 1)) – Which rows to interpolate at
xvals (array_like (npdf, npts)) – X-values used for the interpolation
yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like

qp.utils.profile(x_data, y_data, x_bins, std=True)[source]¶

Make a ‘profile’ plot

Parameters:

x_data (array_like (n)) – The x-values
y_data (array_like (n)) – The y-values
x_bins (array_like (nbins+1)) – The values of the bin edges
std (bool) – If true, return the standard deviations; if false, return the errors on the means

Returns:

vals (array_like (nbins)) – The means
errs (array_like (nbins)) – The standard deviations or errors on the means
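
A profile reduces a scatter of (x, y) points to one mean and one uncertainty per x bin. The sketch below shows the idea with numpy.digitize; profile_sketch is a hypothetical name, and leaving empty bins as NaN is a choice made for this example.

```python
import numpy as np


def profile_sketch(x_data, y_data, x_bins, std=True):
    """Mean of y in each x bin, with either the scatter or the error on the mean."""
    x_data, y_data = np.asarray(x_data), np.asarray(y_data)
    idx = np.digitize(x_data, x_bins) - 1  # bin index of every point
    nbins = len(x_bins) - 1
    vals = np.full(nbins, np.nan)
    errs = np.full(nbins, np.nan)
    for i in range(nbins):
        y_in_bin = y_data[idx == i]
        if y_in_bin.size == 0:
            continue  # leave empty bins as NaN
        vals[i] = y_in_bin.mean()
        errs[i] = y_in_bin.std()
        if not std:
            errs[i] /= np.sqrt(y_in_bin.size)  # standard error on the mean
    return vals, errs


x_data = np.array([0.1, 0.2, 0.3, 1.1, 1.2, 1.3])
y_data = np.array([1.0, 2.0, 3.0, 4.0, 4.0, 4.0])
vals, errs = profile_sketch(x_data, y_data, x_bins=[0.0, 1.0, 2.0])
```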

qp.utils.reshape_to_pdf_size(vals, split_dim)[source]¶

Reshape an array to match the number of PDFs in a distribution

Parameters:

vals (array) – The input array
split_dim (int) – The dimension at which to split between pdf indices and per_pdf indices

Returns:

out – The reshaped array

Return type:

array
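
The reshaping can be pictured as collapsing all PDF axes into one while leaving the per-PDF axes alone. A minimal sketch under that assumption, with a made-up input array:

```python
import numpy as np

vals = np.arange(30.0).reshape(2, 3, 5)  # a (2, 3) block of PDFs, 5 values each
split_dim = -1  # the last axis holds the per-PDF values

# Collapse the PDF axes into one, leaving the per-PDF axes untouched
npdf = int(np.prod(vals.shape[:split_dim]))
out = vals.reshape((npdf,) + vals.shape[split_dim:])
```

Here the six PDFs end up as the rows of a (6, 5) array.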

qp.utils.reshape_to_pdf_shape(vals, pdf_shape, per_pdf)[source]¶

Reshape an array to match the shape of PDFs in a distribution

Parameters:

vals (array) – The input array
pdf_shape (int) – The shape for the pdfs
per_pdf (int or array_like) – The shape per pdf

Returns:

out – The reshaped array

Return type:

array

Infrastructure and Core functionality¶

qp.pdf_gen: scipy.stats interface¶

This module implements continuous distribution generators that inherit from the scipy.stats.rv_continuous class

If you would like to add a sub-class, please read the instructions on subclassing here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html

Open questions:

1) At this time the normalization is not enforced for many of the PDF types. It is assumed that the user values give correct normalization. We should think about this more.

2) At this time for most of the distributions, only the _pdf function is overridden. This is all that is required to inherit from scipy.stats.rv_continuous; however, providing implementations of some of _logpdf, _cdf, _logcdf, _ppf, _rvs, _isf, _sf, _logsf could speed the code up a lot in some cases.

class qp.pdf_gen.Pdf_gen(*args, **kwargs)[source]¶

Interface class to extend scipy.stats.rv_continuous with information needed for qp

Notes

Metadata are elements that are the same for all the PDFs. These include the name and version of the PDF generation class, and possible data such as the bin edges used for histogram representations

Object data are elements that differ for each PDF

property metadata¶: Return the metadata for this set of PDFs

property objdata¶: Return the object data for this set of PDFs

classmethod creation_method(method=None)[source]¶: Return the method used to create a PDF of this type

classmethod extraction_method(method=None)[source]¶: Return the method used to extract data to create a PDF of this type

classmethod reader_method(version=None)[source]¶: Return the method used to convert data read from a file to a PDF of this type

classmethod add_method_dicts()[source]¶: Add empty method dicts

classmethod print_method_maps(stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶: Print the maps showing the methods

classmethod create_gen(**kwds)[source]¶: Create and return a scipy.stats.rv_continuous object using the keyword arguments provided

classmethod create(**kwds)[source]¶: Create and return a scipy.stats.rv_frozen object using the keyword arguments provided

classmethod plot(pdf, **kwargs)[source]¶: Plot the pdf as a curve

classmethod plot_native(pdf, **kwargs)[source]¶

Plot the PDF in a way that is particular to this type of distribution

This defaults to plotting it as a curve, but this can be overridden

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶: Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout

class qp.pdf_gen.rv_frozen_func(dist, *args, **kwds)[source]¶

Trivial extension of scipy.stats.rv_frozen that includes the number of PDFs it represents

property ndim¶: Return the number of dimensions of PDFs in this ensemble

property shape¶: Return the shape of the set of PDFs this object represents

property npdf¶: Return the number of PDFs this object represents

histogramize(bins)[source]¶

Computes integrated histogram bin values for all PDFs

Parameters:: bins (ndarray, float, optional) – Array of N+1 endpoints of N bins
Returns:: self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins
Return type:: ndarray, tuple, ndarray, floats
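
Integrated bin values are differences of the CDF across each bin. The sketch below illustrates this for two hypothetical unit-width Gaussians, using math.erf for the CDF; it shows the principle and is not the method's source.

```python
import math

import numpy as np

bins = np.array([-1.0, 0.0, 1.0, 2.0])  # N+1 = 4 edges -> N = 3 bins
locs = np.array([[0.0], [1.0]])  # means of two unit-width Gaussians

# CDF of each Gaussian at every edge, shape (npdf, N+1)
cdf_at_edges = 0.5 * (1.0 + np.vectorize(math.erf)((bins - locs) / math.sqrt(2.0)))

# Integrated probability per bin is the CDF difference across the bin
bin_probs = cdf_at_edges[:, 1:] - cdf_at_edges[:, :-1]
histogram = (bins, bin_probs)
```

The pair mirrors the documented return value: N+1 bin endpoints together with N integrated values per PDF.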

class qp.pdf_gen.rv_frozen_rows(dist, shape, *args, **kwds)[source]¶

Trivial extension of scipy.stats.rv_frozen to use when we want to have a collection of distribution objects such as histograms or splines, where each object represents a single distribution

property ndim¶: Return the number of dimensions of PDFs in this ensemble

property shape¶: Return the shape of the set of PDFs this object represents

property npdf¶: Return the number of PDFs this object represents

histogramize(bins)[source]¶

Computes integrated histogram bin values for all PDFs

Parameters:: bins (ndarray, float, optional) – Array of N+1 endpoints of N bins
Returns:: self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins
Return type:: ndarray, tuple, ndarray, floats

class qp.pdf_gen.Pdf_rows_gen(*args, **kwargs)[source]¶

Class to extend scipy.stats.rv_continuous with information needed for qp when we want to have a collection of distribution objects such as histograms or splines, where each object represents a single distribution

property shape¶: Return the shape of the set of PDFs this object represents

property npdf¶: Return the number of PDFs this object represents

freeze(*args, **kwds)[source]¶

Freeze the distribution for the given arguments.

Parameters:

arg1, arg2, arg3, ... (array_like) – The shape parameter(s) for the distribution. Should include all the non-optional arguments, may include loc and scale.

Returns:

rv_frozen – The frozen distribution.

Return type:

rv_frozen instance

classmethod create_gen(**kwds)[source]¶: Create and return a scipy.stats.rv_continuous object using the keyword arguments provided

moment(n, *args, **kwds)[source]¶

Returns the requested moments for all the PDFs.

This used to call a hacked version, Pdf_gen._moment_fix, which can handle cases of multiple PDFs. Now it prints a deprecation warning for scipy < 1.8

Parameters:: n (int) – Order of the moment
Returns:: moments – The requested moments
Return type:: array_like

class qp.pdf_gen.Pdf_gen_wrap(*args, **kwargs)[source]¶

Mixin class to extend scipy.stats.rv_continuous with information needed for qp for analytic distributions.

classmethod get_allocation_kwds(npdf, **kwargs)[source]¶: Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout

classmethod add_mappings()[source]¶: Add this class's mappings to the conversion dictionary

qp.dict_utils: tools for multi-level dictionary manipulation¶

This module implements tools to convert between distributions

qp.dict_utils.get_val_or_default(in_dict, key)[source]¶

Helper function to return either an item in a dictionary or the default value of the dictionary

Parameters:

in_dict (dict) – input dictionary
key (str) – key to search for

Returns:

out – The requested item

Return type:

dict or function

Notes

This will first try to return:: in_dict[key] : i.e., the requested item.
If that fails it will try: in_dict[None] : i.e., the default for that dictionary.
If that fails it will return: None
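
The lookup order in the notes can be sketched in a few lines of plain Python. The function name and the example dictionary contents (including extract_fallback) are hypothetical.

```python
def get_val_or_default_sketch(in_dict, key):
    """Return in_dict[key], else the default stored under None, else None."""
    if key in in_dict:
        return in_dict[key]
    # dict.get returns None when no default entry exists
    return in_dict.get(None)


methods = {"hist": "extract_hist_values", None: "extract_fallback"}

found = get_val_or_default_sketch(methods, "hist")
fallback = get_val_or_default_sketch(methods, "quant")
missing = get_val_or_default_sketch({"hist": "extract_hist_values"}, "quant")
```

Storing the default under the key None lets a single dictionary hold both the specific entries and the fallback.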

qp.dict_utils.set_val_or_default(in_dict, key, val)[source]¶

Helper function to either get an item from or add an item to a dictionary and return that item

Parameters:

in_dict (dict) – input dictionary
key (str) – key to search for
val (dict or function) – item to add to the dictionary

Returns:

out – The requested item

Return type:

dict or function

Notes

This will first try to return:: in_dict[key] : i.e., the requested item.
If that fails it will try: in_dict[None] : i.e., the default for that dictionary.
If that fails it will return: None

qp.dict_utils.pretty_print(in_dict, prefixes, idx=0, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶

Print a level of the conversion dictionary in a human-readable format

Parameters:

in_dict (dict) – input dictionary
prefixes (list) – The prefixes to use at each level of the printing
idx (int) – The level of the input dictionary we are currently printing
stream (stream) – The stream to print to

qp.dict_utils.print_dict_shape(in_dict)[source]¶

Print the shape of arrays in a dictionary. This is useful for debugging table creation.

Parameters:: in_dict (dict) – The dictionary to print

qp.dict_utils.slice_dict(in_dict, subslice)[source]¶

Create a new dict by taking a slice of every array in a dict

Parameters:

in_dict (dict) – The dictionary to convert
subslice (int or slice) – Used to slice the arrays

Returns:

out_dict – The converted dictionary

Return type:

dict
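
As a rough illustration of the slicing described above, the sketch below applies one index or slice to every value of a dictionary. It is not the library's implementation: slice_dict_sketch is a hypothetical name, and plain lists stand in for the arrays that the real function operates on.

```python
def slice_dict_sketch(in_dict, subslice):
    """Hypothetical sketch: apply the same index or slice to every value."""
    return {key: val[subslice] for key, val in in_dict.items()}


data = {"xvals": [0.0, 0.5, 1.0, 1.5], "yvals": [0.1, 0.4, 0.4, 0.1]}

first_two = slice_dict_sketch(data, slice(0, 2))  # a slice keeps sequences
third = slice_dict_sketch(data, 2)                # an int picks single items

print(first_two)  # {'xvals': [0.0, 0.5], 'yvals': [0.1, 0.4]}
print(third)      # {'xvals': 1.0, 'yvals': 0.4}
```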

qp.dict_utils.check_keys(in_dicts)[source]¶

Check that the keys in all the in_dicts match

Raises KeyError if one does not match.
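
A key check of this kind might look like the following sketch. The name check_keys_sketch is hypothetical, and comparing every dictionary against the first one is an assumption about how the comparison is done.

```python
def check_keys_sketch(in_dicts):
    """Hypothetical sketch: raise KeyError unless all dicts share the same keys."""
    if not in_dicts:
        return
    expected = set(in_dicts[0])
    for idx, a_dict in enumerate(in_dicts[1:], start=1):
        if set(a_dict) != expected:
            # Report the keys that appear in only one of the two dicts.
            mismatch = set(a_dict) ^ expected
            raise KeyError(f"Keys of dict {idx} do not match the first dict: {mismatch}")


check_keys_sketch([{"a": 1, "b": 2}, {"b": 3, "a": 4}])  # passes silently

try:
    check_keys_sketch([{"a": 1}, {"b": 2}])
    mismatch_detected = False
except KeyError:
    mismatch_detected = True
```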

qp.dict_utils.concatenate_dicts(in_dicts)[source]¶

Create a new dict by concatenating each array in in_dicts

Parameters:: in_dicts (list) – The dictionaries to stack
Returns:: out_dict – The stacked dictionary
Return type:: dict
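
The stacking can be pictured with the sketch below, which joins the per-key sequences of several dictionaries in order. It is an illustration under stated assumptions: concatenate_dicts_sketch is a hypothetical name, lists stand in for arrays, and all inputs are assumed to share the same keys.

```python
def concatenate_dicts_sketch(in_dicts):
    """Hypothetical sketch: join the per-key sequences of several dicts."""
    if not in_dicts:
        return {}
    # Start from empty lists so the input dictionaries are left untouched.
    out_dict = {key: [] for key in in_dicts[0]}
    for a_dict in in_dicts:
        for key in out_dict:
            out_dict[key].extend(a_dict[key])
    return out_dict


chunk_a = {"loc": [0.1, 0.2], "scale": [1.0, 1.0]}
chunk_b = {"loc": [0.3], "scale": [2.0]}
stacked = concatenate_dicts_sketch([chunk_a, chunk_b])

print(stacked)  # {'loc': [0.1, 0.2, 0.3], 'scale': [1.0, 1.0, 2.0]}
```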

qp.dict_utils.check_array_shapes(in_dict, npdf)[source]¶

Check that all the arrays in in_dict match the number of pdfs

Raises ValueError if one does not match.
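
One plausible reading of this check is that the leading dimension of every array must equal npdf. The sketch below encodes that assumption with plain lists and len(); the name check_array_shapes_sketch is hypothetical.

```python
def check_array_shapes_sketch(in_dict, npdf):
    """Hypothetical sketch: every value must have npdf entries along its first axis."""
    for key, val in in_dict.items():
        if len(val) != npdf:
            raise ValueError(f"'{key}' has {len(val)} entries, expected {npdf}")


table = {"loc": [0.1, 0.2, 0.3], "scale": [1.0, 1.0, 2.0]}
check_array_shapes_sketch(table, 3)  # passes silently

try:
    check_array_shapes_sketch(table, 4)
    shape_error = False
except ValueError:
    shape_error = True
```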

qp.dict_utils.compare_two_dicts(d1, d2)[source]¶

Check that all the items in d1 and d2 match

Returns:: match – True if they all match, False otherwise
Return type:: bool

qp.dict_utils.compare_dicts(in_dicts)[source]¶

Check that all the dicts in in_dicts match

Returns:: match – True if they all match, False otherwise
Return type:: bool

qp.plotting: Tools for PDF plotting¶

Functions to plot PDFs

qp.plotting.init_matplotlib()[source]¶: Initialize matplotlib parameters

qp.plotting.make_figure_axes(xlim, **kwargs)[source]¶

Build a figure and a set of figure axes to plot data on

Parameters:

xlim ((float, float)) – The x-axis limits of the plot
**kwargs – passed directly to the matplotlib plot function

Returns:

fig, axes – The figure and axes

qp.plotting.get_axes_and_xlims(**kwargs)[source]¶: Get and return the axes and xlims from the kwargs

qp.plotting.plot_pdf_on_axes(axes, pdf, xvals, **kwargs)[source]¶

Plot a PDF on a set of axes, by evaluating it at a set of points

Parameters:

axes (matplotlib.axes or None) – The axes we want to plot the data on
pdf (scipy.stats.rv_frozen) – The distribution we want to plot
xvals (np.array) – The locations we evaluate the PDF at for plotting
**kwargs – Keywords are passed to matplotlib

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_dist_pdf(pdf, **kwargs)[source]¶

Plot a PDF on a set of axes, using the axes limits

Parameters:

pdf (scipy.stats.rv_frozen) – The distribution we want to plot
axes (matplotlib.axes) – The axes to plot on
xlim ((float, float)) – The x-axis limits
npts (int) – The number of x-axis points
**kwargs – remaining keywords are passed directly to the plot_pdf_on_axes function

Returns:

axes – The axes the data are plotted on
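
To make the roles of xlim and npts concrete, the sketch below builds an evaluation grid and evaluates a PDF on it, producing the kind of (xvals, yvals) pairs a plotting call would draw. It assumes the points are evenly spaced across xlim, which this page does not state. The helper evaluation_grid is hypothetical, and the standard-library NormalDist stands in for a scipy.stats.rv_frozen object.

```python
from statistics import NormalDist


def evaluation_grid(xlim, npts):
    """Hypothetical helper: npts evenly spaced points spanning xlim, ends included."""
    lo, hi = xlim
    if npts < 2:
        return [lo]
    step = (hi - lo) / (npts - 1)
    return [lo + i * step for i in range(npts)]


# Stand-in for a frozen distribution: a standard normal.
dist = NormalDist(mu=0.0, sigma=1.0)

xvals = evaluation_grid((-3.0, 3.0), 7)
yvals = [dist.pdf(x) for x in xvals]

for x, y in zip(xvals, yvals):
    print(f"{x:+.1f}  {y:.4f}")
```

A larger npts gives a smoother curve at the cost of more PDF evaluations.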

qp.plotting.plot_pdf_quantiles_on_axes(axes, xvals, yvals, quantiles, **kwargs)[source]¶

Plot a PDF on a set of axes, by evaluating at the quantiles provided

Parameters:

axes – The axes we want to plot the data on
xvals (array_like) – Pdf xvalues
yvals (array_like) – Pdf yvalues
quantiles ((np.array, np.array)) – The quantiles that define the distribution pdf
**kwargs – passed directly to the matplotlib plot function
npoints (int) – Number of points to use in the plotting. Evenly spaced along the axis provided.

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_pdf_histogram_on_axes(axes, hist, **kwargs)[source]¶

Plot a PDF on a set of axes, by plotting the histogrammed data

Parameters:

axes – The axes we want to plot the data on
**kwargs – passed directly to the matplotlib plot function
npoints (int) – Number of points to use in the plotting. Evenly spaced along the axis provided.

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_pdf_samples_on_axes(axes, pdf, samples, **kwargs)[source]¶

Plot a PDF on a set of axes, by displaying a set of samples from the PDF

Parameters:

axes – The axes we want to plot the data on
pdf (scipy.stats.rv_frozen) – The distribution we want to plot
samples (np.array) – Points sampled from the PDF
**kwargs – passed directly to the matplotlib plot function

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_native(pdf, **kwargs)[source]¶: Utility function to plot a pdf in a format that is specific to that type of pdf

qp.plotting.plot(pdf, **kwargs)[source]¶: Utility function to plot a pdf in a format that is specific to that type of pdf
