API Documentation for qp

qp provides a PDF class object that builds on the scipy.stats distributions to provide various approximate forms. The package also contains utilities and metrics for quantifying the quality of these approximations.

Ensemble and Factory

Implementation of an ensemble of distributions

class qp.ensemble.Ensemble(gen_func, data, ancil=None)[source]

An object comprised of many qp.PDF objects to efficiently perform operations on all of them

property gen_func

Return the function used to create the distribution object for this ensemble

property gen_class

Return the class used to generate distributions for this ensemble

property dist

Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property kwds

Return the keywords associated with the frozen object

property gen_obj

Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property frozen

Return the scipy.stats.rv_frozen object that encapsulates the distributions for this ensemble

property ndim

Return the number of dimensions of PDFs in this ensemble

property shape

Return the number of PDFs in this ensemble

property npdf

Return the number of PDFs in this ensemble

property ancil

Return the ancillary data dictionary

convert_to(to_class, **kwargs)[source]

Convert a distribution or ensemble

Parameters:
  • to_class (class) – Class to convert to

  • **kwargs – keyword arguments are passed to the output class constructor

  • method (str) – Optional argument to specify a non-default conversion algorithm

Returns:

ens – Ensemble of PDFs of type to_class using the data from this object

Return type:

qp.Ensemble

update(data, ancil=None)[source]

Update the frozen object

Parameters:

data (dict) – Dictionary with data used to construct the ensemble

update_objdata(data, ancil=None)[source]

Update the object data in the distribution

Parameters:

data (dict) – Dictionary with data used to construct the ensemble

metadata()[source]

Return the metadata for this ensemble

Returns:

metadata – The metadata

Return type:

dict

Notes

Metadata are elements that are the same for all the PDFs in the ensemble. These include the name and version of the PDF generation class.

objdata()[source]

Return the object data for this ensemble

Returns:

objdata – The object data

Return type:

dict

Notes

Object data are elements that differ for each PDF in the ensemble

set_ancil(ancil)[source]

Set the ancillary data dict

Parameters:

ancil (dict) – The ancillary data dictionary

Notes

Raises IndexError if the length of the arrays in ancil does not match the number of PDFs in the Ensemble

add_to_ancil(to_add)[source]

Add additional columns to the ancillary data dict

Parameters:

to_add (dict) – The columns to add to the ancillary data dict

Notes

Raises IndexError if the length of the arrays in to_add does not match the number of PDFs in the Ensemble

This calls dict.update() so it will overwrite existing columns
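
The length check and the overwrite behaviour described in these notes can be illustrated with a small sketch. The helper name check_and_add is hypothetical and is not part of qp; it only mirrors the documented behaviour (IndexError on a length mismatch, then dict.update()).

```python
import numpy as np

def check_and_add(ancil, to_add, npdf):
    """Hypothetical helper: validate column lengths, then merge into ancil."""
    for key, col in to_add.items():
        nrows = np.shape(col)[0]
        if nrows != npdf:
            raise IndexError(f"Column '{key}' has {nrows} rows, expected {npdf}")
    # dict.update() overwrites any column that already exists
    ancil.update(to_add)
    return ancil

ancil = {"ids": np.arange(3)}
check_and_add(ancil, {"ids": np.array([10, 11, 12]), "mag": np.zeros(3)}, npdf=3)
```

After the call, the existing "ids" column has been replaced and "mag" has been added.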

append(other_ens)[source]

Append another ensemble, other_ens, to this one

Parameters:

other_ens (qp.Ensemble) – The other Ensemble

build_tables()[source]

Return dicts of numpy arrays for the meta data and object data for this ensemble

Returns:

  • meta (dict) – Table with the meta data

  • data (dict) – Table with the object data

mode(grid)[source]

Return the mode of each ensemble PDF, evaluated on grid

Parameters:

grid (array-like) – Grid on which to evaluate PDF

Returns:

mode – The modes of the PDFs evaluated on grid

Return type:

array-like

Notes

The result is passed through expand_dims to return an (N, 1) array, consistent with mean, median, and other point estimates
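
A grid-based mode with an (N, 1) result can be sketched with plain NumPy as follows. The PDF values are made up for illustration, and this is not taken from the qp source.

```python
import numpy as np

# Two toy PDFs tabulated on a shared grid (values are illustrative only)
grid = np.linspace(0.0, 1.0, 5)          # [0, 0.25, 0.5, 0.75, 1]
pdf_vals = np.array([
    [0.1, 0.9, 0.3, 0.2, 0.1],           # peaks at grid[1]
    [0.0, 0.2, 0.4, 1.1, 0.3],           # peaks at grid[3]
])

# Mode = grid point with the highest PDF value, one per row
modes = grid[np.argmax(pdf_vals, axis=1)]

# Add a trailing axis so the result has shape (N, 1)
modes = np.expand_dims(modes, -1)
```

The resolution of the mode is limited by the grid spacing, so a finer grid gives a more precise estimate.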

gridded(grid)[source]

Build, cache, and return the PDF values at grid points

Parameters:

grid (array-like) – The grid points

Returns:

gridded

Return type:

(grid, pdf_values)

Notes

This first compares grid to the cached value; if they match, it returns the cached value
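
The compare-then-reuse pattern in this note might look like the sketch below. GridCache is a hypothetical stand-in, not the qp implementation, and the n_evals counter exists only to make the caching visible.

```python
import numpy as np

class GridCache:
    """Hypothetical stand-in for the caching behaviour described above."""

    def __init__(self, pdf_func):
        self._pdf_func = pdf_func
        self._cached = None   # (grid, pdf_values) or None
        self.n_evals = 0      # counts real evaluations, for illustration

    def gridded(self, grid):
        grid = np.asarray(grid)
        # Reuse the cached values only if the requested grid is identical
        if self._cached is not None and np.array_equal(self._cached[0], grid):
            return self._cached
        self.n_evals += 1
        self._cached = (grid, self._pdf_func(grid))
        return self._cached

cache = GridCache(lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi))
g = np.linspace(-1, 1, 5)
first = cache.gridded(g)
second = cache.gridded(g.copy())   # equal values, so the cache is reused
```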

write_to(filename)[source]

Save this ensemble to a file

Parameters:

filename (str) –

Notes

This will actually write two files, one for the metadata and one for the object data

This uses tables_io to write the data, so any file suffix that works for tables_io will work here.

pdf(x)[source]

Evaluates the probability density function for the whole ensemble

Parameters:

x (float or ndarray, float) – location(s) at which to do the evaluations

logpdf(x)[source]

Evaluates the log of the probability density function for the whole ensemble

Parameters:

x (float or ndarray, float) – location(s) at which to do the evaluations

cdf(x)[source]

Evaluates the cumulative distribution function for the whole ensemble

Parameters:

x (float or ndarray, float) – location(s) at which to do the evaluations

logcdf(x)[source]

Evaluates the log of the cumulative distribution function for the whole ensemble

Parameters:

x (float or ndarray, float) – location(s) at which to do the evaluations

ppf(q)[source]

Evaluates the PPF of all the distributions in the ensemble

Parameters:

q (float or ndarray, float) – location(s) at which to do the evaluations

sf(q)[source]

Evaluates the survival function of the distribution

Parameters:

q (float or ndarray, float) – location(s) at which to evaluate the pdfs

logsf(q)[source]

Evaluates the log of the survival function of the distribution

Parameters:

q (float or ndarray, float) – location(s) at which to evaluate the pdfs

Returns:

Log of the survival function

Return type:

float or ndarray

isf(q)[source]

Evaluates the inverse of the survival function of the distribution

Parameters:

q (float or ndarray, float) – location(s) at which to evaluate the pdfs

rvs(size=None, random_state=None)[source]

Generate samples from this ensemble

Parameters:

size (int) – number of samples to return

stats(moments='mv')[source]

Return the stats for this ensemble

Parameters:

moments (str) – Which moments to include

median()[source]

Return the medians for this ensemble

mean()[source]

Return the means for this ensemble

var()[source]

Return the variances for this ensemble

std()[source]

Return the standard deviations for this ensemble

moment(n)[source]

Return the nth moments for this ensemble

entropy()[source]

Return the entropy for this ensemble

interval(alpha)[source]

Return the intervals corresponding to a confidence level of alpha for this ensemble

histogramize(bins)[source]

Computes integrated histogram bin values for all PDFs

Parameters:

bins (ndarray, float, optional) – Array of N+1 endpoints of N bins

Returns:

self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins

Return type:

ndarray, tuple, ndarray, floats

integrate(limits)[source]

Computes the integral under the ensemble of PDFs between the given limits.

Parameters:
  • limits (numpy.ndarray, tuple, float) – limits of integration, may be different for all PDFs in the ensemble

  • using (string) – parametrization over which to approximate the integral

  • dx (float, optional) – granularity of integral

Returns:

integral – value of the integral

Return type:

numpy.ndarray, float

mix_mod_fit(comps=5)[source]

Fits the parameters of a given functional form to an approximation

Parameters:
  • comps (int, optional) – number of components to consider

  • using (string, optional) – which existing approximation to use, defaults to first approximation

  • vb (boolean) – Report progress

Returns:

self.mix_mod – list of qp.Composite objects approximating the PDFs

Return type:

list, qp.Composite objects

Notes

Currently only supports mixture of Gaussians

moment_partial(n, limits, dx=0.01)[source]

Return the nth moments for this ensemble over a particular range

plot(key=0, **kwargs)[source]

Plot the pdf as a curve

Parameters:

key (int or slice) – Which PDF or PDFs from this ensemble to plot

plot_native(key=0, **kwargs)[source]

Plot the pdf as a curve

Parameters:

key (int or slice) – Which PDF or PDFs from this ensemble to plot

initializeHdf5Write(filename, npdf, comm=None)[source]

Set up the output write for an ensemble, but set the size to npdf rather than the size of the ensemble, as the “initial chunk” will not contain the full data

Parameters:
  • filename (str) – Name of the file to create

  • npdf (int) – Total number of PDFs that the file will contain, usually larger than the size of the current ensemble

  • comm (MPI communicator) – Optional MPI communicator to allow parallel writing

writeHdf5Chunk(fname, start, end)[source]

Write an ensemble data chunk to file

Parameters:
  • fname (h5py File object) – file or group

  • start (int) – starting index in h5py file

  • end (int) – ending index in h5py file

finalizeHdf5Write(filename)[source]

Write ensemble metadata to the output file

Parameters:

filename (h5py File object) – file or group

This module implements a factory that manages different types of PDFs

class qp.factory.Factory[source]

Factory that creates and manages PDFs

add_class(the_class)[source]

Add a class to the factory

Parameters:

the_class (class) – The class we are adding, must inherit from Pdf_Gen

create(class_name, data, method=None)[source]

Make an ensemble of a particular type of distribution

Parameters:
  • class_name (str) – The name of the class to make

  • data (dict) – Values passed to class create function

  • method (str [None]) – Used to select which creation method to invoke

Returns:

ens – The newly created ensemble

Return type:

qp.Ensemble

from_tables(tables)[source]

Build an ensemble from a set of tables

Parameters:

tables (dict) –

Notes

This will use information in the meta data table to figure out how to construct the data needed to build the ensemble.

read_metadata(filename)[source]

Read an ensemble’s metadata from a file, without loading the full data.

Parameters:

filename (str) –

is_qp_file(filename)[source]

Test if a file is a qp file

Parameters:

filename (str) – File to test

Returns:

value – True if the file is a qp file

Return type:

bool

read(filename)[source]

Read an ensemble from a file

Parameters:

filename (str) –

Notes

This will use information in the meta data to figure out how to construct the data needed to build the ensemble.

data_length(filename)[source]

Get the size of data

Parameters:

filename (str) –

Returns:

nrows

Return type:

int

iterator(filename, chunk_size=100000, rank=0, parallel_size=1)[source]

Return an iterator for chunked read

Parameters:
  • filename (str) –

  • chunk_size (int) –

convert(in_dist, class_name, **kwds)[source]

Convert an ensemble to a different representation

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • class_name (str) – Representation to convert to

Returns:

ens – The ensemble we converted to

Return type:

qp.Ensemble

pretty_print(stream=sys.stdout)[source]

Print a level of the conversion dictionary in a human-readable format

Parameters:

stream (stream) – The stream to print to

static concatenate(ensembles)[source]

Concatenate a list of ensembles

Parameters:

ensembles (list) – The ensembles we are concatenating

Returns:

ens – The output

Return type:

qp.Ensemble

static write_dict(filename, ensemble_dict, **kwargs)[source]
static read_dict(filename)[source]

Assume that filename is an HDF5 file, containing multiple qp.Ensembles that have been stored as NumPy arrays.

qp.factory.instance()[source]

Return the factory instance

qp.factory.add_class(the_class)

Add a class to the factory

Parameters:

the_class (class) – The class we are adding, must inherit from Pdf_Gen

qp.factory.create(class_name, data, method=None)

Make an ensemble of a particular type of distribution

Parameters:
  • class_name (str) – The name of the class to make

  • data (dict) – Values passed to class create function

  • method (str [None]) – Used to select which creation method to invoke

Returns:

ens – The newly created ensemble

Return type:

qp.Ensemble

qp.factory.read(filename)

Read an ensemble from a file

Parameters:

filename (str) –

Notes

This will use information in the meta data to figure out how to construct the data needed to build the ensemble.

qp.factory.read_metadata(filename)

Read an ensemble’s metadata from a file, without loading the full data.

Parameters:

filename (str) –

qp.factory.iterator(filename, chunk_size=100000, rank=0, parallel_size=1)

Return an iterator for chunked read

Parameters:
  • filename (str) –

  • chunk_size (int) –

qp.factory.convert(in_dist, class_name, **kwds)

Convert an ensemble to a different representation

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • class_name (str) – Representation to convert to

Returns:

ens – The ensemble we converted to

Return type:

qp.Ensemble

qp.factory.concatenate(ensembles)

Concatenate a list of ensembles

Parameters:

ensembles (list) – The ensembles we are concatenating

Returns:

ens – The output

Return type:

qp.Ensemble

qp.factory.data_length(filename)

Get the size of data

Parameters:

filename (str) –

Returns:

nrows

Return type:

int

qp.factory.from_tables(tables)

Build an ensemble from a set of tables

Parameters:

tables (dict) –

Notes

This will use information in the meta data table to figure out how to construct the data needed to build the ensemble.

qp.factory.is_qp_file(filename)

Test if a file is a qp file

Parameters:

filename (str) – File to test

Returns:

value – True if the file is a qp file

Return type:

bool

qp.factory.write_dict(filename, ensemble_dict, **kwargs)
qp.factory.read_dict(filename)

Assume that filename is an HDF5 file, containing multiple qp.Ensembles that have been stored as NumPy arrays.

Distribution types

Histogram based

class qp.hist_gen(bins, pdfs, *args, **kwargs)[source]

Bases: Pdf_rows_gen

Histogram based distribution

Notes

This implements a PDF using a set of histogrammed values.

The relevant data members are:

bins: n+1 bin edges (shared for all PDFs)

pdfs: (npdf, n) bin values

Inside a given bin the pdf() will return the pdf value. Outside the range bins[0], bins[-1] the pdf() will return 0.

Inside a given bin the cdf() will use a linear interpolation across the bin. Outside the range bins[0], bins[-1] the cdf() will return (0 or 1), respectively.

The ppf() is computed by inverting the cdf(). ppf(0) will return bins[0]; ppf(1) will return bins[-1].
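
The pdf(), cdf(), and ppf() behaviour described in these notes can be sketched with NumPy alone. The function names hist_pdf, hist_cdf, and hist_ppf are hypothetical and the code is a minimal illustration under those assumptions, not the hist_gen implementation.

```python
import numpy as np

bins = np.array([0.0, 1.0, 2.0, 4.0])   # n+1 edges
pdfs = np.array([[0.2, 0.4, 0.2]])      # (npdf, n) values; integrates to 1

def hist_pdf(x):
    # Index of the bin containing each x; out-of-range x gets value 0
    idx = np.searchsorted(bins, x, side="right") - 1
    inside = (x >= bins[0]) & (x < bins[-1])
    idx = np.clip(idx, 0, pdfs.shape[1] - 1)
    return np.where(inside, pdfs[:, idx], 0.0)

# CDF at the bin edges: cumulative sum of value * width, starting from 0
cdf_edges = np.concatenate(
    [np.zeros((pdfs.shape[0], 1)), np.cumsum(pdfs * np.diff(bins), axis=1)], axis=1
)

def hist_cdf(x):
    # Linear interpolation across each bin; np.interp clamps to 0 and 1 outside
    return np.array([np.interp(x, bins, row) for row in cdf_edges])

def hist_ppf(q):
    # Invert the piecewise-linear CDF by swapping the roles of x and y
    # (this assumes the CDF values at the edges are increasing)
    return np.array([np.interp(q, row, bins) for row in cdf_edges])
```

Because the CDF is piecewise linear, swapping the arguments of np.interp is enough to invert it in this sketch.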

name = 'hist'
version = 0
property bins

Return the histogram bin edges

property pdfs

Return the histogram bin values

custom_generic_moment(m)[source]

Compute the mth moment

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout

classmethod plot_native(pdf, **kwargs)[source]

Plot the PDF in a way that is particular to this type of distribution

For a histogram this shows the bin edges

classmethod add_mappings()[source]

Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]

Make data for unit tests

Interpolation of a fixed grid

class qp.interp_gen(xvals, yvals, *args, **kwargs)[source]

Bases: Pdf_rows_gen

Interpolator based distribution

Notes

This implements a PDF using a set of interpolated values.

This version uses the same xvals for all the PDFs, which allows for much faster evaluation, and reduces the memory usage by a factor of 2.

The relevant data members are:

xvals: (n) x values

yvals: (npdf, n) y values

Inside the range xvals[0], xvals[-1] it simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the PDF. Outside the range xvals[0], xvals[-1] the pdf() will return 0.

The cdf() is constructed by analytically computing the cumulative sum at the xvals grid points and interpolating between them. This will give a slight discrepancy with the true integral of the pdf(), but is much, much faster to evaluate. Outside the range xvals[0], xvals[-1] the cdf() will return (0 or 1), respectively.

The ppf() is computed by inverting the cdf(). ppf(0) will return xvals[0]; ppf(1) will return xvals[-1].
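
The cumulative-sum construction of the cdf() can be illustrated with the sketch below. It uses np.interp in place of scipy.interpolate.interp1d and a trapezoid rule for the cumulative sum; the quadrature rule and the function names are assumptions made for the example, not details taken from qp.

```python
import numpy as np

xvals = np.linspace(0.0, 2.0, 5)                 # shared grid, shape (n,)
yvals = np.array([[0.0, 0.5, 1.0, 0.5, 0.0]])    # (npdf, n); triangle with unit area

# Cumulative trapezoid sums at the grid points, starting from 0
# (assumed quadrature rule, chosen for illustration)
steps = 0.5 * (yvals[:, 1:] + yvals[:, :-1]) * np.diff(xvals)
cdf_grid = np.concatenate(
    [np.zeros((yvals.shape[0], 1)), np.cumsum(steps, axis=1)], axis=1
)

def interp_pdf(x):
    # Linear interpolation inside the grid, 0 outside
    return np.array([np.interp(x, xvals, row, left=0.0, right=0.0) for row in yvals])

def interp_cdf(x):
    # Interpolate the tabulated CDF; np.interp clamps to 0 below and 1 above
    return np.array([np.interp(x, xvals, row) for row in cdf_grid])
```

For this triangular PDF the exact CDF at x = 0.25 is 0.03125, while the interpolated value is 0.0625. The two agree at the grid points and differ between them, which is the kind of discrepancy the note refers to.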

name = 'interp'
version = 0
property xvals

Return the x-values used to do the interpolation

property yvals

Return the y-values used to do the interpolation

custom_generic_moment(m)[source]

Compute the mth moment

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:
  • npdf (int) – number of total PDFs that will be written out

  • kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod plot_native(pdf, **kwargs)[source]

Plot the PDF in a way that is particular to this type of distribution

For an interpolated PDF this uses the interpolation points

classmethod add_mappings()[source]

Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]

Make data for unit tests

Interpolation of a non-fixed grid

class qp.interp_irregular_gen(xvals, yvals, *args, **kwargs)[source]

Bases: Pdf_rows_gen

Interpolator based distribution

Notes

This implements a PDF using a set of interpolated values.

This version uses different xvals for each of the PDFs, which allows for more precision.

The relevant data members are:

xvals: (npdf, n) x values

yvals: (npdf, n) y values

Inside the range xvals[:,0], xvals[:,-1] it simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the PDF. Outside the range xvals[:,0], xvals[:,-1] the pdf() will return 0.

The cdf() is constructed by analytically computing the cumulative sum at the xvals grid points and interpolating between them. This will give a slight discrepancy with the true integral of the pdf(), but is much, much faster to evaluate. Outside the range xvals[:,0], xvals[:,-1] the cdf() will return (0 or 1), respectively.

The ppf() is computed by inverting the cdf(). ppf(0) will return min(xvals); ppf(1) will return max(xvals).

name = 'interp_irregular'
version = 0
property xvals

Return the x-values used to do the interpolation

property yvals

Return the y-values used to do the interpolation

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:
  • npdf (int) – number of total PDFs that will be written out

  • kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod plot_native(pdf, **kwargs)[source]

Plot the PDF in a way that is particular to this type of distribution

For an interpolated PDF this uses the interpolation points

classmethod add_mappings()[source]

Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]

Make data for unit tests

Spline based

class qp.spline_gen(*args, **kwargs)[source]

Bases: Pdf_rows_gen

Spline based distribution

Notes

This implements PDFs using a set of splines

The relevant data members are:

splx: (npdf, n) spline-knot x-values

sply: (npdf, n) spline-knot y-values

spln: (npdf) spline-knot order parameters

The pdf() for the ith pdf will return the result of scipy.interpolate.splev(x, splx[i], sply[i], spln[i])

The cdf() for the ith pdf will return the result of scipy.interpolate.splint(x, splx[i], sply[i], spln[i])

The ppf() will use the default scipy implementation, which inverts the cdf() as evaluated on an adaptive grid.

name = 'spline'
version = 0
static build_normed_splines(xvals, yvals, **kwargs)[source]

Build a set of normalized splines using the x and y values

Parameters:
  • xvals (array_like) – The x-values used to do the interpolation

  • yvals (array_like) – The y-values used to do the interpolation

Returns:

  • splx (array_like) – The x-values of the spline knots

  • sply (array_like) – The y-values of the spline knots

  • spln (array_like) – The order of the spline knots

classmethod create_from_xy_vals(xvals, yvals, **kwargs)[source]

Create a new distribution using the given x and y values

Parameters:
  • xvals (array_like) – The x-values used to do the interpolation

  • yvals (array_like) – The y-values used to do the interpolation

Returns:

pdf_obj – The requested PDF

Return type:

spline_gen

classmethod create_from_samples(xvals, samples, **kwargs)[source]

Create a new distribution using the given x values and samples

Parameters:
  • xvals (array_like) – The x-values used to do the interpolation

  • samples (array_like) – The sample values used to build the KDE

Returns:

pdf_obj – The requested PDF

Return type:

spline_gen

property splx

Return x-values of the spline knots

property sply

Return y-values of the spline knots

property spln

Return order of the spline knots

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:
  • npdf (int) – number of total PDFs that will be written out

  • kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod plot_native(pdf, **kwargs)[source]

Plot the PDF in a way that is particular to this type of distribution

For a spline this shows the spline knots

classmethod add_mappings()[source]

Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]

Make data for unit tests

Quantile based

class qp.quant_gen(quants, locs, *args, **kwargs)[source]

Bases: Pdf_rows_gen

Quantile based distribution, where the PDF is defined piecewise from the quantiles

Notes

This implements a CDF by interpolating a set of quantile values

It simply takes a set of x and y values and uses scipy.interpolate.interp1d to build the CDF
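
A CDF built by interpolating quantiles can be sketched as below. The example uses np.interp as a stand-in for scipy.interpolate.interp1d, and the names quant_cdf and quant_ppf are hypothetical; it illustrates the idea rather than the quant_gen code.

```python
import numpy as np

quants = np.array([0.0, 0.25, 0.5, 0.75, 1.0])      # CDF values
locs = np.array([[0.0, 1.0, 1.5, 2.0, 4.0]])        # (npdf, nquant) locations

def quant_cdf(x):
    # CDF: interpolate quantile values as a function of location
    return np.array([np.interp(x, row, quants) for row in locs])

def quant_ppf(q):
    # PPF: the same table read in the other direction
    return np.array([np.interp(q, quants, row) for row in locs])
```

Storing (quants, locs) pairs means the CDF and its inverse come from the same table, so no separate inversion step is needed in this sketch.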

name = 'quant'
version = 0
property quants

Return quantiles used to build the CDF

property locs

Return the locations at which those quantiles are reached

property pdf_constructor_name

Returns the name of the current pdf constructor. Matches a key in the PDF_CONSTRUCTORS dictionary.

property pdf_constructor: AbstractQuantilePdfConstructor

Returns the current PDF constructor, and allows the user to interact with its methods.

Returns:

Abstract base class of the active concrete PDF constructor.

Return type:

AbstractQuantilePdfConstructor

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

classmethod plot_native(pdf, **kwargs)[source]

Plot the PDF in a way that is particular to this type of distribution

For a quantile-based PDF this shows the quantile points

classmethod add_mappings()[source]

Add this class's mappings to the conversion dictionary

Gaussian mixture model based

class qp.mixmod_gen(means, stds, weights, *args, **kwargs)[source]

Bases: Pdf_rows_gen

Mixture model based distribution

Notes

This implements a PDF using a Gaussian Mixture model

The relevant data members are:

means: (npdf, ncomp) means of the Gaussians

stds: (npdf, ncomp) standard deviations of the Gaussians

weights: (npdf, ncomp) weights for the Gaussians

The pdf() and cdf() are exact, and are computed as a weighted sum of the pdf() and cdf() of the component Gaussians.

The ppf() is computed by computing the cdf() values on a fixed grid and interpolating the inverse function.
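
The weighted-sum pdf() and cdf(), and the grid-based ppf(), can be sketched as follows. The function names, the grid range, and the grid spacing are assumptions chosen for the example and do not come from qp.

```python
import math
import numpy as np

means = np.array([[0.0, 2.0]])      # (npdf, ncomp)
stds = np.array([[1.0, 0.5]])
weights = np.array([[0.6, 0.4]])    # each row sums to 1

_erf = np.vectorize(math.erf)

def mix_pdf(x):
    # x has shape (n,); add axes so components broadcast to (npdf, ncomp, n)
    z = (x[None, None, :] - means[..., None]) / stds[..., None]
    comp = np.exp(-0.5 * z**2) / (stds[..., None] * np.sqrt(2.0 * np.pi))
    return np.sum(weights[..., None] * comp, axis=1)

def mix_cdf(x):
    # Weighted sum of the Gaussian component CDFs
    z = (x[None, None, :] - means[..., None]) / stds[..., None]
    comp = 0.5 * (1.0 + _erf(z / np.sqrt(2.0)))
    return np.sum(weights[..., None] * comp, axis=1)

def mix_ppf(q, grid=None):
    # Tabulate the CDF on a fixed grid, then interpolate the inverse
    if grid is None:
        grid = np.linspace(-6.0, 6.0, 1201)   # assumed range and spacing
    cdf_grid = mix_cdf(grid)
    return np.array([np.interp(q, row, grid) for row in cdf_grid])
```

The accuracy of mix_ppf depends on the grid covering the bulk of the probability mass, so the range would need to be adapted to the means and standard deviations in use.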

name = 'mixmod'
version = 0
property weights

Return weights to attach to the Gaussians

property means

Return means of the Gaussians

property stds

Return standard deviations of the Gaussians

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.

Parameters:
  • npdf (int) – number of total PDFs that will be written out

  • kwargs (dict) – dictionary of kwargs needed to create the ensemble

classmethod add_mappings()[source]

Add this class's mappings to the conversion dictionary

classmethod make_test_data()[source]

Make data for unit tests

scipy distributions

Module to define qp distributions that inherit from scipy distributions

Notes

In the qp distributions the last axis in the input array shapes is reserved for pdf parameters.

This is because qp deals with numerical representations of distributions, where some of the input parameters consist of arrays of values for each pdf.

scipy.stats assumes that all input parameters are scalars for each pdf.

To ensure that scipy.stats based distributions behave the same as qp distributions, we ensure that all input variables have shape either (npdf, 1) or (1)
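
The effect of the (npdf, 1) and (1) shapes can be seen with ordinary NumPy broadcasting. The sketch below evaluates a hand-written Gaussian density rather than calling scipy.stats, purely to show the shapes involved; the variable names are illustrative.

```python
import numpy as np

npdf = 3
loc = np.array([0.0, 1.0, 2.0]).reshape(npdf, 1)   # one value per PDF, shape (npdf, 1)
scale = np.array([1.0])                            # shared by all PDFs, shape (1,)
x = np.linspace(-1.0, 3.0, 5)                      # evaluation points, shape (5,)

# Broadcasting (npdf, 1) against (5,) gives one row of values per PDF
z = (x - loc) / scale
pdf_vals = np.exp(-0.5 * z**2) / (scale * np.sqrt(2.0 * np.pi))
```

Here pdf_vals has shape (3, 5): the first axis indexes the PDFs and the last axis indexes the evaluation points.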

Quantification Metrics

This module implements some performance metrics for distribution parameterization

class qp.metrics.metrics.Grid(grid_values, cardinality, resolution, hist_bin_edges, limits)
cardinality

Alias for field number 1

grid_values

Alias for field number 0

hist_bin_edges

Alias for field number 3

limits

Alias for field number 4

resolution

Alias for field number 2

qp.metrics.metrics.calculate_moment(p, N, limits, dx=0.01)[source]

Calculates a moment of a qp.Ensemble object

Parameters:
  • p (qp.Ensemble object) – the collection of PDFs whose moment will be calculated

  • N (int) – order of the moment to be calculated

  • limits (tuple of floats) – endpoints of integration interval over which to calculate moments

  • dx (float) – resolution of integration grid

Returns:

M – value of the moment

Return type:

float
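
A moment over a finite interval at resolution dx can be approximated with a simple Riemann sum. The helper moment_on_grid and the toy uniform PDF are hypothetical and only illustrate how the order, limits, and dx enter the calculation.

```python
import numpy as np

def moment_on_grid(pdf_func, n, limits, dx=0.01):
    """Hypothetical helper: Riemann-sum estimate of the n-th moment."""
    grid = np.arange(limits[0], limits[1] + dx / 2, dx)
    p_eval = pdf_func(grid)
    return np.sum(grid**n * p_eval) * dx

def uniform_pdf(x):
    # Uniform density on [0, 1]
    return np.where((x >= 0.0) & (x <= 1.0), 1.0, 0.0)

m1 = moment_on_grid(uniform_pdf, 1, (0.0, 1.0), dx=0.001)
```

The first moment of a uniform distribution on [0, 1] is 0.5, and the estimate approaches that value as dx shrinks.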

qp.metrics.metrics.calculate_kld(p, q, limits, dx=0.01)[source]

Calculates the Kullback-Leibler Divergence between two qp.Ensemble objects.

Parameters:
  • p (Ensemble object) – probability distribution closer to the truth

  • q (Ensemble object) – probability distribution that approximates p

  • limits (tuple of floats) – endpoints of integration interval in which to calculate KLD

  • dx (float) – resolution of integration grid

Returns:

Dpq – the value of the Kullback-Leibler Divergence from q to p

Return type:

float

Notes

TO DO: have this take number of points not dx!
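
On a regular grid, the divergence of an approximation q from a reference p can be estimated as the sum of p log(p/q) times dx. The sketch below is a minimal version of that calculation; the helper names and the small floor used to avoid log(0) are assumptions for the example.

```python
import numpy as np

def kld_on_grid(p_eval, q_eval, dx=0.01):
    """Hypothetical helper: D(p || q) from PDF values on a regular grid."""
    # Guard against log(0) with a tiny floor (an assumption of this sketch)
    eps = np.finfo(float).tiny
    p = np.clip(p_eval, eps, None)
    q = np.clip(q_eval, eps, None)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1) * dx

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

dx = 0.01
grid = np.arange(-10.0, 10.0 + dx / 2, dx)
d_pq = kld_on_grid(gauss(grid, 0.0, 1.0), gauss(grid, 1.0, 1.0), dx)
```

For two unit-variance Gaussians whose means differ by 1, the analytic divergence is 0.5, which gives a convenient check on the grid estimate.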

qp.metrics.metrics.calculate_rmse(p, q, limits, dx=0.01)[source]

Calculates the Root Mean Square Error between two qp.Ensemble objects.

Parameters:
  • p (qp.Ensemble object) – probability distribution function taken as the truth, against which the approximation q is compared.

  • q (qp.Ensemble object) – probability distribution function that approximates p and whose distance from p will be calculated.

  • limits (tuple of floats) – endpoints of integration interval in which to calculate RMS

  • dx (float) – resolution of integration grid

Returns:

rms – the value of the RMS error between q and p

Return type:

float

Notes

TO DO: change dx to N
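
Once both PDFs have been evaluated on the same grid, the root mean square error reduces to a few array operations. The helper rmse_on_grid is hypothetical and the input values are made up for illustration.

```python
import numpy as np

def rmse_on_grid(p_eval, q_eval):
    """Hypothetical helper: RMSE between two PDFs evaluated on the same grid."""
    npoints = np.shape(p_eval)[-1]
    return np.sqrt(np.sum((p_eval - q_eval) ** 2, axis=-1) / npoints)

p_eval = np.array([0.0, 1.0, 2.0, 1.0])
q_eval = np.array([0.0, 1.0, 1.0, 1.0])
rms = rmse_on_grid(p_eval, q_eval)   # sqrt(1 / 4) = 0.5
```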

qp.metrics.metrics.calculate_rbpe(p, limits=(inf, inf))[source]

Calculates the risk based point estimates of a qp.Ensemble object. Algorithm as defined in 4.2 of ‘Photometric redshifts for Hyper Suprime-Cam Subaru Strategic Program Data Release 1’ (Tanaka et al. 2018).

Parameters:
  • p (qp.Ensemble object) – Ensemble of PDFs to be evaluated

  • limits (tuple) – The limits at which to evaluate possible z_best estimates. If custom limits are not provided then all potential z values will be considered using the scipy.optimize.minimize_scalar function.

Returns:

rbpes – The risk based point estimates of the provided ensemble.

Return type:

array of floats

qp.metrics.metrics.calculate_brier(p, truth, limits, dx=0.01)[source]

This function will do the following:

  1. Generate a Mx1 sized grid based on limits and dx.

  2. Produce an NxM array by evaluating the pdf for each of the N distribution objects in the Ensemble p on the grid.

  3. Produce an NxM truth_array using the input truth and the generated grid. All values will be 0 or 1.

  4. Create a Brier metric evaluation object

  5. Return the result of the Brier metric calculation.

Parameters:
  • p (qp.Ensemble object) – Ensemble of N distributions whose probability distribution functions will be gridded and compared against truth.

  • truth (Nx1 sequence) – the list of true values, 1 per distribution in p.

  • limits (2-tuple of floats) – endpoints of the grid on which to evaluate the PDFs for the distributions in p

  • dx (float) – resolution of the grid. Defaults to 0.01.

Returns:

Brier_metric

Return type:

float
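
The five steps listed for calculate_brier can be sketched with NumPy as below. This is an illustration under stated assumptions, not the qp implementation: the helper brier_sketch is hypothetical, PDFs are evaluated at bin centres, and PDF values are converted to per-bin probabilities before comparison with the 0/1 truth array.

```python
import numpy as np

def brier_sketch(pdf_func, truth, limits, dx=0.01):
    """Hypothetical helper that follows the five steps listed above."""
    # 1. Grid: M bins between the limits, evaluated at the bin centres
    nbins = int(round((limits[1] - limits[0]) / dx))
    edges = np.linspace(limits[0], limits[1], nbins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])

    # 2. N x M array of per-bin probabilities (PDF value times bin width)
    probs = np.asarray(pdf_func(centres)) * np.diff(edges)

    # 3. N x M truth array: 1 in the bin holding each true value, 0 elsewhere
    #    (values outside the limits are assigned to the nearest edge bin)
    truth = np.asarray(truth, dtype=float)
    idx = np.clip(np.searchsorted(edges, truth, side="right") - 1, 0, nbins - 1)
    truth_array = np.zeros_like(probs)
    truth_array[np.arange(truth.size), idx] = 1.0

    # 4-5. Brier score: squared differences summed over bins, averaged over PDFs
    return float(np.mean(np.sum((probs - truth_array) ** 2, axis=1)))

def uniform_pdfs(x):
    # Two identical uniform PDFs on [0, 1], shape (2, M)
    return np.ones((2, x.size))

score = brier_sketch(uniform_pdfs, truth=[0.1, 0.8], limits=(0.0, 1.0), dx=0.25)
```

With four equal bins, each uniform PDF assigns probability 0.25 per bin, so each row contributes 0.75² + 3 × 0.25² = 0.75 to the score.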

qp.metrics.metrics.calculate_brier_for_accumulation(p, truth, limits, dx=0.01)[source]
qp.metrics.metrics.calculate_anderson_darling(p, scipy_distribution='norm', num_samples=100, _random_state=None)[source]

This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.

Return type:

logger.warning

qp.metrics.metrics.calculate_cramer_von_mises(p, q, num_samples=100, _random_state=None, **kwargs)[source]

This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.

Return type:

logger.warning

qp.metrics.metrics.calculate_kolmogorov_smirnov(p, q, num_samples=100, _random_state=None)[source]

This function is deprecated and will be completely removed in a later version. Please use calculate_goodness_of_fit instead.

Return type:

logger.warning

qp.metrics.metrics.calculate_outlier_rate(p, lower_limit=0.0001, upper_limit=0.9999)[source]

Fraction of outliers in each distribution

Parameters:
  • p (qp.Ensemble) – A collection of N distributions. This implementation expects that Ensembles are not nested.

  • lower_limit (float, optional) – Lower bound CDF for outliers, by default 0.0001

  • upper_limit (float, optional) – Upper bound CDF for outliers, by default 0.9999

Returns:

1xN array where each element is the percent of outliers for a distribution in the Ensemble.

Return type:

[float]

qp.metrics.metrics.calculate_goodness_of_fit(estimate, reference, fit_metric='ks', num_samples=100, _random_state=None)[source]

This method calculates goodness of fit between the distributions in the estimate and reference Ensembles using the specified fit_metric.

Parameters:
  • estimate (Ensemble containing N distributions) – Random variate samples will be drawn from this Ensemble

  • reference (Ensemble containing N or 1 distributions) – The CDF of the distributions in this Ensemble are used in the goodness of fit calculation.

  • fit_metric (string, optional) – The goodness of fit metric to use. One of [‘ad’, ‘cvm’, ‘ks’]. For clarity, ‘ad’ = Anderson-Darling, ‘cvm’ = Cramer-von Mises, and ‘ks’ = Kolmogorov-Smirnov, by default ‘ks’

  • num_samples (int, optional) – Number of random variates to draw from each distribution in estimate, by default 100

  • _random_state (_type_, optional) – Used for testing to create reproducible sets of random variates, by default None

Returns:

output – An array of floats where each element is the result of the statistic calculation.

Return type:

[float]

Raises:

KeyError – If the requested fit_metric is not contained in goodness_of_fit_metrics dictionary, raise a KeyError.

Notes

The calculation of the goodness of fit metrics is not symmetric. i.e. calculate_goodness_of_fit(p, q, …) != calculate_goodness_of_fit(q, p, …)

In the future, we should be able to do this directly from the PDFs without needing to take random variates from the estimate Ensemble.

The vectorized implementations of fit metrics are copied over (unmodified) from the developer branch of Scipy 1.10.0dev. When Scipy 1.10 is released, we can replace the copied implementation with the ones in Scipy.
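
The asymmetry noted above follows from the roles of the two inputs: samples come from one distribution and the CDF from the other. A one-sample Kolmogorov-Smirnov statistic written with NumPy shows the idea; ks_statistic and norm_cdf are hypothetical helpers, and this sketch does not reproduce the vectorized SciPy code that qp uses.

```python
import math
import numpy as np

def ks_statistic(samples, ref_cdf):
    """One-sample Kolmogorov-Smirnov statistic: max |ECDF - reference CDF|."""
    x = np.sort(samples)
    n = x.size
    cdf = ref_cdf(x)
    # Largest gap above and below the empirical CDF steps
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

# Standard normal CDF, vectorized over arrays
norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=100)     # variates from the "estimate"
d_same = ks_statistic(samples, norm_cdf)                      # reference N(0, 1)
d_shift = ks_statistic(samples, lambda x: norm_cdf(x - 2.0))  # reference N(2, 1)
```

The statistic is small when the reference matches the distribution the samples were drawn from and grows when the reference is shifted away from it.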

This module implements metric calculations that are independent of qp.Ensembles

qp.metrics.array_metrics.quick_anderson_ksamp(p_random_variables, q_random_variables, **kwargs)[source]

Calculate the k-sample Anderson-Darling statistic using scipy.stats.anderson_ksamp for two CDFs. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

Parameters:
  • p_random_variables (np.array) – An array of random variables from the given distribution

  • q_random_variables (np.array) – An array of random variables from the given distribution

Returns:

An array of objects with attributes statistic, critical_values, and significance_level.

Return type:

[Result objects]

qp.metrics.array_metrics.quick_kld(p_eval, q_eval, dx=0.01)[source]

Calculates the Kullback-Leibler Divergence between two evaluations of PDFs.

Parameters:
  • p_eval (numpy.ndarray, float) – evaluations of probability distribution closer to the truth

  • q_eval (numpy.ndarray, float) – evaluations of probability distribution that approximates p

  • dx (float) – resolution of integration grid

Returns:

Dpq – the value of the Kullback-Leibler Divergence from q to p

Return type:

float
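As an illustration of the quantity described above, here is a minimal sketch of a Kullback-Leibler divergence evaluated as a Riemann sum on a regular grid. It is not taken from the qp source; the function name kld_on_grid and the clipping of small values before the logarithm are assumptions.

```python
import numpy as np


def kld_on_grid(p_eval, q_eval, dx=0.01):
    """Riemann-sum estimate of the KL divergence from q to p."""
    eps = np.finfo(float).eps
    # Clip before taking logs so zeros do not produce -inf
    log_p = np.log(np.clip(p_eval, eps, None))
    log_q = np.log(np.clip(q_eval, eps, None))
    return dx * np.sum(p_eval * (log_p - log_q), axis=-1)


dx = 0.01
grid = np.arange(-10.0, 10.0, dx)
p = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)            # N(0, 1)
q = np.exp(-0.5 * (grid - 1.0) ** 2) / np.sqrt(2.0 * np.pi)  # N(1, 1)

d_pq = kld_on_grid(p, q, dx)  # the analytic value for these two normals is 0.5
print(d_pq)
```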

qp.metrics.array_metrics.quick_moment(p_eval, grid_to_N, dx)[source]

Calculates a moment of an evaluated PDF

Parameters:
  • p_eval (numpy.ndarray, float) – the values of a probability distribution

  • grid (numpy.ndarray, float) – the grid upon which p_eval was evaluated

  • dx (float) – the difference between regular grid points

  • N (int) – order of the moment to be calculated

Returns:

M – value of the moment

Return type:

float
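The signature takes grid_to_N while the parameter list describes grid and N separately. The sketch below assumes that grid_to_N is the grid raised to the Nth power, which the name suggests but the text above does not confirm; moment_on_grid is a hypothetical name.

```python
import numpy as np


def moment_on_grid(p_eval, grid_to_n, dx):
    """Riemann-sum estimate of the integral of x**N * p(x)."""
    return np.dot(p_eval, grid_to_n) * dx


dx = 0.01
grid = np.arange(-10.0, 10.0, dx)
p = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)  # N(0, 1)

norm = moment_on_grid(p, grid**0, dx)           # zeroth moment, close to 1
second_moment = moment_on_grid(p, grid**2, dx)  # analytic value is 1
print(norm, second_moment)
```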

qp.metrics.array_metrics.quick_rmse(p_eval, q_eval, N)[source]

Calculates the Root Mean Square Error between two evaluations of PDFs.

Parameters:
  • p_eval (numpy.ndarray, float) – evaluation of the probability distribution function taken as the truth, against which the approximation q is compared

  • q_eval (numpy.ndarray, float) – evaluation of the probability distribution function that approximates p

  • N (int) – number of points at which PDFs were evaluated

Returns:

rms – the value of the RMS error between q and p

Return type:

float
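A minimal sketch of a root mean square error between two gridded PDFs, assuming the usual definition (square root of the summed squared differences divided by N). The name rmse_on_grid is hypothetical and the qp implementation may differ in detail.

```python
import numpy as np


def rmse_on_grid(p_eval, q_eval, n_points):
    """Root mean square difference between two evaluations on the same grid."""
    diff = np.asarray(p_eval, dtype=float) - np.asarray(q_eval, dtype=float)
    return np.sqrt(np.sum(diff**2, axis=-1) / n_points)


p = np.array([1.0, 2.0, 3.0])
q = np.array([1.0, 2.0, 5.0])
rms = rmse_on_grid(p, q, p.size)  # sqrt(4 / 3)
print(rms)
```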

qp.metrics.array_metrics.quick_rbpe(pdf_function, integration_bounds, limits=(inf, inf))[source]

Calculates the risk based point estimate of a qp.Ensemble object with npdf == 1.

Parameters:
  • pdf_function (python function) – The function should calculate the value of a pdf at a given x value

  • integration_bounds (tuple of floats) – The integration bounds, typically (ppf(0.01), ppf(0.99)) for the given distribution

  • limits (tuple of floats) – The limits at which to evaluate possible z_best estimates. If custom limits are not provided then all potential z values will be considered using the scipy.optimize.minimize_scalar function.

Returns:

rbpe – The risk based point estimate of the provided ensemble.

Return type:

float

class qp.metrics.brier.Brier(prediction, truth)[source]

Brier score based on https://en.wikipedia.org/wiki/Brier_score#Original_definition_by_Brier

Parameters:
  • prediction (NxM array, float) – Predicted probability for N distributions to have a true value in one of M bins. The sum of values along each row N should be 1.

  • truth (NxM array, int) – True values for N distributions, where Mth bin for the true value will have value 1, all other bins will have a value of 0.

evaluate()[source]

Evaluate the Brier score.

Returns:

The result of calculating the Brier metric, a value in the interval [0,2]

Return type:

float
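The original Brier definition referenced above averages, over the N distributions, the squared error summed across the M bins. A minimal sketch with plain NumPy (brier_score is a hypothetical name, not the qp class):

```python
import numpy as np


def brier_score(prediction, truth):
    """Mean over N distributions of the squared error summed across M bins."""
    prediction = np.asarray(prediction, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.mean(np.sum((prediction - truth) ** 2, axis=1))


truth = np.array([[1, 0, 0], [0, 1, 0]])
perfect = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
wrong = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
spread = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25]])

print(brier_score(perfect, truth), brier_score(wrong, truth), brier_score(spread, truth))
```

A perfect prediction gives 0 and a fully confident wrong prediction gives 2, which are the two ends of the interval quoted above.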

class qp.metrics.pit.PIT(qp_ens, true_vals, eval_grid=DEFAULT_QUANTS)[source]

Probability Integral Transform

Parameters:
  • qp_ens (Ensemble) – A collection of N distribution objects

  • true_vals ([float]) – An array-like sequence of N float values representing the known true value for each distribution

  • eval_grid ([float], optional) – A strictly increasing array-like sequence in the range [0,1], by default DEFAULT_QUANTS

Returns:

An object with an Ensemble containing the PIT distribution, and a full set of PIT samples.

Return type:

PIT object

property pit_samps

Returns the PIT samples. i.e. CDF(true_vals) for each distribution in the Ensemble used to initialize the PIT object.

Returns:

An array of floats

Return type:

np.array
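To make "CDF(true_vals)" concrete, the sketch below computes PIT values for a set of normal predictions without using qp. The choice of unit-width normals and the variable names are assumptions made for illustration only.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_dist = 500

# Each predicted distribution is a unit-width normal centred on its own mean
means = rng.uniform(0.5, 1.5, size=n_dist)
# Truths drawn from those same normals, so the predictions are well calibrated
true_vals = rng.normal(loc=means, scale=1.0)

# PIT value = predicted CDF evaluated at the true value
z = (true_vals - means) / math.sqrt(2.0)
pit_samps = 0.5 * (1.0 + np.vectorize(math.erf)(z))
print(pit_samps.min(), pit_samps.max(), pit_samps.mean())
```

For calibrated predictions the PIT values are spread roughly uniformly over [0, 1].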

property pit

Return the PIT Ensemble object

Returns:

An Ensemble containing 1 qp.quant distribution.

Return type:

qp.Ensemble

calculate_pit_meta_metrics()[source]

Convenience method that will calculate all of the PIT meta metrics and return them as a dictionary.

Returns:

The collection of PIT statistics

Return type:

dictionary

evaluate_PIT_anderson_ksamp(pit_min=0.0, pit_max=1.0)[source]

Use scipy.stats.anderson_ksamp to compute the Anderson-Darling statistic for the cdf(truth) values by comparing with a uniform distribution between 0 and 1. Up to the current version (1.9.3), scipy.stats.anderson does not support uniform distributions as reference for 1-sample test, therefore we create a uniform “distribution” and pass it as the second value in the list of parameters to the scipy implementation of k-sample Anderson-Darling. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

Parameters:
  • pit_min (float, optional) – Minimum PIT value to accept, by default 0.

  • pit_max (float, optional) – Maximum PIT value to accept, by default 1.

Returns:

An array of objects with attributes statistic, critical_values, and significance_level. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

Return type:

array

evaluate_PIT_CvM()[source]

Calculate the Cramer-von Mises statistic with scipy.stats.cramervonmises, comparing self._pit_samps to a uniform distribution. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises.html

Returns:

An array of objects with attributes statistic and pvalue. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises.html

Return type:

array

evaluate_PIT_KS()[source]

Calculate the Kolmogorov-Smirnov statistic using scipy.stats.kstest. For more details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html

Returns:

An array of objects with attributes statistic and pvalue. For details see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html

Return type:

array
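The underlying scipy call can be tried directly on an array of PIT values. This sketch uses stand-in samples rather than a qp.Ensemble, and it compares them against scipy's standard uniform distribution on [0, 1]; how qp wraps the call is not shown here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Stand-in for PIT samples: uniform on [0, 1] when predictions are calibrated
pit_samps = rng.uniform(0.0, 1.0, size=1000)
# A miscalibrated stand-in: squaring pushes the values towards 0
biased_samps = pit_samps**2

calibrated = stats.kstest(pit_samps, "uniform")
biased = stats.kstest(biased_samps, "uniform")
print(calibrated.statistic, calibrated.pvalue)
print(biased.statistic, biased.pvalue)
```

The statistic grows as the PIT values depart from uniformity.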

evaluate_PIT_outlier_rate(pit_min=0.0001, pit_max=0.9999)[source]

Compute fraction of PIT outliers by evaluating the CDF of the distribution in the PIT Ensemble at pit_min and pit_max.

Parameters:
  • pit_min (float, optional) – Lower bound for outliers, by default 0.0001

  • pit_max (float, optional) – Upper bound for outliers, by default 0.9999

Returns:

The percentage of outliers in this distribution given the min and max bounds.

Return type:

float
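The method above evaluates the CDF of the PIT distribution at the two bounds. An empirical analogue, sketched here under the assumption that raw PIT samples are available, counts the samples that fall outside [pit_min, pit_max] and returns the result as a fraction. The name empirical_outlier_rate is hypothetical.

```python
import numpy as np


def empirical_outlier_rate(pit_samps, pit_min=0.0001, pit_max=0.9999):
    """Fraction of PIT samples below pit_min or above pit_max."""
    pit_samps = np.asarray(pit_samps, dtype=float)
    n_low = np.count_nonzero(pit_samps < pit_min)
    n_high = np.count_nonzero(pit_samps > pit_max)
    return (n_low + n_high) / pit_samps.size


pit_samps = np.array([0.0, 0.00005, 0.2, 0.5, 0.7, 0.9, 0.99995, 1.0])
rate = empirical_outlier_rate(pit_samps)  # 4 of the 8 samples are outliers
print(rate)
```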

Utility functions

qp.conversion_funcs

This module implements functions to convert distributions between various representations. These functions should then be registered with the qp.ConversionDict using qp_add_mapping. That will allow the automated conversion mechanisms to work.

qp.conversion_funcs.extract_vals_at_x(in_dist, **kwargs)[source]

Convert using a set of x and y values.

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • xvals (np.array) – Locations at which the pdf is evaluated

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_xy_vals(in_dist, **kwargs)[source]

Convert using a set of x and y values.

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • xvals (np.array) – Locations at which the pdf is evaluated

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_samples(in_dist, **kwargs)[source]

Convert using a set of values sampled from the PDF

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • size (int) – Number of samples to generate

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_hist_values(in_dist, **kwargs)[source]

Convert using a set of values sampled from the PDF

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • bins (np.array) – Histogram bin edges

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_hist_samples(in_dist, **kwargs)[source]

Convert using a set of sampled values that are then histogrammed

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • bins (np.array) – Histogram bin edges

  • size (int) – Number of samples to generate

Returns:

data – The extracted data

Return type:

dict
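The sample-then-histogram idea can be sketched with NumPy alone. The stand-in distributions and the dictionary key names ("bins", "pdfs") are assumptions for illustration; the actual keys returned by qp are not stated above.

```python
import numpy as np

rng = np.random.default_rng(3)
bins = np.linspace(-6.0, 6.0, 25)  # 25 edges give 24 bins
size = 1000

# Two stand-in distributions, N(0, 1) and N(1, 0.5), sampled `size` times each
samples = np.stack([
    rng.normal(0.0, 1.0, size=size),
    rng.normal(1.0, 0.5, size=size),
])

# Histogram each row; density=True normalizes each histogram to unit area
pdfs = np.stack([np.histogram(row, bins=bins, density=True)[0] for row in samples])

data = {"bins": bins, "pdfs": pdfs}  # key names are assumptions
print(data["pdfs"].shape)
```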

qp.conversion_funcs.extract_quantiles(in_dist, **kwargs)[source]

Convert using a set of quantiles and the locations at which they are reached

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • quantiles (np.array) – Quantile values to use

Returns:

data – The extracted data

Return type:

dict
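"The locations at which quantiles are reached" are values of the inverse CDF. The sketch below obtains them for a gridded standard normal by integrating the PDF and inverting by interpolation. This is an illustration only; the dictionary key names are assumptions, and qp may instead call the ppf of the input distribution.

```python
import numpy as np

xgrid = np.linspace(-8.0, 8.0, 1601)
pdf = np.exp(-0.5 * xgrid**2) / np.sqrt(2.0 * np.pi)  # N(0, 1)

# Cumulative trapezoid integration gives the CDF on the grid
steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(xgrid)
cdf = np.concatenate([[0.0], np.cumsum(steps)])
cdf /= cdf[-1]

quantiles = np.array([0.16, 0.5, 0.84])
# Inverting the CDF by interpolation gives the locations of the quantiles
locs = np.interp(quantiles, cdf, xgrid)

data = {"quants": quantiles, "locs": locs}  # key names are assumptions
print(locs)
```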

qp.conversion_funcs.extract_fit(in_dist, **kwargs)[source]

Convert to a functional distribution by fitting it to a set of x and y values

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • xvals (np.array) – Locations at which the pdf is evaluated

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_mixmod_fit_samples(in_dist, **kwargs)[source]

Convert to a mixture model using a set of values sampled from the pdf

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • ncomps (int) – Number of components in mixture model to use

  • nsamples (int) – Number of samples to generate

  • random_state (int) – Used to reproducibly generate random variates from in_dist

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_voigt_mixmod(in_dist, **kwargs)[source]

Convert to a voigt mixture model starting with a gaussian mixture model, trivially by setting gammas to 0

Parameters:

in_dist (qp.Ensemble) – Input distributions

Returns:

data – The extracted data

Return type:

dict

qp.conversion_funcs.extract_voigt_xy(in_dist, **kwargs)[source]

Build a voigt function basis and run a match-pursuit algorithm to fit gridded data

Parameters:

in_dist (qp.Ensemble) – Input distributions

Returns:

data – The extracted data as sparse indices, basis, and metadata to rebuild the basis

Return type:

dict

qp.conversion_funcs.extract_voigt_xy_sparse(in_dist, **kwargs)[source]

Build a voigt function basis and run a match-pursuit algorithm to fit gridded data

Parameters:

in_dist (qp.Ensemble) – Input distributions

Returns:

data – The extracted data as shaped parameters means, stds, weights, gammas

Return type:

dict

qp.conversion_funcs.extract_sparse_from_xy(in_dist, **kwargs)[source]

Extract sparse representation from an xy interpolated representation

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • yvals (array-like) – Used to override the y-values

  • xvals – Used to override the x-values

  • nvals (int) – Used to override the number of bins

Returns:

metadata – Dictionary with data for sparse representation

Return type:

dict

Notes

This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0

qp.conversion_funcs.extract_xy_sparse(in_dist, **kwargs)[source]

Extract xy-interpolated representation from a sparse representation

Parameters:
  • in_dist (qp.Ensemble) – Input distributions

  • yvals (array-like) – Used to override the y-values

  • xvals – Used to override the x-values

  • nvals (int) – Used to override the number of bins

Returns:

metadata – Dictionary with data for interpolated representation

Return type:

dict

Notes

This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0

qp.utils: PDF evaluation and construction utility functions

Utility functions for the qp package

qp.utils.safelog(arr, threshold=2.220446049250313e-16)[source]

Takes the natural logarithm of an array of potentially non-positive numbers

Parameters:
  • arr (numpy.ndarray, float) – values to be logged

  • threshold (float) – small, positive value to replace zeros and negative numbers

Returns:

logged – logarithms, with approximation in place of zeros and negative numbers

Return type:

numpy.ndarray
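A minimal sketch of a "safe" logarithm consistent with the description above: values below the threshold are raised to the threshold before the log is taken. The name safelog_sketch is hypothetical, and the default threshold is machine epsilon, matching the default shown in the signature.

```python
import numpy as np


def safelog_sketch(arr, threshold=np.finfo(float).eps):
    """Natural log after raising every value below `threshold` up to `threshold`."""
    return np.log(np.clip(np.asarray(arr, dtype=float), threshold, None))


logged = safelog_sketch([1.0, 0.0, -3.0])
print(logged)
```

Zeros and negative numbers both map to log(threshold), so the result is always finite.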

qp.utils.edge_to_center(edges)[source]

Return the centers of a set of bins given the edges

qp.utils.bin_widths(edges)[source]

Return the widths of a set of bins given the edges
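Both quantities follow directly from the array of edges. A short NumPy illustration (not the qp source):

```python
import numpy as np

edges = np.array([0.0, 1.0, 3.0, 6.0])
centers = 0.5 * (edges[1:] + edges[:-1])  # midpoint of each bin
widths = edges[1:] - edges[:-1]           # upper edge minus lower edge
print(centers, widths)
```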

qp.utils.get_bin_indices(bins, x)[source]

Return the bin indexes for a set of values

If the bins are of equal width this will use arithmetic; if the bins are not of equal width this will use a binary search
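The two strategies can be sketched as follows. This is a hedged illustration, not the qp code: the name bin_indices_sketch and the choice to return an in-range mask alongside the indices are assumptions.

```python
import numpy as np


def bin_indices_sketch(bins, x):
    bins = np.asarray(bins, dtype=float)
    x = np.asarray(x, dtype=float)
    widths = np.diff(bins)
    if np.allclose(widths, widths[0]):
        # Equal-width bins: the index follows from arithmetic on the offset
        idx = np.floor((x - bins[0]) / widths[0]).astype(int)
    else:
        # Unequal bins: binary search for the bin containing each value
        idx = np.searchsorted(bins, x, side="right") - 1
    # Flag values that fall inside the binned range
    in_range = (idx >= 0) & (idx < bins.size - 1)
    return idx, in_range


idx_eq, ok_eq = bin_indices_sketch(np.linspace(0.0, 1.0, 5), [0.1, 0.3, 0.6, 0.9, 1.5])
idx_uneq, ok_uneq = bin_indices_sketch([0.0, 0.1, 0.5, 1.0], [0.05, 0.3, 0.7, -1.0])
print(idx_eq, ok_eq)
print(idx_uneq, ok_uneq)
```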

qp.utils.normalize_interp1d(xvals, yvals)[source]

Normalize a set of 1D interpolators

Parameters:
  • xvals (array-like) – X-values used for the interpolation

  • yvals (array-like) – Y-values used for the interpolation

Returns:

ynorm – Normalized y-vals

Return type:

array-like
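A sketch of what normalizing a set of gridded curves can look like, assuming that "normalize" means dividing each row by its trapezoid-rule integral so that it has unit area. The name normalize_rows is hypothetical.

```python
import numpy as np


def normalize_rows(xvals, yvals):
    """Divide each row of `yvals` by its trapezoid-rule integral over `xvals`."""
    xvals = np.asarray(xvals, dtype=float)
    yvals = np.asarray(yvals, dtype=float)
    areas = np.sum(0.5 * (yvals[..., 1:] + yvals[..., :-1]) * np.diff(xvals), axis=-1)
    return yvals / areas[..., np.newaxis]


xvals = np.linspace(0.0, 1.0, 11)
yvals = np.vstack([np.full(11, 2.0), xvals])  # areas are 2 and 0.5
ynorm = normalize_rows(xvals, yvals)
print(ynorm[0, 0], ynorm[1, -1])
```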

qp.utils.build_kdes(samples, **kwargs)[source]

Build a set of Gaussian Kernel Density Estimates

Parameters:
  • samples (array-like) – X-values used for the spline

  • **kwargs – Keywords passed to the scipy.stats.gaussian_kde constructor

Returns:

kdes

Return type:

list of scipy.stats.gaussian_kde objects

qp.utils.evaluate_kdes(xvals, kdes)[source]

Evaluate a set of kdes

Parameters:
  • xvals (array_like) – X-values used for the spline

  • kdes (list of sps.gaussian_kde) – The kernel density estimates

Returns:

yvals – The kdes evaluated at the xvals

Return type:

array_like
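Building and evaluating a list of KDEs can be sketched directly with scipy.stats.gaussian_kde. The layout assumed here (one row of samples per distribution, one KDE per row) is an illustration and may not match qp exactly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# One row of samples per distribution
samples = np.stack([
    rng.normal(0.0, 1.0, size=400),
    rng.normal(2.0, 0.5, size=400),
])

# One Gaussian KDE per row; extra keywords such as bw_method would go to the constructor
kdes = [stats.gaussian_kde(row) for row in samples]

# Evaluate every KDE on a common grid and stack the results row by row
xvals = np.linspace(-5.0, 5.0, 201)
yvals = np.vstack([kde(xvals) for kde in kdes])
print(yvals.shape)
```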

qp.utils.get_eval_case(x, row)[source]

Figure out which of the various input formats scipy.stats has passed us

Parameters:
  • x (array_like) – Pdf x-vals

  • row (array_like) – Pdf row indices

Returns:

  • case (int) – The case code

  • xx (array_like) – The x-values properly shaped

  • rr (array_like) – The y-values, properly shaped

Notes

The cases are:

CASE_FLAT : x, row have shapes (n), (n) and do not factor

CASE_FACTOR : x, row have shapes (n), (n) but can be factored to shapes (1, nx) and (npdf, 1) (i.e., they were flattened by scipy)

CASE_PRODUCT : x, row have shapes (1, nx) and (npdf, 1)

CASE_2D : x, row have shapes (npdf, nx) and (npdf, nx)

qp.utils.evaluate_hist_x_multi_y_flat(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (n)) – Which rows to interpolate at

  • bins (array_like (N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (n)

qp.utils.evaluate_hist_x_multi_y_product(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like (npts)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • bins (array_like (N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_x_multi_y_2d(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like (npdf, npts)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • bins (array_like (N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_x_multi_y(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like) – X values to interpolate at

  • row (array_like) – Which rows to interpolate at

  • bins (array_like (N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like

Notes

Depending on the shape of ‘x’ and ‘row’ this will use one of the three specific implementations.
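As an illustration of the "product" shape convention (x of shape (npts), row of shape (npdf, 1), shared bin edges), the following sketch looks up histogram values with a binary search and NumPy broadcasting. Returning 0 outside the binned range is an assumption, and eval_hist_product is a hypothetical name.

```python
import numpy as np


def eval_hist_product(x, row, bins, vals):
    """x: (npts,), row: (npdf, 1), bins: (N+1,), vals: (npdf, N) -> (npdf, npts)."""
    x = np.asarray(x, dtype=float)
    idx = np.searchsorted(bins, x, side="right") - 1  # bin index for each x
    inside = (idx >= 0) & (idx < bins.size - 1)        # x within the binned range
    safe_idx = np.clip(idx, 0, bins.size - 2)          # keep the lookup in bounds
    out = vals[row, safe_idx]                          # broadcasts to (npdf, npts)
    return np.where(inside, out, 0.0)


bins = np.array([0.0, 1.0, 2.0, 3.0])
vals = np.array([[0.2, 0.5, 0.3],
                 [0.6, 0.3, 0.1]])
row = np.arange(2).reshape(2, 1)
x = np.array([-0.5, 0.5, 1.5, 2.5, 3.5])

out = eval_hist_product(x, row, bins, vals)
print(out)
```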

qp.utils.evaluate_hist_multi_x_multi_y_flat(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (n)) – Which rows to interpolate at

  • bins (array_like (npdf, N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (n)

qp.utils.evaluate_hist_multi_x_multi_y_product(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like (npts)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • bins (array_like (npdf, N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_multi_x_multi_y_2d(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like (npdf, npts)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • bins (array_like (npdf, N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like (npdf, npts)

qp.utils.evaluate_hist_multi_x_multi_y(x, row, bins, vals, derivs=None)[source]

Evaluate a set of values from histograms

Parameters:
  • x (array_like) – X values to interpolate at

  • row (array_like) – Which rows to interpolate at

  • bins (array_like (npdf, N+1)) – ‘x’ bin edges

  • vals (array_like (npdf, N)) – ‘y’ bin contents

Returns:

out – The histogram values

Return type:

array_like

qp.utils.interpolate_x_multi_y_flat(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (n)) – Which rows to interpolate at

  • xvals (array_like (npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_x_multi_y_product(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_x_multi_y_2d(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (npdf, n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_x_multi_y(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (npdf, n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like
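For the shared-grid case (one set of xvals, one row of yvals per PDF), the idea can be sketched with linear interpolation from NumPy. qp may use a different interpolator and accepts extra keywords, so this is an illustration under that assumption; interp_shared_grid is a hypothetical name.

```python
import numpy as np


def interp_shared_grid(x, row, xvals, yvals):
    """x: (n,), row: (npdf, 1), xvals: (npts,), yvals: (npdf, npts) -> (npdf, n)."""
    rows = np.asarray(row).ravel()
    # Interpolate each selected row of yvals on the shared grid
    return np.vstack([np.interp(x, xvals, yvals[r]) for r in rows])


xvals = np.linspace(0.0, 1.0, 5)
yvals = np.vstack([2.0 * xvals, 1.0 - xvals])  # two straight lines
row = np.arange(2).reshape(2, 1)
x = np.array([0.1, 0.4, 0.9])

vals = interp_shared_grid(x, row, xvals, yvals)
print(vals)
```

Because both rows are straight lines, linear interpolation reproduces them exactly at the requested points.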

qp.utils.interpolate_multi_x_multi_y_flat(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (n)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_multi_y_product(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_multi_y_2d(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (npdf, n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_multi_y(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (npdf, n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf, npts)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like

qp.utils.interpolate_multi_x_y_flat(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (n)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_y_product(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_y_2d(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (npdf, n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like (npdf, n)

qp.utils.interpolate_multi_x_y(x, row, xvals, yvals, **kwargs)[source]

Interpolate a set of values

Parameters:
  • x (array_like (npdf, n)) – X values to interpolate at

  • row (array_like (npdf, 1)) – Which rows to interpolate at

  • xvals (array_like (npdf, npts)) – X-values used for the interpolation

  • yvals (array_like (npdf)) – Y-values used for the interpolation

Returns:

vals – The interpolated values

Return type:

array_like

qp.utils.profile(x_data, y_data, x_bins, std=True)[source]

Make a ‘profile’ plot

Parameters:
  • x_data (array_like (n)) – The x-values

  • y_data (array_like (n)) – The y-values

  • x_bins (array_like (nbins+1)) – The values of the bin edges

  • std (bool) – If true, return the standard deviations, if false return the errors on the means

Returns:

  • vals (array_like (nbins)) – The means

  • errs (array_like (nbins)) – The standard deviations or errors on the means
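A profile reduces y-values to one mean and one spread per x-bin. The sketch below follows the parameter description above; leaving empty bins as NaN and the name profile_sketch are assumptions.

```python
import numpy as np


def profile_sketch(x_data, y_data, x_bins, std=True):
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    nbins = len(x_bins) - 1
    idx = np.searchsorted(x_bins, x_data, side="right") - 1
    vals = np.full(nbins, np.nan)
    errs = np.full(nbins, np.nan)
    for i in range(nbins):
        in_bin = y_data[idx == i]
        if in_bin.size == 0:
            continue  # empty bins stay NaN
        vals[i] = in_bin.mean()
        errs[i] = in_bin.std()
        if not std:
            errs[i] /= np.sqrt(in_bin.size)  # error on the mean
    return vals, errs


x_data = [0.1, 0.2, 1.1, 1.2]
y_data = [1.0, 3.0, 10.0, 20.0]
x_bins = np.array([0.0, 1.0, 2.0])

means, stds = profile_sketch(x_data, y_data, x_bins, std=True)
_, mean_errs = profile_sketch(x_data, y_data, x_bins, std=False)
print(means, stds, mean_errs)
```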

qp.utils.reshape_to_pdf_size(vals, split_dim)[source]

Reshape an array to match the number of PDFs in a distribution

Parameters:
  • vals (array) – The input array

  • split_dim (int) – The dimension at which to split between pdf indices and per_pdf indices

Returns:

out – The reshaped array

Return type:

array
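One reading of "split between pdf indices and per_pdf indices" is that all dimensions before split_dim are flattened into a single PDF axis, while the remaining dimensions are kept. The sketch below implements that reading; it is an assumption, not a copy of the qp function.

```python
import numpy as np


def reshape_to_pdf_size_sketch(vals, split_dim):
    vals = np.asarray(vals)
    in_shape = vals.shape
    npdf = int(np.prod(in_shape[:split_dim]))  # flatten the PDF dimensions
    per_pdf = in_shape[split_dim:]             # keep the per-PDF dimensions
    return vals.reshape((npdf,) + tuple(per_pdf))


vals = np.arange(30).reshape(2, 3, 5)  # a (2, 3) grid of PDFs, 5 values each
out = reshape_to_pdf_size_sketch(vals, split_dim=-1)
print(out.shape)
```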

qp.utils.reshape_to_pdf_shape(vals, pdf_shape, per_pdf)[source]

Reshape an array to match the shape of PDFs in a distribution

Parameters:
  • vals (array) – The input array

  • pdf_shape (int) – The shape for the pdfs

  • per_pdf (int or array_like) – The shape per pdf

Returns:

out – The reshaped array

Return type:

array

Infrastructure and Core functionality

qp.pdf_gen: scipy.stats interface

This module implements continuous distribution generators that inherit from the scipy.stats.rv_continuous class

If you would like to add a sub-class, please read the instructions on subclassing here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html

Open questions:

1) At this time the normalization is not enforced for many of the PDF types. It is assumed that the user values give correct normalization. We should think about this more.

2) At this time for most of the distributions, only the _pdf function is overridden. This is all that is required to inherit from scipy.stats.rv_continuous; however, providing implementations of some of _logpdf, _cdf, _logcdf, _ppf, _rvs, _isf, _sf, _logsf could speed the code up a lot in some cases.
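The point in item 2 can be seen with a toy scipy.stats.rv_continuous subclass that provides only _pdf. The class ramp_gen below is hypothetical and unrelated to the qp generators; scipy derives cdf and ppf from _pdf by generic numerical routines, which is why overriding more methods can be faster.

```python
import numpy as np
from scipy import stats


class ramp_gen(stats.rv_continuous):
    """Toy distribution with pdf(x) = 2x on [0, 1]; only _pdf is provided."""

    def _pdf(self, x):
        return 2.0 * x


ramp = ramp_gen(a=0.0, b=1.0, name="ramp")

# cdf and ppf fall back to generic numerical routines built on _pdf
pdf_half = ramp.pdf(0.5)
cdf_half = ramp.cdf(0.5)
ppf_quarter = ramp.ppf(0.25)
print(pdf_half, cdf_half, ppf_quarter)
```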

class qp.pdf_gen.Pdf_gen(*args, **kwargs)[source]

Interface class to extend scipy.stats.rv_continuous with information needed for qp

Notes

Metadata are elements that are the same for all the PDFs. These include the name and version of the PDF generation class, and possibly data such as the bin edges used for histogram representations.

Object data are elements that differ for each PDF.

property metadata

Return the metadata for this set of PDFs

property objdata

Return the object data for this set of PDFs

classmethod creation_method(method=None)[source]

Return the method used to create a PDF of this type

classmethod extraction_method(method=None)[source]

Return the method used to extract data to create a PDF of this type

classmethod reader_method(version=None)[source]

Return the method used to convert data read from a file to a PDF of this type

classmethod add_method_dicts()[source]

Add empty method dicts

classmethod print_method_maps(stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Print the maps showing the methods

classmethod create_gen(**kwds)[source]

Create and return a scipy.stats.rv_continuous object using the keyword arguments provided

classmethod create(**kwds)[source]

Create and return a scipy.stats.rv_frozen object using the keyword arguments provided

classmethod plot(pdf, **kwargs)[source]

Plot the pdf as a curve

classmethod plot_native(pdf, **kwargs)[source]

Plot the PDF in a way that is particular to this type of distribution

This defaults to plotting it as a curve, but this can be overridden

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout

class qp.pdf_gen.rv_frozen_func(dist, *args, **kwds)[source]

Trivial extension of scipy.stats.rv_frozen that includes the number of PDFs it represents

property ndim

Return the number of dimensions of PDFs in this ensemble

property shape

Return the shape of the set of PDFs this object represents

property npdf

Return the number of PDFs this object represents

histogramize(bins)[source]

Computes integrated histogram bin values for all PDFs

Parameters:

bins (ndarray, float, optional) – Array of N+1 endpoints of N bins

Returns:

self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins

Return type:

ndarray, tuple, ndarray, floats
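"Integrated histogram bin values" can be read as the probability contained in each bin, which is the difference of the CDF at consecutive edges. The sketch below shows this for a single standard normal; whether qp additionally divides by the bin width is not stated above, so the values here are left as probabilities. normal_cdf is a hypothetical helper.

```python
import math
import numpy as np


def normal_cdf(x, loc=0.0, scale=1.0):
    """CDF of a normal distribution, evaluated element-wise."""
    z = (np.asarray(x, dtype=float) - loc) / (scale * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


bins = np.linspace(-5.0, 5.0, 11)   # N+1 = 11 edges, N = 10 bins
cdf_at_edges = normal_cdf(bins)
bin_values = np.diff(cdf_at_edges)  # probability integrated over each bin
histogram = (bins, bin_values)
print(bin_values.sum())
```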

class qp.pdf_gen.rv_frozen_rows(dist, shape, *args, **kwds)[source]

Trivial extension of scipy.stats.rv_frozen to use when we want to have a collection of distribution objects such as histograms or splines, where each object represents a single distribution

property ndim

Return the number of dimensions of PDFs in this ensemble

property shape

Return the shape of the set of PDFs this object represents

property npdf

Return the number of PDFs this object represents

histogramize(bins)[source]

Computes integrated histogram bin values for all PDFs

Parameters:

bins (ndarray, float, optional) – Array of N+1 endpoints of N bins

Returns:

self.histogram – Array of pairs of arrays of lengths (N+1, N) containing endpoints of bins and values in bins

Return type:

ndarray, tuple, ndarray, floats

class qp.pdf_gen.Pdf_rows_gen(*args, **kwargs)[source]

Class to extend scipy.stats.rv_continuous with information needed for qp when we want to have a collection of distribution objects such as histograms or splines, where each object represents a single distribution

property shape

Return the shape of the set of PDFs this object represents

property npdf

Return the number of PDFs this object represents

freeze(*args, **kwds)[source]

Freeze the distribution for the given arguments.

Parameters:
  • arg1, arg2, arg3, ... (array_like) – The shape parameter(s) for the distribution. Should include all the non-optional arguments, may include loc and scale.

Returns:

rv_frozen – The frozen distribution.

Return type:

rv_frozen instance

classmethod create_gen(**kwds)[source]

Create and return a scipy.stats.rv_continuous object using the keyword arguments provided

moment(n, *args, **kwds)[source]

Returns the requested moments for all the PDFs.

This used to call a hacked version, Pdf_gen._moment_fix, which can handle cases of multiple PDFs. Now it prints a deprecation warning for scipy < 1.8

Parameters:

n (int) – Order of the moment

Returns:

moments – The requested moments

Return type:

array_like

class qp.pdf_gen.Pdf_gen_wrap(*args, **kwargs)[source]

Mixin class to extend scipy.stats.rv_continuous with information needed for qp for analytic distributions.

classmethod get_allocation_kwds(npdf, **kwargs)[source]

Return kwds necessary to create ‘empty’ hdf5 file with npdf entries for iterative writeout

classmethod add_mappings()[source]

Add this class's mappings to the conversion dictionary

qp.dict_utils: tools for multi-level dictionary manipulation

This module implements tools to convert between distributions

qp.dict_utils.get_val_or_default(in_dict, key)[source]

Helper function to return either an item in a dictionary or the default value of the dictionary

Parameters:
  • in_dict (dict) – input dictionary

  • key (str) – key to search for

Returns:

out – The requested item

Return type:

dict or function

Notes

This will first try to return:

in_dict[key] : i.e., the requested item.

If that fails it will try

in_dict[None] : i.e., the default for that dictionary.

If that fails it will return

None
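The lookup order described in these notes can be written in a few lines of plain Python. The name get_val_or_default_sketch and the example dictionary are hypothetical.

```python
def get_val_or_default_sketch(in_dict, key):
    if key in in_dict:
        return in_dict[key]   # the requested item
    return in_dict.get(None)  # the default entry, or None if there is none


methods = {"hist": "make_hist", None: "make_default"}

print(get_val_or_default_sketch(methods, "hist"))
print(get_val_or_default_sketch(methods, "spline"))
print(get_val_or_default_sketch({}, "spline"))
```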

qp.dict_utils.set_val_or_default(in_dict, key, val)[source]

Helper function to either get an item from or add an item to a dictionary and return that item

Parameters:
  • in_dict (dict) – input dictionary

  • key (str) – key to search for

  • val (dict or function) – item to add to the dictionary

Returns:

out – The requested item

Return type:

dict or function

Notes

This will first try to return:

in_dict[key] : i.e., the requested item.

If that fails it will try

in_dict[None] : i.e., the default for that dictionary.

If that fails it will return

None

qp.dict_utils.pretty_print(in_dict, prefixes, idx=0, stream=sys.stdout)[source]

Print a level of the conversion dictionary in a human-readable format

Parameters:
  • in_dict (dict) – input dictionary

  • prefixes (list) – The prefixes to use at each level of the printing

  • idx (int) – The level of the input dictionary we are currently printing

  • stream (stream) – The stream to print to
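One way the prefixes, idx and stream parameters could interact is sketched below: each nesting level is written with the prefix at its own index. The function print_levels, the "default" label for the None key and the output layout are all assumptions made for illustration. The actual qp output format may differ.

```python
import io
import sys

def print_levels(in_dict, prefixes, idx=0, stream=sys.stdout):
    """Write one line per key, using prefixes[idx] for the current level."""
    prefix = prefixes[idx]
    for key, val in in_dict.items():
        # Assumption: show the None key as "default" for readability.
        label = "default" if key is None else key
        if isinstance(val, dict):
            stream.write(f"{prefix}{label}:\n")
            # Recurse one level deeper with the next prefix.
            print_levels(val, prefixes, idx + 1, stream)
        else:
            stream.write(f"{prefix}{label} : {val}\n")

nested = {"hist": {"interp": "convert_a", None: "convert_b"}}
buffer = io.StringIO()
print_levels(nested, prefixes=["", "  "], stream=buffer)
text = buffer.getvalue()
```

Passing an io.StringIO object as the stream captures the text instead of printing it, which is convenient for testing.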

qp.dict_utils.print_dict_shape(in_dict)[source]

Print the shape of arrays in a dictionary. This is useful for debugging table creation.

Parameters:

in_dict (dict) – The dictionary to print

qp.dict_utils.slice_dict(in_dict, subslice)[source]

Create a new dict by taking a slice of every array in a dict

Parameters:
  • in_dict (dict) – The dictionary to slice

  • subslice (int or slice) – Used to slice the arrays

Returns:

out_dict – The sliced dictionary

Return type:

dict
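A minimal sketch of the slicing idea, using plain Python lists as stand-ins for arrays (NumPy arrays accept the same int and slice indexing). The name slice_each is hypothetical, and the real function may treat non-array values differently.

```python
def slice_each(in_dict, subslice):
    """Apply the same index or slice to every sequence in the dict."""
    return {key: val[subslice] for key, val in in_dict.items()}

table = {"xvals": [0.0, 0.5, 1.0, 1.5], "yvals": [0.1, 0.4, 0.4, 0.1]}

# A slice keeps a sub-sequence under every key.
first_two = slice_each(table, slice(0, 2))

# An int picks a single element under every key.
third = slice_each(table, 2)
```

The input dictionary is left unchanged; a new dictionary is built from the sliced values.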

qp.dict_utils.check_keys(in_dicts)[source]

Check that the keys in all the in_dicts match

Raises KeyError if one does not match.
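The check can be pictured as comparing every dictionary's key set against the first one. The sketch below is an illustration only. The name require_same_keys is hypothetical, and ignoring key order is an assumption.

```python
def require_same_keys(in_dicts):
    """Raise KeyError if any dict's keys differ from the first dict's keys."""
    if not in_dicts:
        return
    expected = set(in_dicts[0])
    for a_dict in in_dicts[1:]:
        if set(a_dict) != expected:
            raise KeyError(f"keys {set(a_dict)} do not match {expected}")

# Same keys in a different order: passes silently.
require_same_keys([{"a": 1, "b": 2}, {"b": 3, "a": 4}])

# Different keys: the mismatch is reported as a KeyError.
try:
    require_same_keys([{"a": 1}, {"b": 2}])
    mismatch_detected = False
except KeyError:
    mismatch_detected = True
```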

qp.dict_utils.concatenate_dicts(in_dicts)[source]

Create a new dict by concatenating each array in in_dicts

Parameters:

in_dicts (list) – The dictionaries to stack

Returns:

out_dict – The stacked dictionary

Return type:

dict
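The stacking can be sketched with plain lists standing in for arrays: the values stored under each key are joined in the order the dictionaries are given. The name concatenate_each is hypothetical, and the sketch assumes all dictionaries share the same keys.

```python
def concatenate_each(in_dicts):
    """Join the sequences stored under each key, in the order given."""
    out_dict = {}
    for a_dict in in_dicts:
        for key, val in a_dict.items():
            # setdefault starts a fresh list, so the inputs are not modified.
            out_dict.setdefault(key, []).extend(val)
    return out_dict

chunk_a = {"loc": [0.0, 0.1], "scale": [1.0, 1.1]}
chunk_b = {"loc": [0.2], "scale": [1.2]}
stacked = concatenate_each([chunk_a, chunk_b])
```

This pattern is useful when an ensemble is processed in chunks and the per-chunk tables need to be merged back into one.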

qp.dict_utils.check_array_shapes(in_dict, npdf)[source]

Check that all the arrays in in_dict match the number of pdfs

Raises ValueError if one does not match.
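A sketch of this check under the assumption that "match the number of pdfs" means the leading dimension of every entry equals npdf. Nested lists stand in for two-dimensional arrays, and the name require_npdf_rows is hypothetical.

```python
def require_npdf_rows(in_dict, npdf):
    """Raise ValueError if any entry does not have npdf rows."""
    for key, val in in_dict.items():
        if len(val) != npdf:
            raise ValueError(f"{key} has {len(val)} rows, expected {npdf}")

# Two PDFs, each described by three y values and one weight.
data = {"yvals": [[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]], "weight": [1.0, 0.5]}
require_npdf_rows(data, npdf=2)  # passes silently

try:
    require_npdf_rows(data, npdf=3)
    shape_error = False
except ValueError:
    shape_error = True
```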

qp.dict_utils.compare_two_dicts(d1, d2)[source]

Check that all the items in d1 and d2 match

Returns:

match – True if they all match, False otherwise

Return type:

bool

qp.dict_utils.compare_dicts(in_dicts)[source]

Check that all the dicts in in_dicts match

Returns:

match – True if they all match, False otherwise

Return type:

bool
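A list-wide comparison can be built from a pairwise one by checking every dictionary against the first. The sketch below uses plain Python values and the built-in == operator. Array values would need an element-wise comparison such as numpy.allclose instead. The name all_dicts_match is hypothetical.

```python
def all_dicts_match(in_dicts):
    """Return True if every dict equals the first one, False otherwise."""
    if not in_dicts:
        return True
    first = in_dicts[0]
    return all(other == first for other in in_dicts[1:])

meta_a = {"pdf_name": "hist", "pdf_version": 0}
meta_b = {"pdf_name": "hist", "pdf_version": 0}
meta_c = {"pdf_name": "interp", "pdf_version": 0}

same = all_dicts_match([meta_a, meta_b])
different = all_dicts_match([meta_a, meta_b, meta_c])
```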

qp.plotting: Tools for PDF plotting

Functions to plot PDFs

qp.plotting.init_matplotlib()[source]

Initialize matplotlib parameters

qp.plotting.make_figure_axes(xlim, **kwargs)[source]

Build a figure and a set of figure axes to plot data on

Parameters:
  • xlim ((float, float)) – The x-axis limits of the plot

  • **kwargs – passed directly to the matplotlib plot function

Returns:

fig, axes – The figure and axes

qp.plotting.get_axes_and_xlims(**kwargs)[source]

Get and return the axes and xlims from the kwargs

qp.plotting.plot_pdf_on_axes(axes, pdf, xvals, **kwargs)[source]

Plot a PDF on a set of axes, by evaluating it at a set of points

Parameters:
  • axes (matplotlib.axes or None) – The axes we want to plot the data on

  • pdf (scipy.stats.rv_frozen) – The distribution we want to plot

  • xvals (np.array) – The locations we evaluate the PDF at for plotting

  • **kwargs – Keywords are passed to matplotlib

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_dist_pdf(pdf, **kwargs)[source]

Plot a PDF on a set of axes, using the axes limits

Parameters:
  • pdf (scipy.stats.rv_frozen) – The distribution we want to plot

  • axes (matplotlib.axes) – The axes to plot on

  • xlim ((float, float)) – The x-axis limits

  • npts (int) – The number of x-axis points

  • **kwargs – remaining keywords are passed directly to the plot_pdf_on_axes function

Returns:

axes – The axes the data are plotted on
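The xlim and npts parameters suggest that the PDF is evaluated on an evenly spaced grid before it is drawn. The sketch below shows only that grid step, without any plotting. It assumes the spacing is linear with both endpoints included, much like numpy.linspace. The helpers evaluation_grid and standard_normal_pdf are hypothetical.

```python
import math

def evaluation_grid(xlim, npts):
    """Return npts evenly spaced x values spanning xlim, endpoints included."""
    lo, hi = xlim
    if npts < 2:
        return [lo]
    step = (hi - lo) / (npts - 1)
    return [lo + i * step for i in range(npts)]

def standard_normal_pdf(x):
    """Density of N(0, 1), standing in for a frozen distribution's pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

xvals = evaluation_grid((-3.0, 3.0), npts=7)
yvals = [standard_normal_pdf(x) for x in xvals]
```

The resulting xvals and yvals are the kind of paired values that a line plot of the PDF would then connect.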

qp.plotting.plot_pdf_quantiles_on_axes(axes, xvals, yvals, quantiles, **kwargs)[source]

Plot a PDF on a set of axes, by evaluating at the quantiles provided

Parameters:
  • axes – The axes we want to plot the data on

  • xvals (array_like) – Pdf xvalues

  • yvals (array_like) – Pdf yvalues

  • quantiles ((np.array, np.array)) – The quantiles that define the distribution pdf

  • **kwargs – passed directly to the matplotlib plot function

  • npoints (int) – Number of points to use in the plotting. Evenly spaced along the axis provided.

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_pdf_histogram_on_axes(axes, hist, **kwargs)[source]

Plot a PDF on a set of axes, by plotting the histogrammed data

Parameters:
  • axes – The axes we want to plot the data on

  • **kwargs – passed directly to the matplotlib plot function

  • npoints (int) – Number of points to use in the plotting. Evenly spaced along the axis provided.

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_pdf_samples_on_axes(axes, pdf, samples, **kwargs)[source]

Plot a PDF on a set of axes, by displaying a set of samples from the PDF

Parameters:
  • axes – The axes we want to plot the data on

  • pdf (scipy.stats.rv_frozen) – The distribution we want to plot

  • samples (np.array) – Points sampled from the PDF

  • **kwargs – passed directly to the matplotlib plot function

Returns:

axes – The axes the data are plotted on

qp.plotting.plot_native(pdf, **kwargs)[source]

Utility function to plot a pdf in a format that is specific to that type of pdf

qp.plotting.plot(pdf, **kwargs)[source]

Utility function to plot a pdf in a format that is specific to that type of pdf