Parameterization types
Histogram based
- class qp.hist_gen(bins: ArrayLike, pdfs: ArrayLike, norm: bool = True, warn: bool = True, *args, **kwargs)[source]
Bases: Pdf_rows_gen
Implements distributions parameterized as histograms.
By default, the input distribution is normalized. If the input data is already normalized, you can set the optional parameter norm = False to skip the normalization process.
- Parameters:
- bins
ArrayLike The array containing the (n+1) bin boundaries
- pdfs
ArrayLike The array containing the (npdf, n) bin values
- norm
bool, optional If True, normalizes the input distribution. If False, assumes the given distribution is already normalized. By default True.
- warn
bool, optional If True, raises warnings if input is not valid PDF data (i.e. if data is negative). If False, no warnings are raised. By default True.
Notes
There must be a minimum of 2 bins.
Converting to this parameterization:
The available methods to convert to this parameterization are listed below by method key, with their required arguments. If the key is None, this is the default conversion method.
- Method key None (default): bins (array of bin edges)
- Method key samples: bins (array of bin edges), size (int, optional, number of samples to generate)
Implementation notes:
Inside a given bin, pdf() returns the corresponding hist_gen.pdfs value. Outside the range of the given bins, pdf() returns 0.
Inside a given bin, cdf() uses a linear interpolation across the bin. Outside the range of the given bins, cdf() returns 0 below the first bin edge and 1 above the last.
The percentage point function ppf() returns negative infinity at 0 and positive infinity at 1.
- name = 'hist'
- version = 0
- normalize() Mapping[str, ndarray[float]][source]
Normalizes the input distribution values.
- Returns:
Mapping[str, np.ndarray[float]] An (npdf, n) array of pdf values in the n bins for the npdf distributions
- Raises:
ValueError Raised if the sum under the distribution <= 0.
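As a rough illustration of the normalization described above, the sketch below divides each row of bin values by its integral over the bins and raises ValueError when that integral is not positive. The helper name normalize_hist_sketch is hypothetical; this is a NumPy approximation of the documented behavior, not qp's own code.

```python
import numpy as np

def normalize_hist_sketch(bins, pdfs):
    """Hypothetical helper: scale each row so its integral over the bins is 1."""
    bins = np.asarray(bins, dtype=float)
    pdfs = np.atleast_2d(np.asarray(pdfs, dtype=float))
    # Integral of a piecewise-constant function: sum of value * bin width.
    integrals = np.sum(pdfs * np.diff(bins), axis=1, keepdims=True)
    if np.any(integrals <= 0):
        raise ValueError("The integral under at least one distribution is <= 0.")
    return pdfs / integrals
```

With the bins and pdfs from the create_ensemble example, each returned row integrates to 1 over the bin range.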
- classmethod get_allocation_kwds(npdf: int, **kwargs: str) dict[str, tuple[tuple[int, int], str]][source]
Return the kwds necessary to create an empty HDF5 file with npdf entries for iterative write. We only need to allocate the data columns, as the metadata will be written when we finalize the file.
The number of data columns is calculated based on the length or shape of the metadata, n. For example, the number of columns is nbins-1 for a histogram.
- Parameters:
- npdf
int Total number of distributions that will be written out
- kwargs
The keys needed to construct the shape of the data to be written.
- Returns:
- Raises:
ValueError Raises an error if bins is not provided.
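The shape calculation described for the histogram case can be sketched as follows. The function name, the "pdfs" dictionary key, and the "f4" dtype string are assumptions made for illustration; only the (npdf, nbins-1) shape and the ValueError on missing bins come from the description above.

```python
import numpy as np

def hist_allocation_kwds_sketch(npdf, **kwargs):
    """Hypothetical helper: map a data column name to ((rows, columns), dtype)."""
    if "bins" not in kwargs:
        raise ValueError("required argument 'bins' was not provided")
    nbins = np.shape(kwargs["bins"])[-1]  # number of bin edges
    # One column per bin, i.e. one fewer than the number of edges.
    return {"pdfs": ((npdf, nbins - 1), "f4")}
```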
- classmethod plot_native(pdf: Ensemble, **kwargs)[source]
Plot the PDF in a way that is particular to this type of distribution
For a histogram this shows the bin edges.
- classmethod create_ensemble(bins: ArrayLike, pdfs: ArrayLike, norm: bool = True, warn: bool = True, ancil: Mapping | None = None) Ensemble[source]
Creates an Ensemble of distributions parameterized as histograms.
- Parameters:
- bins
ArrayLike The array containing the (n+1) bin boundaries
- pdfs
ArrayLike The array containing the (npdf, n) bin values
- norm
bool, optional If True, normalizes the input distribution. If False, assumes the given distribution is already normalized. By default True.
- warn
bool, optional If True, raises warnings if input is not valid PDF data (i.e. if data is negative). If False, no warnings are raised. By default True.
- ancil
Optional[Mapping], optional A dictionary of metadata for the distributions, where any arrays have length npdf, by default None
- Returns:
Ensemble An Ensemble object containing all of the given distributions.
Examples
To create an Ensemble with two distributions and an ‘ancil’ table that provides ids for the distributions, you can use the following code:
>>> import qp
>>> import numpy as np
>>> bins = [0, 1, 2, 3, 4, 5]
>>> pdfs = np.array([[0, 0.1, 0.1, 0.4, 0.2], [0.05, 0.09, 0.2, 0.3, 0.15]])
>>> ancil = {'ids': [105, 108]}
>>> ens = qp.hist.create_ensemble(bins, pdfs, ancil=ancil)
>>> ens.metadata
{'pdf_name': array([b'hist'], dtype='|S4'), 'pdf_version': array([0]), 'bins': array([[0, 1, 2, 3, 4, 5]])}
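The implementation notes for hist_gen describe a piecewise-constant pdf() and a cdf() that is linear across each bin. A minimal NumPy sketch of that behavior for a single, already normalized distribution is shown below. The function names hist_pdf_sketch and hist_cdf_sketch are hypothetical, and the code is not taken from qp.

```python
import numpy as np

def hist_pdf_sketch(x, bins, pdfs_row):
    """Bin value inside the bins, 0 outside the range of the bins."""
    x = np.asarray(x, dtype=float)
    bins = np.asarray(bins, dtype=float)
    pdfs_row = np.asarray(pdfs_row, dtype=float)
    # Index of the bin containing each x, clipped so the lookup is always valid.
    idx = np.clip(np.searchsorted(bins, x, side="right") - 1, 0, len(pdfs_row) - 1)
    inside = (x >= bins[0]) & (x <= bins[-1])
    return np.where(inside, pdfs_row[idx], 0.0)

def hist_cdf_sketch(x, bins, pdfs_row):
    """Linear across each bin, 0 below the first edge, total mass above the last."""
    bins = np.asarray(bins, dtype=float)
    pdfs_row = np.asarray(pdfs_row, dtype=float)
    # Cumulative mass at every bin edge, starting from 0 at the first edge.
    cdf_at_edges = np.concatenate([[0.0], np.cumsum(pdfs_row * np.diff(bins))])
    return np.interp(x, bins, cdf_at_edges, left=0.0, right=cdf_at_edges[-1])

bins = np.array([0.0, 1.0, 2.0, 3.0])
row = np.array([0.2, 0.5, 0.3])  # already normalized: sum(row * widths) == 1
```

Because the pdf is constant within a bin, linear interpolation of the cumulative mass between edges is exact for this parameterization.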
Utility functions
- qp.parameterizations.hist.hist_utils.evaluate_hist_x_multi_y(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- Parameters:
- Returns:
- out
np.ndarray[float] The histogram values
Notes
Depending on the shape of ‘x’ and ‘row’ this will use one of the three specific implementations.
- qp.parameterizations.hist.hist_utils.evaluate_hist_x_multi_y_product(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- qp.parameterizations.hist.hist_utils.evaluate_hist_x_multi_y_2d(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- qp.parameterizations.hist.hist_utils.evaluate_hist_x_multi_y_flat(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- qp.parameterizations.hist.hist_utils.extract_hist_values(in_dist: Ensemble, **kwargs) dict[str, Any][source]
Convert to a histogram by using the CDF values at the given bin edges to calculate the value within each bin.
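The conversion described here, from CDF values at the bin edges to a value in each bin, can be sketched with plain NumPy. The helper name hist_values_from_cdf_sketch is hypothetical, and a SciPy normal distribution stands in for the input Ensemble; this shows the arithmetic only, not qp's implementation.

```python
import numpy as np
from scipy.stats import norm

def hist_values_from_cdf_sketch(cdf, bins):
    """Hypothetical helper: bin value = probability mass in the bin / bin width."""
    bins = np.asarray(bins, dtype=float)
    cdf_at_edges = cdf(bins)
    return np.diff(cdf_at_edges) / np.diff(bins)

bins = np.linspace(-3.0, 3.0, 7)
pdfs = hist_values_from_cdf_sketch(norm(loc=0.0, scale=1.0).cdf, bins)
```

The resulting values integrate to the probability mass between the first and last bin edges, so mass outside the bins is lost unless the result is renormalized.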
Interpolation of a fixed grid
- class qp.interp_gen(xvals: ArrayLike, yvals: ArrayLike, norm: bool = True, warn: bool = True, *args, **kwargs)[source]
Bases: Pdf_rows_gen
Implements distributions parameterized as interpolated sets of values.
All distributions share the same x values. Interpolation is performed using scipy.interpolate.interp1d, with the default interpolation method (linear).
- Parameters:
- xvals
ArrayLike The n x-values that are used by all the distributions
- yvals
ArrayLike The y-values that represent each distribution, with shape (npdf,n)
- norm
bool, optional If True, normalizes the input distribution. If False, assumes the given distribution is already normalized. By default True.
- warn
bool, optional If True, raises warnings if input is not valid PDF data (i.e. if data is negative). If False, no warnings are raised. By default True.
Notes
Converting to this parameterization:
The available methods to convert to this parameterization are listed below by method key, with their required arguments. If the key is None, this is the default conversion method.
- Method key None (default): xvals
Implementation notes:
This uses the same xvals for all the PDFs, unlike interp_irregular_gen, which has a different set of xvals for each distribution. interp_gen therefore allows for much faster evaluation than interp_irregular_gen, and reduces the memory usage by a factor of 2.
Inside the range of the given xvals, it takes a set of x and y values and uses scipy.interpolate.interp1d to build the PDF. Outside the range of the given xvals, pdf() returns 0.
The cdf() is constructed by integrating analytically: computing the cumulative sum at the given xvals and interpolating between them. This gives a slight discrepancy with the true integral of the pdf(), but is much faster to evaluate. Outside the range of the given xvals, cdf() returns 0 below the first x value and 1 above the last.
The ppf() is computed by inverting the cdf(). ppf(0) returns negative infinity and ppf(1) returns positive infinity.
- name = 'interp'
- version = 0
- normalize() Mapping[str, ndarray[float]][source]
Normalizes the input distribution values.
- Returns:
Mapping[str, np.ndarray[float]] An (npdf, n) array of y values for the npdf distributions
- Raises:
ValueError Raised if the sum under the distribution <= 0.
- classmethod get_allocation_kwds(npdf: int, **kwargs) dict[str, tuple[tuple[int, int], str]][source]
Return the kwds necessary to create an empty HDF5 file with npdf entries for iterative write. We only need to allocate the data columns, as the metadata will be written when we finalize the file.
The number of data columns is calculated based on the length or shape of the metadata, n. For example, the number of columns is nbins-1 for a histogram.
- Parameters:
- npdf
int Total number of distributions that will be written out
- kwargs
The keys needed to construct the shape of the data to be written.
- Returns:
- Raises:
ValueError Raises an error if xvals is not provided.
- classmethod plot_native(pdf, **kwargs)[source]
Plot the PDF in a way that is particular to this type of distribution
For an interpolated PDF this uses the interpolation points.
- classmethod create_ensemble(xvals: ArrayLike, yvals: ArrayLike, norm: bool = True, warn: bool = True, ancil: Mapping | None = None) Ensemble[source]
Creates an Ensemble of distributions parameterized as interpolations.
- Parameters:
- xvals
ArrayLike The x-values used to do the interpolation, shape is n
- yvals
ArrayLike The y-values used to do the interpolation, shape is (npdfs, n), where npdfs is the number of distributions
- norm
bool, optional If True, normalizes the input distribution. If False, assumes the given distribution is already normalized. By default True.
- warn
bool, optional If True, raises warnings if input is not valid PDF data (i.e. if data is negative). If False, no warnings are raised. By default True.
- ancil
Optional[Mapping] A dictionary of metadata for the distributions, where any arrays have the same length as the number of distributions
- Returns:
Ensemble An Ensemble object containing all of the given distributions.
Examples
To create an ensemble with two distributions and their associated ids:
>>> import qp
>>> import numpy as np
>>> xvals = np.array([0, 0.5, 1, 1.5, 2])
>>> yvals = np.array([[0.01, 0.2, 0.3, 0.2, 0.01], [0.09, 0.25, 0.2, 0.1, 0.01]])
>>> ancil = {'ids': [5, 8]}
>>> ens = qp.interp.create_ensemble(xvals, yvals, ancil=ancil)
>>> ens.metadata
{'pdf_name': array([b'interp'], dtype='|S6'), 'pdf_version': array([0]), 'xvals': array([[0. , 0.5, 1. , 1.5, 2. ]])}
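The implementation notes for interp_gen describe a linearly interpolated pdf() and a cdf() built from a cumulative sum at the shared grid points. The sketch below mimics that with np.interp and a trapezoid-rule cumulative sum. The trapezoid rule, the normalization step, and the function names are assumptions for illustration; qp's own integration may differ in detail.

```python
import numpy as np

xvals = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
yvals = np.array([[0.01, 0.2, 0.3, 0.2, 0.01],
                  [0.09, 0.25, 0.2, 0.1, 0.01]])

# Area of each trapezoid between neighboring grid points, per distribution.
widths = np.diff(xvals)
areas = 0.5 * (yvals[:, 1:] + yvals[:, :-1]) * widths

# Normalize every row so its trapezoid-rule integral is 1.
total = areas.sum(axis=1, keepdims=True)
ynorm = yvals / total

# CDF at the grid points: running sum of normalized areas, starting from 0.
cdf_grid = np.concatenate(
    [np.zeros((len(yvals), 1)), np.cumsum(areas / total, axis=1)], axis=1
)

def interp_pdf_sketch(x, row):
    """Linear interpolation of the y values, 0 outside the grid."""
    return np.interp(x, xvals, ynorm[row], left=0.0, right=0.0)

def interp_cdf_sketch(x, row):
    """Linear interpolation of the cumulative sum, 0 below and 1 above the grid."""
    return np.interp(x, xvals, cdf_grid[row], left=0.0, right=1.0)
```

Interpolating the cumulative sum linearly is what produces the slight mismatch with the true integral of a piecewise-linear pdf, which is quadratic between grid points.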
Interpolation of a non-fixed grid
- class qp.interp_irregular_gen(xvals: ArrayLike, yvals: ArrayLike, norm: bool = True, warn: bool = True, *args, **kwargs)[source]
Bases: Pdf_rows_gen
Implements distributions parameterized as interpolated sets of values.
Each distribution has its own set of x values. Interpolation is performed using scipy.interpolate.interp1d, with the default interpolation method (linear).
- Parameters:
- xvals
ArrayLike The x-values that are used by each distribution, with shape (npdf,n)
- yvals
ArrayLike The y-values that represent each distribution, with shape (npdf,n)
- norm
bool, optional If True, normalizes the input distribution. If False, assumes the given distribution is already normalized. By default True.
- warn
bool, optional If True, raises warnings if input is not valid PDF data (i.e. if data is negative). If False, no warnings are raised. By default True.
Notes
Converting to this parameterization:
The available methods to convert to this parameterization are listed below by method key, with their required arguments. If the key is None, this is the default conversion method.
- Method key None (default): xvals
Implementation notes:
Inside the range [xvals[:,0], xvals[:,-1]], it takes a set of x and y values and uses scipy.interpolate.interp1d to linearly interpolate the PDF. Outside that range, pdf() returns 0.
The cdf() is constructed by analytically computing the cumulative sum at the xvals grid points and linearly interpolating between them. This gives a slight discrepancy with the true integral of the pdf(), but is much faster to evaluate. Outside the range [xvals[:,0], xvals[:,-1]], cdf() returns 0 below the range and 1 above it.
The ppf() is computed by inverting the cdf(). ppf(0) gives negative infinity, and ppf(1) gives positive infinity.
- name = 'interp_irregular'
- version = 0
- normalize() Mapping[str, ndarray[float]][source]
Normalize a set of 1D interpolators
- Returns:
- ynorm
Mapping[str,np.ndarray[float]] Normalized y-vals
- classmethod get_allocation_kwds(npdf: int, **kwargs) dict[str, tuple[tuple[int, int], str]][source]
Return the kwds necessary to create an empty HDF5 file with npdf entries for iterative write. We only need to allocate the data columns, as the metadata will be written when we finalize the file.
The number of data columns is calculated based on the length or shape of the metadata, n. For example, the number of columns is nbins-1 for a histogram.
- Parameters:
- npdf
int Total number of distributions that will be written out
- kwargs
The keys needed to construct the shape of the data to be written.
- Returns:
- Raises:
ValueError Raises an error if xvals is not provided.
- classmethod plot_native(pdf, **kwargs)[source]
Plot the PDF in a way that is particular to this type of distribution
For an interpolated PDF this uses the interpolation points.
- classmethod create_ensemble(xvals: ArrayLike, yvals: ArrayLike, norm: bool = True, warn: bool = True, ancil: Mapping | None = None) Ensemble[source]
Creates an Ensemble of distributions parameterized as interpolations.
- Parameters:
- xvals
ArrayLike The x-values for each distribution, with shape (npdf, n), where n is the number of x-values
- yvals
ArrayLike The y-values that represent each distribution, with shape (npdf,n)
- norm
bool, optional If True, normalizes the input distribution. If False, assumes the given distribution is already normalized. By default True.
- warn
bool, optional If True, raises warnings if input is not valid PDF data (i.e. if data is negative). If False, no warnings are raised. By default True.
- ancil
Optional[Mapping] A dictionary of metadata for the distributions, where any arrays have the same length as the number of distributions.
- Returns:
Ensemble An Ensemble object containing all of the given distributions.
Examples
To create an Ensemble with two distributions and their associated ids:
>>> import qp
>>> import numpy as np
>>> xvals = np.array([[0, 0.5, 1, 1.5, 2], [0.5, 0.75, 1, 1.25, 1.5]])
>>> yvals = np.array([[0.01, 0.2, 0.3, 0.2, 0.01], [0.09, 0.25, 0.2, 0.1, 0.01]])
>>> ancil = {'ids': [5, 8]}
>>> ens = qp.interp_irregular.create_ensemble(xvals, yvals, ancil=ancil)
>>> ens.metadata
{'pdf_name': array([b'interp_irregular'], dtype='|S16'), 'pdf_version': array([0])}
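Because every distribution in interp_irregular_gen carries its own x grid, evaluation has to pair each row of xvals with the matching row of yvals. The sketch below shows that per-row interpolation with np.interp. The function name irregular_pdf_sketch is hypothetical, normalization is omitted, and the loop is only meant to illustrate the idea rather than reproduce qp's vectorized code.

```python
import numpy as np

xvals = np.array([[0.0, 0.5, 1.0, 1.5, 2.0],
                  [0.5, 0.75, 1.0, 1.25, 1.5]])
yvals = np.array([[0.01, 0.2, 0.3, 0.2, 0.01],
                  [0.09, 0.25, 0.2, 0.1, 0.01]])

def irregular_pdf_sketch(x, xvals, yvals):
    """Interpolate each row on its own grid; return shape (npdf, len(x))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Each distribution returns 0 outside its own [xvals[:,0], xvals[:,-1]] range.
    return np.stack([
        np.interp(x, x_row, y_row, left=0.0, right=0.0)
        for x_row, y_row in zip(xvals, yvals)
    ])
```

Evaluating at x = 0.25 gives a nonzero value for the first distribution but 0 for the second, whose grid only starts at 0.5.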
Utility functions
- qp.parameterizations.interp.interp_utils.irreg_interp_extract_xy_vals(in_dist: Ensemble, **kwargs)[source]
Wrapper for extract_xy_vals. Convert using a set of x and y values.
- Parameters:
- in_dist
Ensemble Input distributions
- xvals
np.ndarray[float] Locations at which the pdf is evaluated
- Returns:
- data
dict The extracted data
- qp.parameterizations.interp.interp_utils.extract_vals_at_x(in_dist: Ensemble, **kwargs) dict[str, np.ndarray[float]][source]
Convert using a set of x and y values.
- Parameters:
- in_dist
Ensemble Input distributions
- Returns:
- data
dict[str, np.ndarray[float]] The extracted data
- Other Parameters:
- xvals
np.ndarray[float] Locations at which the pdf is evaluated
- qp.parameterizations.interp.interp_utils.extract_xy_sparse(in_dist: Ensemble, **kwargs) dict[str, Any][source]
Extract xy-interpolated representation from a sparse representation
- Parameters:
- in_dist
Ensemble Input distributions
- Returns:
- Other Parameters:
Notes
This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0
Quantile based
- class qp.quant_gen(quants: ArrayLike, locs: ArrayLike, pdf_constructor_name: str = 'piecewise_linear', ensure_extent: bool = True, warn: bool = True, *args, **kwargs)[source]
Bases: Pdf_rows_gen
Quantile based distribution, where the PDF is defined from the quantiles.
- Parameters:
- quants
ArrayLike The quantiles of the CDF, of shape n
- locs
ArrayLike The locations at which those quantiles are reached, of shape (npdf, n)
- pdf_constructor_name
str, optional The constructor or interpolator to use to create the PDF, by default “piecewise_linear”.
- ensure_extent
bool, optional If True, will ensure that the quants start at 0 and end at 1 by adding data points at both ends until this is true. locs are extrapolated linearly from input data. By default True.
- warn
bool, optional If True, raises warnings if input is not valid data (i.e. if data is not finite). If False, no warnings are raised. By default True.
Notes
Converting to this parameterization:
The available methods to convert to this parameterization are listed below by method key, with their required arguments. If the key is None, this is the default conversion method.
- Method key None (default): quants
Implementation notes:
This implements a CDF by interpolating a set of quantile values.
It takes a set of quants and locs values and uses scipy.interpolate.interp1d with a spline interpolation method of order 2 (kind='quadratic') to build the CDF.
It has multiple PDF constructors to get the PDF from the quantiles. The default is the piecewise_linear method, which takes the numerical derivative of the CDF and interpolates between those points.
ppf(0) returns negative infinity and ppf(1) returns positive infinity.
- name = 'quant'
- version = 0
- property pdf_constructor_name: str
Returns the name of the current pdf constructor. Matches a key in the PDF_CONSTRUCTORS dictionary.
- property pdf_constructor: AbstractQuantilePdfConstructor
Returns the current PDF constructor, and allows the user to interact with its methods.
- Returns:
AbstractQuantilePdfConstructor Abstract base class of the active concrete PDF constructor.
- classmethod get_allocation_kwds(npdf, **kwargs) dict[str, tuple[tuple[int, int], str]][source]
Return the kwds necessary to create an empty HDF5 file with npdf entries for iterative write. We only need to allocate the data columns, as the metadata will be written when we finalize the file.
The number of data columns is calculated based on the length or shape of the metadata, n. For example, the number of columns is nbins-1 for a histogram.
- Parameters:
- npdf
int Total number of distributions that will be written out
- kwargs
The keys needed to construct the shape of the data to be written.
- Returns:
- Raises:
ValueError Raises an error if the required kwarg quants is not provided.
- classmethod plot_native(pdf, **kwargs)[source]
Plot the PDF in a way that is particular to this type of distribution
For a quantile parameterization this shows the quantile points.
- classmethod create_ensemble(quants: ArrayLike, locs: ArrayLike, pdf_constructor_name: str = 'piecewise_linear', ensure_extent: bool = True, warn: bool = True, ancil: Mapping | None = None) Ensemble[source]
Creates an Ensemble of distributions parameterized as quantiles.
The options for pdf_constructor_name are: piecewise_linear, piecewise_constant, dual_spline_average and cdf_spline_derivative.
- Parameters:
- quants
ArrayLike The quantiles used to build the CDF, shape n
- locs
ArrayLike The locations at which those quantiles are reached, shape (npdfs, n), where npdfs is the number of distributions.
- pdf_constructor_name
str, optional The constructor to use to create the PDF, by default “piecewise_linear”.
- ensure_extent
bool, optional If True, will ensure that the quants start at 0 and end at 1 by adding data points at both ends until this is true. locs are extrapolated linearly from input data. By default True.
- warn
bool, optional If True, raises warnings if input is not valid (i.e. if locs are not finite values). If False, no warnings are raised. By default True.
- ancil
Optional[Mapping], optional A dictionary of metadata for the distributions, where any arrays have the same length as the number of distributions, by default None
- Returns:
Ensemble An Ensemble object containing all of the given distributions.
Examples
To create an Ensemble with two distributions and associated ids, using the dual_spline_average constructor:
>>> import qp
>>> import numpy as np
>>> quants = np.array([0.0001, 0.25, 0.5, 0.75, 0.9999])
>>> locs = np.array([[0.0001, 0.1, 0.3, 0.5, 0.75], [0.01, 0.05, 0.15, 0.3, 0.5]])
>>> pdf_constructor_name = 'dual_spline_average'
>>> ancil = {'ids': [11, 18]}
>>> ens = qp.quant.create_ensemble(quants, locs, pdf_constructor_name, ancil=ancil)
>>> ens.metadata
{'pdf_name': array([b'quant'], dtype='|S5'), 'pdf_version': array([0]), 'quants': array([[0.000e+00, 1.000e-04, 2.500e-01, 5.000e-01, 7.500e-01, 9.999e-01, 1.000e+00]]), 'pdf_constructor_name': array(['dual_spline_average'], dtype='|S19'), 'check_input': array([ True])}
Utility functions
- qp.parameterizations.quant.quant_utils.extract_quantiles(in_dist: Ensemble, **kwargs) dict[str, np.ndarray[float]][source]
Convert using a set of quantiles and the locations at which they are reached
- Parameters:
- in_dist
Ensemble Input distributions
- Returns:
- Other Parameters:
- quants
np.ndarray Quantile values to use
- qp.parameterizations.quant.quant_utils.pad_quantiles(quants: ArrayLike, locs: ArrayLike) tuple[ndarray[float], ndarray[float]][source]
Pad the quantiles and locations used to build a quantile representation, ensuring that 0 and 1 are part of the quantiles. The loc at quantile 0 is found by extrapolating linearly from the first two points to where the line reaches 0. The loc at quantile 1 is found by extrapolating linearly from the last two points to where the line reaches 1.
This will add additional data points to the quants and locs
- Parameters:
- Returns:
- quants
np.ndarray[float] The quantiles used to build the CDF
- locs
np.ndarray[float] The locations at which those quantiles are reached
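The padding described for pad_quantiles amounts to a linear extrapolation at each end. A minimal sketch, assuming quants is 1D with shape n and locs has shape (npdf, n), is given below. The name pad_quantiles_sketch is hypothetical and the real function may handle edge cases differently.

```python
import numpy as np

def pad_quantiles_sketch(quants, locs):
    """Hypothetical helper: extend quants to include 0 and 1 by linear extrapolation."""
    quants = np.asarray(quants, dtype=float)
    locs = np.atleast_2d(np.asarray(locs, dtype=float))
    if quants[0] > 0.0:
        # Slope d(loc)/d(quant) of the line through the first two points.
        slope = (locs[:, 1] - locs[:, 0]) / (quants[1] - quants[0])
        loc_at_0 = locs[:, 0] - slope * quants[0]
        locs = np.column_stack([loc_at_0, locs])
        quants = np.concatenate([[0.0], quants])
    if quants[-1] < 1.0:
        # Slope of the line through the last two points.
        slope = (locs[:, -1] - locs[:, -2]) / (quants[-1] - quants[-2])
        loc_at_1 = locs[:, -1] + slope * (1.0 - quants[-1])
        locs = np.column_stack([locs, loc_at_1])
        quants = np.concatenate([quants, [1.0]])
    return quants, locs

padded_quants, padded_locs = pad_quantiles_sketch([0.25, 0.5, 0.75], [[1.0, 2.0, 3.0]])
```

For this evenly spaced input the padded locations continue the same spacing on both sides.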
- qp.parameterizations.quant.quant_utils.evaluate_hist_multi_x_multi_y(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- qp.parameterizations.quant.quant_utils.evaluate_hist_multi_x_multi_y_flat(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- qp.parameterizations.quant.quant_utils.evaluate_hist_multi_x_multi_y_product(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- qp.parameterizations.quant.quant_utils.evaluate_hist_multi_x_multi_y_2d(x: ArrayLike, row: ArrayLike, bins: ArrayLike, vals: ArrayLike, derivs=None) ndarray[float][source]
Evaluate a set of values from histograms
- class qp.parameterizations.quant.abstract_pdf_constructor.AbstractQuantilePdfConstructor(quantiles: List[float], locations: List[List[float]])[source]
Bases: object
Abstract class to define an interface for concrete PDF Constructor classes
- prepare_constructor() None[source]
All the intermediate math for a constructor should happen here. This is public so that the user can trigger a recalculation of the variables needed to construct the original PDF. This method should either return functions or set variables that will receive x values and return y values.
- construct_pdf(grid: List[float], row: List[int] | None = None) List[List[float]][source]
This is the method that the user would most often interact with, by passing a grid (set of x values) and optionally a list of indexes for filtering.
This is also the method that is called by quant_gen._pdf.
- Parameters:
- Returns:
- class qp.parameterizations.quant.cdf_spline_derivative.CdfSplineDerivative(quantiles: List[float], locations: List[List[float]])[source]
Bases: AbstractQuantilePdfConstructor
Implements an interpolation algorithm based on a list of quantiles and locations. First we fit a spline to the (quantile, location) pairs, then evaluate the derivative of the spline. This represents a reconstruction of the original PDF from which the quantiles and locations were selected.
Calling cdf_spline.interpolate(grid) will evaluate the spline derivatives at the provided grid values.
- prepare_constructor(spline_order: int = 3) None[source]
Calculate the fit spline derivative for each of the original distributions. This function is the least performant - for reference, on an M1 Mac, it requires about 30 seconds to produce an output given shape(locations) = (1_000_000, 30).
Note: we are aware that the edges of the resulting pdf are showing an elephant foot.
- Parameters:
- spline_order
int Defines the order of the spline fit, defaults to 3
- construct_pdf(grid: List[float], row: List[int] | None = None) List[List[float]][source]
Evaluate the fit spline derivative at each of the grid values
- debug()[source]
This is a debugging utility that is meant to return intermediate calculation values to make it easier to visualize and debug the reconstruction algorithm.
- Returns:
_quantiles Input during constructor instantiation
_locations Input during constructor instantiation
_interpolation_functions The list of analytic derivatives of splines fit to the input data
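The approach described for CdfSplineDerivative, fitting a spline to the CDF points and differentiating it, can be sketched for a single distribution with SciPy. The use of InterpolatedUnivariateSpline and the function name below are assumptions for illustration; the source does not say which spline routine qp relies on.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def cdf_spline_derivative_sketch(quants, locs_row, grid, spline_order=3):
    """Fit a spline through (location, quantile) and evaluate its derivative."""
    # Locations must be strictly increasing for the spline fit.
    spline = InterpolatedUnivariateSpline(locs_row, quants, k=spline_order)
    return spline.derivative()(grid)

# CDF of a uniform distribution on [0, 1]: quantile equals location.
locs_row = np.linspace(0.0, 1.0, 6)
quants = locs_row.copy()
pdf_on_grid = cdf_spline_derivative_sketch(quants, locs_row, np.array([0.1, 0.5, 0.9]))
```

For the uniform CDF used here the derivative is 1 everywhere inside the range, which makes the sketch easy to check by eye.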
- class qp.parameterizations.quant.dual_spline_average.DualSplineAverage(quantiles: List[float], locations: List[List[float]])[source]
Bases: AbstractQuantilePdfConstructor
Implementation of the “area-under-the-curve” using the average of the bounding splines fit to the CDF derivative.
By using the difference between quantiles to solve for the area under the PDF, we can create an approximation of the original PDF. However, because we use a piecewise linear approximation for the continuous PDF, our approximated p(z) values will always differ from the original distribution. In practice they typically oscillate above and below the original curve as each calculation attempts to correct for the over- or undershoot of the prior calculation.
If we fit two splines, one to the odd and one to the even approximated points, then take the average, the resulting average of those splines tends to fit the original distribution well.
This constructor implements that algorithmic approach.
- prepare_constructor() None[source]
This method solves for the area under the PDF via a stepwise algorithm. Given that the difference between any two quantile values is equal to the area under the PDF between the corresponding pair of locations, and given that we know the p(z) value at one of those locations, we can solve for the unknown p(z) value at the other location.
We approximate the area under the curve as a trapezoid with the following area: (q_{i+1} - q_i) = (loc_{i+1} - loc_i) * p(z_i) + (1/2) * (loc_{i+1} - loc_i) * (p(z_{i+1}) - p(z_i))
Solving for p(z_{i+1}), we have: p(z_{i+1}) = [2 * (q_{i+1} - q_i) / (loc_{i+1} - loc_i)] - p(z_i)
The first term in this equation is calculated as first_term. After that we step along all distributions simultaneously for each location, using the previous p(z) value to calculate the next.
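The stepwise recurrence described for DualSplineAverage.prepare_constructor can be sketched in NumPy as follows. The function name stepwise_p_of_z_sketch and the starting value p0 = 0 at the first location are assumptions for illustration; the source does not state how the first p(z) value is chosen.

```python
import numpy as np

def stepwise_p_of_z_sketch(quants, locs, p0=0.0):
    """Step along the locations, solving each trapezoid for the next p(z) value."""
    quants = np.asarray(quants, dtype=float)
    locs = np.atleast_2d(np.asarray(locs, dtype=float))
    # first_term[:, i] = 2 * (q_{i+1} - q_i) / (loc_{i+1} - loc_i)
    first_term = 2.0 * np.diff(quants) / np.diff(locs, axis=1)
    p_of_zs = np.empty_like(locs)
    p_of_zs[:, 0] = p0  # assumed starting value
    for i in range(first_term.shape[1]):
        # All distributions advance together, one location at a time.
        p_of_zs[:, i + 1] = first_term[:, i] - p_of_zs[:, i]
    return p_of_zs

# p(z) = 2z on [0, 1] has CDF z**2, so the quantiles at 0, 0.5, 1 are 0, 0.25, 1.
p_of_zs = stepwise_p_of_z_sketch([0.0, 0.25, 1.0], [[0.0, 0.5, 1.0]])
```

For this linear pdf the trapezoid area is exact, so the recurrence recovers p(z) = 2z at every location. For curved pdfs each step inherits the error of the previous one, which is the oscillation that the two averaged splines are meant to smooth out.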
- construct_pdf(grid: List[float], row: List[int] | None = None) List[List[float]][source]
This method utilizes intermediate calculations from prepare_constructor along with the provided grid (i.e. x) values to return corresponding y values to construct the PDF approximation.
- Parameters:
- Returns:
- debug()[source]
This is a debugging utility that is meant to return intermediate calculation values to make it easier to visualize and debug the reconstruction algorithm.
- Returns:
_quantiles Input during constructor instantiation
_locations Input during constructor instantiation
_p_of_zs Resulting p(z) values found after calculating the area of trapezoids based on the difference between adjacent quantile values
y1 One of two splines fit to alternating pairs of (_locations, _p_of_zs)
y2 One of two splines fit to alternating pairs of (_locations, _p_of_zs)
- class qp.parameterizations.quant.piecewise_constant.PiecewiseConstant(quantiles: List[float], locations: List[List[float]])[source]
Bases: AbstractQuantilePdfConstructor
This constructor takes the input quantiles and locations, and calculates a numerical derivative. We assume a constant value between derivative points and interpolate between those.
- prepare_constructor() None[source]
This method will calculate the numerical derivative as well as the adjusted locations. The adjustments are necessary because the derivative is not a central derivative.
- construct_pdf(grid: List[float], row: List[int] | None = None) List[List[float]][source]
Take the intermediate calculations and return the interpolated y values given the input grid.
- debug()[source]
Utility method to help with debugging. Returns input and intermediate calculations.
- Returns:
_quantiles Input during constructor instantiation
_locations Input during constructor instantiation
_cdf_derivatives Numerical derivative using _quantiles and _locations
_cdf_2nd_derivatives Numerical second derivative using _quantiles and _locations
_adjusted_locations Result of shifting the locations due to the use of non-central numerical derivatives
- class qp.parameterizations.quant.piecewise_linear.PiecewiseLinear(quantiles: List[float], locations: List[List[float]])[source]
Bases: AbstractQuantilePdfConstructor
This constructor takes the input quantiles and locations, and calculates a numerical derivative. The resulting values are passed to scipy’s interp1d, which will perform a linear interpolation given a set of x values.
- prepare_constructor() None[source]
This method will calculate the numerical derivative as well as the adjusted locations. The adjustments are necessary because the derivative is not a central derivative.
- construct_pdf(grid: List[float], row: List[int] | None = None) List[List[float]][source]
Take the intermediate calculations and return the interpolated y values given the input grid.
- debug()[source]
Utility method to help with debugging. Returns input and intermediate calculations.
- Returns:
_quantiles Input during constructor instantiation
_locations Input during constructor instantiation
_cdf_derivatives Numerical derivative using _quantiles and _locations
_adjusted_locations Result of shifting the locations due to the use of non-central numerical derivatives
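The PiecewiseLinear steps, a numerical derivative of the CDF followed by linear interpolation at adjusted locations, can be sketched for one distribution as below. Placing each derivative at the midpoint of its interval is an assumption about what "adjusted locations" means, np.interp stands in for interp1d, and returning 0 outside the adjusted range is a further assumption. The function name is hypothetical.

```python
import numpy as np

def piecewise_linear_pdf_sketch(quants, locs_row, grid):
    """Forward-difference derivative of the CDF, interpolated linearly on a grid."""
    quants = np.asarray(quants, dtype=float)
    locs_row = np.asarray(locs_row, dtype=float)
    # dq/dloc on each interval between neighboring locations.
    cdf_derivatives = np.diff(quants) / np.diff(locs_row)
    # Assumed adjustment: shift each derivative to the middle of its interval.
    adjusted_locations = locs_row[1:] - 0.5 * np.diff(locs_row)
    return np.interp(grid, adjusted_locations, cdf_derivatives, left=0.0, right=0.0)

# Uniform distribution on [0, 1]: quantile equals location, so the pdf is 1.
locs_row = np.linspace(0.0, 1.0, 5)
pdf_vals = piecewise_linear_pdf_sketch(locs_row, locs_row, np.array([0.25, 0.5, 2.0]))
```

A forward difference estimates the slope over a whole interval, so attaching it to the interval midpoint rather than to either endpoint avoids a half-step shift in the reconstructed pdf.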
Gaussian mixture model based
- class qp.mixmod_gen(means: ArrayLike, stds: ArrayLike, weights: ArrayLike, warn: bool = True, *args, **kwargs)[source]
Bases: Pdf_rows_gen
Parameterizes distributions using a Gaussian Mixture model.
There are ncomp Gaussians in the model, and npdf distributions contained in the object.
- Parameters:
- means
ArrayLike The means of the Gaussians, with shape (npdf, ncomp)
- stds
ArrayLike The standard deviations of the Gaussians, with shape (npdf, ncomp)
- weights
ArrayLike The weights to attach to the Gaussians, with shape (npdf, ncomp). Weights should sum up to one. If not, the weights are interpreted as relative weights.
- warn
bool, optional If True, raises warnings if input is not finite. If False, no warnings are raised. By default True.
Notes
All distributions must have the same number of Gaussian components, ncomp. Use 0 as a fill value instead of NaN, since NaN values will result in errors in the PDF construction.
Converting to this parameterization:
The available methods to convert to this parameterization are listed below by method key, with their arguments. If the key is None, this is the default conversion method.
- Method key None (default): ncomps=3, nsamples=1000, random_state=None
Implementation Notes:
The pdf() and cdf() are exact, and are computed as a weighted sum of the pdf() and cdf() of the component Gaussians.
The ppf() is computed by evaluating the cdf() on a fixed grid and interpolating the inverse function using scipy.interpolate.interp1d with the default interpolation method (linear). ppf(0) returns negative infinity and ppf(1) returns positive infinity.
- name = 'mixmod'
- version = 0
- classmethod get_allocation_kwds(npdf, **kwargs) dict[str, tuple[tuple[int, int], str]][source]
Return the kwds necessary to create an empty HDF5 file with npdf entries for iterative write. We only need to allocate the data columns, as the metadata will be written when we finalize the file.
The number of data columns is calculated based on the length or shape of the metadata, n. For example, the number of columns is nbins-1 for a histogram.
- Parameters:
- npdf
int Total number of distributions that will be written out
- kwargs
The keys needed to construct the shape of the data to be written.
- Returns:
- Raises:
ValueError Raises an error if the means are not provided.
- classmethod create_ensemble(means: ArrayLike, stds: ArrayLike, weights: ArrayLike, warn: bool = True, ancil: Mapping | None = None) Ensemble[source]
Creates an Ensemble of distributions parameterized as Gaussian Mixture models.
npdf = the number of distributions
ncomp = the number of Gaussians in the mixture model
- Parameters:
- means
ArrayLike The means of the Gaussians, with shape (npdf, ncomp)
- stds
ArrayLike The standard deviations of the Gaussians, with shape (npdf, ncomp)
- weights
ArrayLike The weights to attach to the Gaussians, with shape (npdf, ncomp). Weights should sum up to one. If not, the weights are interpreted as relative weights.
- warn
bool, optional If True, raises warnings if input is not finite. If False, no warnings are raised. By default True.
- ancil
Optional[Mapping], optional A dictionary of metadata for the distributions, where any arrays have the same length as the number of distributions, by default None
- Returns:
Ensemble An Ensemble object containing all of the given distributions.
Examples
To create an Ensemble of two distributions with associated ids:
>>> import numpy as np
>>> import qp
>>> means = np.array([[0.35, 0.55], [0.23, 0.81]])
>>> stds = np.array([[0.2, 0.25], [0.21, 0.19]])
>>> weights = np.array([[0.4, 0.6], [0.3, 0.7]])
>>> ancil = {'ids': [200, 205]}
>>> ens = qp.mixmod.create_ensemble(means, stds, weights, ancil=ancil)
>>> ens.metadata
{'pdf_name': array([b'mixmod'], dtype='|S6'), 'pdf_version': array([0])}
Utility functions
Spline based
- class qp.spline_gen(*args, **kwargs)[source]
Bases:
Pdf_rows_gen
Spline based distribution
Notes
This implements PDFs using a set of splines.
The relevant data members are:
- splx: (npdf, n) spline-knot x-values
- sply: (npdf, n) spline-knot y-values
- spln: (npdf) spline-knot order parameters
The pdf() for the ith pdf will return the result of scipy.interpolate.splev(x, (splx[i], sply[i], spln[i])).
The cdf() for the ith pdf will return the result of scipy.interpolate.splint applied to the same (splx[i], sply[i], spln[i]) representation, integrated up to x.
The ppf() will use the default scipy implementation, which inverts the cdf() as evaluated on an adaptive grid.
- name = 'spline'
- version = 0
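The way a spline representation yields a pdf and a cdf can be sketched directly with SciPy, independently of qp. This is a minimal illustration for a single distribution, assuming the knot, coefficient, and order arrays returned by scipy.interpolate.splrep play the roles of splx[i], sply[i], and spln[i]. The input curve and the normalization step are illustrative choices, not the library's own code.

```python
import numpy as np
from scipy.interpolate import splev, splint, splrep

# Unnormalized Gaussian-shaped curve sampled on a grid (illustrative input).
xvals = np.linspace(-4.0, 4.0, 81)
yvals = np.exp(-0.5 * xvals**2)

# splrep returns (knots, coefficients, order); s=0 requests interpolation.
splx, sply, spln = splrep(xvals, yvals, s=0)

# A spline is linear in its coefficients, so dividing the coefficients by the
# integral over the support gives a spline that integrates to one.
norm = splint(xvals[0], xvals[-1], (splx, sply, spln))
tck = (splx, sply / norm, spln)

pdf_at_zero = splev(0.0, tck)             # density evaluated at x = 0
cdf_at_zero = splint(xvals[0], 0.0, tck)  # integral from the lower edge to 0
```

Here the lower integration limit for the cdf is the first x value, which is an assumption of this sketch.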
- static build_normed_splines(xvals, yvals, **kwargs)[source]
Build a set of normalized splines using the x and y values
- classmethod create_from_xy_vals(xvals, yvals, **kwargs)[source]
Create a new distribution using the given x and y values
- Parameters:
- Returns:
- pdf_obj
spline_gen The requested PDF
- classmethod create_from_samples(xvals, samples, **kwargs)[source]
Create a new distribution using the given x values and samples
- Parameters:
- Returns:
- pdf_obj
spline_gen The requested PDF
- ppf(quants)[source]
Percent point function (inverse of cdf) at q of the given RV.
- Parameters:
- q
array_like lower tail probability
- arg1, arg2, arg3,…
array_like The shape parameter(s) for the distribution (see docstring of the instance object for more information)
- loc
array_like, optional location parameter (default=0)
- scale
array_like, optional scale parameter (default=1)
- Returns:
- x
array_like quantile corresponding to the lower tail probability q.
- classmethod get_allocation_kwds(npdf: int, **kwargs) dict[str, tuple[tuple[int, int], str]][source]
Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.
- classmethod plot_native(pdf, **kwargs)[source]
Plot the PDF in a way that is particular to this type of distribution
For a spline this shows the spline knots
- classmethod create_ensemble(splx: ArrayLike, sply: ArrayLike, spln: ArrayLike | None = None, ancil: Mapping | None = None, method: str | None = None) Ensemble[source]
Creates an Ensemble of distributions parameterized via a set of splines.
- Parameters:
- splx
ArrayLike The x-values of the spline knots
- sply
ArrayLike The y-values of the spline knots
- spln
ArrayLike, optional The order of the spline knots, by default None
- ancil
Optional[Mapping], optional A dictionary of metadata for the distributions, where any arrays have the same length as the number of distributions, by default None
- method
Optional[str], optional The string of the creation method to use, by default None.
- Returns:
Ensemble An Ensemble object containing all of the given distributions.
Utility functions
- qp.parameterizations.spline.spline_utils.normalize_spline(xvals: ArrayLike, yvals: ArrayLike, limits: tuple[float, float], **kwargs) ndarray[float][source]
Normalize a set of 1D interpolators
- Parameters:
- Returns:
- ynorm
np.ndarray[float] Normalized y-vals
- qp.parameterizations.spline.spline_utils.build_splines(xvals: ArrayLike, yvals: ArrayLike) tuple[ndarray, ndarray, ndarray][source]
Build a set of 1D spline representations
- Parameters:
- Returns:
- splx
np.ndarray Spline knot xvalues
- sply
np.ndarray Spline knot yvalues
- spln
np.ndarray Spline knot order parameters
- qp.parameterizations.spline.spline_utils.spline_extract_xy_vals(in_dist: Ensemble, **kwargs) dict[str, Any][source]
Wrapper for extract_xy_vals. Convert using a set of x and y values.
- qp.parameterizations.spline.spline_utils.extract_samples(in_dist: Ensemble, **kwargs) dict[str, np.ndarray | None][source]
Convert using a set of values sampled from the PDF
- qp.parameterizations.spline.spline_utils.build_kdes(samples: ArrayLike, **kwargs) list[gaussian_kde][source]
Build a set of Gaussian Kernel Density Estimates
- Parameters:
- samples
ArrayLike The samples used to build the kernel density estimates
- kwargs
Passed to the scipy.stats.gaussian_kde constructor
- Returns:
- kdes
list[scipy.stats.gaussian_kde]
- qp.parameterizations.spline.spline_utils.evaluate_kdes(xvals: ArrayLike, kdes: list[gaussian_kde]) ndarray[source]
Evaluate a set of KDEs
- Parameters:
- xvals
ArrayLike X-values used for the spline
- kdes
list[scipy.stats.gaussian_kde] The kernel density estimates
- Returns:
- yvals
np.ndarray The kdes evaluated at the xvals
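The two KDE helpers above amount to building one scipy.stats.gaussian_kde per row of samples and then evaluating each on a shared grid. The following sketch shows that pattern with SciPy directly. The sample data, seed, and grid are illustrative assumptions, and the list comprehension and np.vstack calls stand in for the library functions rather than reproduce them.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Two sets of 500 samples, one per distribution: shape (npdf, nsamples).
rng = np.random.default_rng(0)
samples = rng.normal(loc=np.array([[0.0], [2.0]]), scale=1.0, size=(2, 500))

# Build one Gaussian kernel density estimate per row of samples.
kdes = [gaussian_kde(row) for row in samples]

# Evaluate every KDE on a common grid, giving an (npdf, nx) array.
xvals = np.linspace(-4.0, 6.0, 101)
yvals = np.vstack([kde(xvals) for kde in kdes])
```

The resulting x and y values are in the form expected by an x/y-based constructor such as create_from_xy_vals.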
Packed Interpolation
- class qp.packed_interp_gen(xvals, ypacked, ymax, *args, packing_type=PackingType.linear_from_rowmax, log_floor=-3.0, **kwargs)[source]
Bases:
Pdf_rows_gen
Interpolator based distribution
Notes
This is a version of the interp_pdf that stores the data using a packed integer representation.
See qp.packing_utils for options on packing
See qp.interp_pdf for details on interpolation
- name = 'packed_interp'
- version = 0
- property xvals
Return the x-values used to do the interpolation
- property packing_type
Returns the packing type
- property log_floor
Returns the log floor
- property ypacked
Returns the packed y-vals
- property ymax
Returns the max for each row
- property yvals
Return the y-values used to do the interpolation
- classmethod get_allocation_kwds(npdf, **kwargs) dict[str, tuple[tuple[int, int], str]][source]
Return the keywords necessary to create an ‘empty’ hdf5 file with npdf entries for iterative file writeout. We only need to allocate the objdata columns, as the metadata can be written when we finalize the file.
- classmethod plot_native(pdf, **kwargs)[source]
Plot the PDF in a way that is particular to this type of distribution
For an interpolated PDF this uses the interpolation points
- classmethod create_ensemble(xvals: ArrayLike, ypacked: ArrayLike, ymax: ArrayLike, packing_type=PackingType.linear_from_rowmax, log_floor=-3.0, ancil: Mapping | None = None) Ensemble[source]
Creates an Ensemble of distributions parameterized as interpolations that are stored as packed integers.
- Parameters:
- xvals
ArrayLike The x-values used to do the interpolation
- ypacked
ArrayLike The packed version of the y-values used to do the interpolation
- ymax
ArrayLike The maximum y-values for each pdf
- packing_type
PackingType By default PackingType.linear_from_rowmax
- log_floor
float By default -3
- ancil
Optional[Mapping], optional A dictionary of metadata for the distributions, where any arrays have the same length as the number of distributions, by default None
- Returns:
Ensemble An Ensemble object containing all of the given distributions.
Utility functions
Integer packing utilities for qp
- qp.parameterizations.packed_interp.packing_utils.linear_pack_from_rowmax(input_array: ArrayLike) tuple[ndarray, ndarray][source]
Pack an array into 8bit unsigned integers, using the maximum of each row as a reference
This packs the values onto a linear grid for each row, running from 0 to row_max
- Parameters:
- input_array
ArrayLike The values we are packing
- Returns:
- packed_array
np.ndarray The packed values
- row_max
np.ndarray The max for each row, needed to unpack the array
- qp.parameterizations.packed_interp.packing_utils.linear_unpack_from_rowmax(packed_array: ArrayLike, row_max: ArrayLike) ndarray[float][source]
Unpack an array from 8bit unsigned integers, using the maximum of each row as a reference
- Parameters:
- Returns:
- unpacked_array
np.ndarray[float] The unpacked values
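The linear packing round trip can be sketched in a few lines of NumPy. This is a standalone illustration and not the library code: the helper names linear_pack and linear_unpack are hypothetical, the rounding convention is an assumption, and every row is assumed to have a positive maximum.

```python
import numpy as np


def linear_pack(values):
    """Pack each row onto 256 linear levels between 0 and the row maximum."""
    values = np.asarray(values, dtype=float)
    row_max = values.max(axis=1, keepdims=True)  # assumed to be positive
    packed = np.round(255.0 * values / row_max).clip(0, 255).astype(np.uint8)
    return packed, row_max


def linear_unpack(packed, row_max):
    """Map the 8-bit levels back to floats using the stored row maxima."""
    return row_max * np.asarray(packed, dtype=float) / 255.0


yvals = np.array([[0.0, 0.5, 2.0, 1.0], [0.1, 0.4, 0.2, 0.0]])
packed, row_max = linear_pack(yvals)
restored = linear_unpack(packed, row_max)
```

With this scheme the absolute error after unpacking is at most half a level, that is, row_max / 510 for each row, which is why small values in a row with a large maximum are stored coarsely.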
- qp.parameterizations.packed_interp.packing_utils.log_pack_from_rowmax(input_array: ArrayLike, log_floor: float = -3.0) tuple[ndarray[uint8], ndarray][source]
Pack an array into 8bit unsigned integers, using the maximum of each row as a reference
This packs the values onto a log grid for each row, running from row_max * 10**log_floor to row_max
- Parameters:
- Returns:
- packed_array
np.ndarray[np.uint8] The packed values
- row_max
np.ndarray The max for each row, needed to unpack the array
- qp.parameterizations.packed_interp.packing_utils.log_unpack_from_rowmax(packed_array: ArrayLike, row_max: ArrayLike, log_floor: float = -3.0) ndarray[source]
Unpack an array from 8bit unsigned integers, using the maximum of each row as a reference
- Parameters:
- Returns:
- unpacked_array
np.ndarray The unpacked values
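A logarithmic variant of the packing can be sketched the same way. The helpers below are hypothetical and only illustrate the idea of a log grid between row_max * 10**log_floor and row_max. The exact handling of values below the floor in the library may differ; here they are assumed to be clamped to the floor.

```python
import numpy as np


def log_pack(values, log_floor=-3.0):
    """Pack each row onto 256 levels that are evenly spaced in log10."""
    values = np.asarray(values, dtype=float)
    row_max = values.max(axis=1, keepdims=True)  # assumed to be positive
    # Ratio to the row maximum, clamped so that log10 stays finite.
    ratio = np.maximum(values / row_max, 10.0**log_floor)
    # log10(ratio) lies in [log_floor, 0]; rescale it to [0, 255].
    levels = 255.0 * (np.log10(ratio) - log_floor) / (-log_floor)
    return np.round(levels).clip(0, 255).astype(np.uint8), row_max


def log_unpack(packed, row_max, log_floor=-3.0):
    """Invert log_pack: level 0 maps to the floor, level 255 to row_max."""
    exponent = log_floor * (1.0 - np.asarray(packed, dtype=float) / 255.0)
    return row_max * 10.0**exponent


yvals = np.array([[1e-5, 0.01, 0.1, 1.0]])
packed, row_max = log_pack(yvals)
restored = log_unpack(packed, row_max)
```

Compared with the linear scheme, the error here is relative rather than absolute, so small values above the floor keep roughly the same fractional precision as large ones.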
- qp.parameterizations.packed_interp.packing_utils.pack_array(packing_type: PackingType, input_array: ArrayLike, **kwargs)[source]
Pack an array into 8bit unsigned integers
- Parameters:
- packing_type
PackingType Enum specifying the type of packing to use
- input_array
ArrayLike The values we are packing
- kwargs
depend on the packing type used
- Returns:
np.ndarray Details depend on packing type used
- qp.parameterizations.packed_interp.packing_utils.unpack_array(packing_type: PackingType, packed_array: ArrayLike, **kwargs)[source]
Unpack an array from 8bit unsigned integers
- Parameters:
- packing_type
PackingType Enum specifying the type of packing to use
- packed_array
ArrayLike The packed values
- kwargs
depend on the packing type used
- Returns:
np.ndarray Details depend on packing type used
Sparse Interpolation
- class qp.sparse_gen(xvals, mu, sig, dims, sparse_indices, *args, **kwargs)[source]
Bases:
interp_gen
Sparse based distribution. The final behavior is similar to interp_gen, but the constructor takes a sparse representation to build the interpolator. Attempt to inherit from interp_gen: this is failing
Notes
This implements a qp interface to the original code SparsePz from M. Carrasco-Kind.
- name = 'sparse'
- version = 0
- classmethod get_allocation_kwds(npdf, **kwargs) dict[str, tuple[tuple[int, int], str]][source]
Return the kwds necessary to create an empty HDF5 file with npdf entries for iterative write. We only need to allocate the data columns, as the metadata will be written when we finalize the file.
The number of data columns is calculated based on the length or shape of the metadata, n. For example, the number of columns is nbins-1 for a histogram.
- Parameters:
- npdf
int Total number of distributions that will be written out
- kwargs
The keys needed to construct the shape of the data to be written.
- Returns:
- Raises:
ValueError Raises an error if xvals is not provided.
Utility functions
The original SparsePz code can be found at https://github.com/mgckind/SparsePz
This module reorganizes it for usage by DESC within qp, and is Python 3 compliant.
- qp.parameterizations.sparse_interp.sparse_rep.shapes2pdf(wa, ma, sa, ga, meta, cut=1e-05)[source]
return a pdf evaluated at the meta[‘xvals’] values for the given set of Voigt parameters
- qp.parameterizations.sparse_interp.sparse_rep.create_basis(metadata, cut=1e-05)[source]
create the Voigt basis matrix out of a metadata dictionary
- qp.parameterizations.sparse_interp.sparse_rep.create_voigt_basis(xvals, mu, Nmu, sigma, Nsigma, Nv, cut=1e-05)[source]
Creates a gaussian-voigt dictionary at the same resolution as the original PDF
- Parameters:
xvals (float) – the x-axis point values for the PDF
mu (float) – [min_mu, max_mu], range of mean for gaussian
Nmu (int) – Number of values between min_mu and max_mu
sigma (float) – [min_sigma, max_sigma], range of variance for gaussian
Nsigma (int) – Number of values between min_sigma and max_sigma
Nv – Number of Voigt profiles per gaussian at given position mu and sigma
cut (float) – Lower cut for gaussians
- Returns:
Dictionary as numpy array with shape (len(xvals), Nmu*Nsigma*Nv)
- Return type:
- qp.parameterizations.sparse_interp.sparse_rep.sparse_basis(dictionary, query_vec, n_basis, tolerance=None)[source]
Compute the sparse representation of a vector given a dictionary (basis), for a given tolerance or number of basis vectors. It uses a Cholesky decomposition to speed up the process and to solve the linear operations. Adapted from Rubinstein, R., Zibulevsky, M. and Elad, M., Technical Report - CS Technion, April 2008
- Parameters:
dictionary (float) – Array with all basis vectors as columns; must have shape (len(vector), total basis) and each column must have Euclidean l-2 norm equal to 1
query_vec (float) – vector of which a sparse representation is desired
n_basis (int) – number of desired basis
tolerance (float) – tolerance desired if n_basis is not needed to be fixed, must input a large number for n_basis to assure achieving tolerance
- Returns:
indices, values (2 arrays one with the position and the second with the coefficients)
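The greedy search behind this kind of sparse coding can be sketched as a plain orthogonal matching pursuit. The version below is a simplified illustration under stated assumptions: the function name omp_sketch is hypothetical, and it refits the selected columns with a least-squares solve at each step instead of the incremental Cholesky update mentioned above, so it shows the selection logic rather than the optimized implementation.

```python
import numpy as np


def omp_sketch(dictionary, query_vec, n_basis, tolerance=None):
    """Greedy sparse coding; returns (indices, coefficients)."""
    dictionary = np.asarray(dictionary, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    residual = query_vec.copy()
    indices = []
    coefs = np.zeros(0)
    for _ in range(n_basis):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if indices:
            correlations[indices] = 0.0  # never pick the same column twice
        indices.append(int(np.argmax(correlations)))
        # Refit all selected columns jointly (least squares, not Cholesky).
        coefs, *_ = np.linalg.lstsq(dictionary[:, indices], query_vec, rcond=None)
        residual = query_vec - dictionary[:, indices] @ coefs
        if tolerance is not None and np.linalg.norm(residual) < tolerance:
            break
    return np.array(indices), coefs


# Small dictionary with orthonormal, unit-norm columns (illustrative only).
basis = 0.5 * np.array(
    [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]], dtype=float
)
target = 2.0 * basis[:, 1] - 0.5 * basis[:, 3]
indices, coefs = omp_sketch(basis, target, n_basis=4, tolerance=1e-10)
```

Passing a large n_basis together with a tolerance, as in the usage line, lets the loop stop as soon as the residual is small enough, which matches the advice given for the tolerance parameter.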
- qp.parameterizations.sparse_interp.sparse_rep.combine_int(Ncoef, Nbase)[source]
Combine the index of the base (up to 62500 bases) and the value (signed 16-bit integer) in a 32-bit integer. The first half of the word is for the value and the second half for the index.
- qp.parameterizations.sparse_interp.sparse_rep.get_N(longN)[source]
Extract Ncoef and Nbase from the 32-bit integer: return (longN >> 16), longN & 0xffff
- Parameters:
longN (int) – input 32 bits integer
- Returns:
Ncoef, Nbase both 16 bits integer
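The bit layout described for combine_int and get_N can be checked with plain Python integers. The decoding expression is the one quoted above. The encoding expression is an assumption inferred from the description (value in the upper 16 bits, index in the lower 16 bits), and the lowercase helper names are hypothetical stand-ins, not the library functions.

```python
def combine_int_sketch(ncoef, nbase):
    """Put a signed 16-bit value in the upper half and an index in the lower half."""
    return (ncoef << 16) | nbase


def get_n_sketch(long_n):
    """Recover (ncoef, nbase) with the shift and mask quoted in the docs."""
    return long_n >> 16, long_n & 0xFFFF


packed = combine_int_sketch(-1234, 4321)
ncoef, nbase = get_n_sketch(packed)
```

The arithmetic right shift preserves the sign of the coefficient, while the mask keeps only the lower 16 bits that hold the basis index.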
- qp.parameterizations.sparse_interp.sparse_rep.decode_sparse_indices(indices)[source]
decode sparse indices into basis indices and weight array
- qp.parameterizations.sparse_interp.sparse_rep.indices2shapes(sparse_indices, meta)[source]
compute the Voigt shape parameters from the sparse index
- Parameters:
- sparse_index: `np.array`
1D Array of indices for each object in the ensemble
- meta: dict
Dictionary of metadata to decode the sparse indices
- qp.parameterizations.sparse_interp.sparse_rep.build_sparse_representation(x, P, mu=None, Nmu=None, sig=None, Nsig=None, Nv=3, Nsparse=20, tol=1e-10, verbose=True)[source]
compute the sparse representation of a set of pdfs evaluated on a common x array
- qp.parameterizations.sparse_interp.sparse_rep.pdf_from_sparse(sparse_indices, A, xvals, cut=1e-05)[source]
return the array of evaluations at xvals from the sparse indices
- qp.parameterizations.sparse_interp.sparse_utils.extract_sparse_from_xy(in_dist: Ensemble, **kwargs) dict[str, Any][source]
Extract sparse representation from an xy interpolated representation
- Parameters:
- in_dist
Ensemble Input distributions
- Returns:
- Other Parameters:
Notes
This function will rebin to a grid more suited to the in_dist support by removing x-values corresponding to y=0