Core functionality

Ensemble

class qp.Ensemble(the_class: Pdf_gen, data: Mapping, ancil: Mapping | None = None, method: str | None = None)

An object comprised of one or more distributions with the same parameterization.

The Ensemble allows you to perform operations on the group of parameterizations as a whole. An Ensemble has three main data components, the last of which is optional:

  1. The metadata: this contains information about the parameterization, and the coordinates of the parameterization.

  2. The object data: this contains the data that is unique to each distribution, for example the values that correspond to the coordinates.

  3. The ancillary data (optional): this contains data points where there is one data point for each distribution in the ensemble. There can be many of these columns or arrays in the ancillary data table.

Parameters:
the_class: Pdf_gen subclass

The class to use to parameterize the distributions

data: Mapping

Dictionary with data used to construct the ensemble. The keys required vary for different parameterizations.

ancil: Optional[Mapping]

Dictionary with ancillary data, by default None

method: Optional[str]

The key for the creation method to use, by default None

Examples

>>> import qp
>>> import numpy as np
>>> data = {'bins': [0,1,2,3,4,5],
...         'pdfs': np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]])}
>>> ancil = {'ids': [105, 108]}
>>> ens = qp.Ensemble(qp.hist,data,ancil)
>>> ens.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}
property gen_func

Return the function used to create the distribution object for this ensemble

property gen_class

Return the class used to generate distributions for this ensemble

property dist

Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property kwds

Return the kwds associated to the frozen object for this ensemble

property gen_obj

Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property frozen

Return the scipy.stats.rv_frozen object that encapsulates the distributions for this ensemble

property ndim: int

Return the number of dimensions of distributions in this ensemble.

property shape: tuple

Return the shape of distributions in this ensemble.

property npdf: int

Return the number of distributions in this ensemble.

property ancil: Mapping

Return the ancillary data dictionary for this ensemble.

x_samples(min: float = 0.0, max: float = 5.0, n: int | None = 1000) -> ndarray[float]

Return an array of x values that can be used to plot all the distributions in the Ensemble.

This is meant to plot the characteristic distribution for an Ensemble of discrete data. For example, for an ensemble of histograms that would be the PDF, and for an ensemble of quantiles that would be the CDF.

Analytic parameterizations like mixmod or scipy.stats.norm simply return np.linspace(min, max, n). Because the defaults are the same for all analytic distributions, it’s recommended that you pass min, max and n explicitly.

Parameters:
min: float, optional

The minimum x value to be used if the parameterization doesn’t have an x_samples method or is analytic, by default 0.

max: float, optional

The maximum x value to be used if the parameterization doesn’t have an x_samples method or is analytic, by default 5.

n: Optional[int], optional

The number of points to be used if the parameterization doesn’t have an x_samples method or is analytic, by default 1000

Returns:
xs: np.ndarray[float]

The array of points to use.

convert_to(to_class: Pdf_gen, **kwargs: str) -> Ensemble

Convert this ensemble to the given parameterization class. To see the available conversion methods for your chosen parameterization and their required arguments, check the docstrings for qp.to_class. If the parameterization class doesn’t have a conversion methods table, it is not possible to convert to that class.

Parameters:
to_class: Pdf_gen subclass

Parameterization class to convert to

**kwargs

Keyword arguments that are passed to the output class constructor

Returns:
ens: Ensemble

Ensemble of distributions of type to_class using the data from this object

Other Parameters:
method: str

Optional argument to specify a non-default conversion algorithm

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_i = ens_h.convert_to(qp.interp, xvals=np.linspace(0,5,10))
>>> ens_i.metadata
{'pdf_name': array([b'interp'], dtype='|S6'),
'pdf_version': array([0]),
'xvals': array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])}
update(data: Mapping, ancil: Mapping | None = None) -> None

Update the frozen distribution object with the given data, and set the ancillary data table with ancil if given.

Parameters:
data: Mapping

Dictionary with data used to construct the ensemble, including metadata.

ancil: Optional[Mapping], optional

Optional dictionary that contains data for each of the distributions in the ensemble, by default None.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_h.update(data={'bins': np.array([1,2,3,4,5]), 'pdfs': np.array([0.1,0.1,0.4,0.2])})
>>> ens_h.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[1, 2, 3, 4, 5]])}
update_objdata(data: Mapping, ancil: Mapping | None = None) -> None

Updates the objdata in the frozen distribution, and sets the ancillary data table if given.

Parameters:
data: Mapping

Dictionary with the object data that will be used to reconstruct the ensemble

ancil: Optional[Mapping], optional

Optional dictionary that contains data for each of the distributions in the ensemble, by default None.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_h.objdata
{'pdfs': array([0.   , 0.125, 0.125, 0.5  , 0.25 ])}
>>> ens_h.update_objdata(data={'pdfs': np.array([0.05,0.09,0.2,0.3,0.15])})
>>> ens_h.objdata
{'pdfs': array([[0.06329114, 0.11392405, 0.25316456, 0.37974684, 0.18987342]])}
property metadata: Mapping

Return the metadata for this ensemble. Metadata are elements that are the same for all the distributions in the ensemble. These include the name and version of the distribution generation class

Returns:
metadata: Mapping

The dictionary of the metadata.

property objdata: Mapping

Return the data for this ensemble. These are the elements that differ for each distribution in the ensemble. For example, the data points that correspond to each of the coordinates given in the metadata.

Returns:
objdata: Mapping

The object data

Notes

If the distribution normalized the data (which many do by default), this will return the normalized data and not the original input data.
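The effect described in this note can be illustrated without qp. The following is a minimal plain-Python sketch, assuming that normalizing a histogram means dividing the bin heights by the total area under them. `normalize_hist` is a hypothetical helper, not part of qp; with the input from the update_objdata example it gives the same normalized values shown there.

```python
def normalize_hist(bins, pdfs):
    """Scale bin heights so the histogram integrates to 1.

    Illustrative helper only; this is not qp's implementation.
    """
    widths = [hi - lo for lo, hi in zip(bins[:-1], bins[1:])]
    area = sum(h * w for h, w in zip(pdfs, widths))
    if area <= 0:
        raise ValueError("histogram has no area to normalize")
    return [h / area for h in pdfs]


bins = [0, 1, 2, 3, 4, 5]
raw = [0, 0.1, 0.1, 0.4, 0.2]  # input heights, total area 0.8
normalized = normalize_hist(bins, raw)
print([round(v, 3) for v in normalized])  # [0.0, 0.125, 0.125, 0.5, 0.25]
```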

set_ancil(ancil: Mapping) -> None

Set the ancillary data dictionary. The arrays in this dictionary must have one row for each of the distributions, which means that the length of these arrays (or the first dimension) must be the same as the number of distributions in the ensemble.

Parameters:
ancil: Mapping

The ancillary data dictionary.

Raises:
IndexError

If the length of the arrays in ancil does not match the number of distributions in the Ensemble.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ancil = {'ids': np.array([5,7])}
>>> ens_h.set_ancil(ancil)
>>> ens_h.ancil
{'ids': array([5, 7])}
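The length requirement on ancillary columns can be sketched in plain Python. `check_ancil` below is a hypothetical helper written for illustration; it only mirrors the documented rule (one row per distribution, IndexError otherwise) and is not qp's own validation code.

```python
def check_ancil(ancil, npdf):
    """Raise IndexError unless every ancillary column has npdf rows.

    Hypothetical helper mirroring the rule described above.
    """
    for name, column in ancil.items():
        if len(column) != npdf:
            raise IndexError(
                f"ancil column '{name}' has {len(column)} rows, expected {npdf}"
            )


check_ancil({"ids": [5, 7]}, npdf=2)  # two rows for two distributions: accepted

try:
    check_ancil({"ids": [5, 7, 9]}, npdf=2)  # three rows for two distributions
    mismatch_detected = False
except IndexError:
    mismatch_detected = True
print(mismatch_detected)  # True
```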
add_to_ancil(to_add: Mapping) -> None

Add additional columns to the ancillary data dictionary. The ancil dictionary must already exist. If it does not, use set_ancil.

If any of these columns have the same name as already existing ancillary data columns, the new columns will overwrite the old ones.

Parameters:
to_add: Mapping

The columns to add to the ancillary data dict

Raises:
IndexError

If the length of the arrays in to_add does not match the number of distributions in the Ensemble

Examples

>>> import qp
>>> import numpy as np
>>> ancil = {'ids': np.array([5,7])}
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]), ancil=ancil)
>>> ens_h.add_to_ancil({'means':np.array([0.2,0.25])})
>>> ens_h.ancil
{'ids': array([5, 7]), 'means': array([0.2 , 0.25])}
append(other_ens: Ensemble) -> None

Append another ensemble to this ensemble. The ensembles must be of the same parameterization, or this will not work. They must also have the same metadata, so for example if they are both histograms they must also have the same bins.

The ancillary data dictionaries are appended to each other only if both ensembles have one. If one ensemble has an ancillary data dictionary and the other does not, the ancillary data dictionary of this ensemble is set to None.

Parameters:
other_ens: Ensemble

The ensemble to append to this one.

Raises:
KeyError

Raised if the two ensembles do not have matching metadata.

Examples

>>> import qp
>>> import numpy as np
>>> ens_1 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_2 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0.5,0.15,0.25,0.45,0.1]))
>>> ens_1.append(ens_2)
>>> ens_1.npdf
2
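The rule for ancillary data when appending can be summarized in a few lines. This is a sketch under stated assumptions: `merge_ancil` is a hypothetical helper, it uses plain lists instead of arrays, and it assumes both tables hold the same column names. It is not how qp implements append.

```python
def merge_ancil(ancil_a, ancil_b):
    """Concatenate two ancillary tables row-wise, or return None if either is missing.

    Hypothetical helper; assumes both tables share the same column names.
    """
    if ancil_a is None or ancil_b is None:
        return None
    return {key: list(ancil_a[key]) + list(ancil_b[key]) for key in ancil_a}


print(merge_ancil({"ids": [5]}, {"ids": [7]}))  # {'ids': [5, 7]}
print(merge_ancil({"ids": [5]}, None))          # None
```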
build_tables(encode: bool = False, ext: str | None = None) -> Mapping

Returns a dictionary of dictionaries of numpy arrays for the meta data, object data, and the ancillary data (if it exists) for this ensemble.

Parameters:
encode: bool

If True and ext is ‘hdf5’, will encode any string columns in the ancil table, by default False.

ext: str, optional

If set to ‘hdf5’ when encode is True, will encode any string columns in the ancil table, by default None.

Returns:
data: Mapping, tables_io.TableDict-like

The dictionary with the data. Has the keys: meta for metadata, data for object data, and optionally ancil for ancillary data.

norm()

Normalizes the input distribution data if it represents a PDF and can be normalized.

Raises:
AttributeError

Raised if the parameterization doesn’t have a normalization method.

mode(grid: ArrayLike) -> ArrayLike

Return the mode of each ensemble distribution, evaluated on the given grid.

Parameters:
grid: ArrayLike

Grid on which to evaluate distribution

Returns:
mode: ArrayLike

The modes of the distributions evaluated on grid, with shape (npdf, 1)

gridded(grid: ArrayLike) -> tuple[ArrayLike, ArrayLike]

Build, cache and return the PDF values at the given grid points. If the given grid matches the already cached grid, then this just returns the cached value.

Parameters:
grid: ArrayLike

The grid points to evaluate the PDF at.

Returns:
gridded: tuple[ArrayLike, ArrayLike]

(grid, pdf_values)
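The build-and-cache behavior described for gridded can be sketched as follows. `GridCache` is a hypothetical class for illustration only: it keeps the most recent grid and its PDF values, and recomputes only when a different grid is requested. How qp compares grids internally is not stated here, so the tuple comparison is an assumption.

```python
class GridCache:
    """Cache PDF values for the most recently requested grid (illustrative, not qp code)."""

    def __init__(self, pdf_func):
        self._pdf_func = pdf_func
        self._grid = None
        self._values = None
        self.evaluations = 0  # counts how often the PDF values were recomputed

    def gridded(self, grid):
        grid = tuple(grid)
        if grid != self._grid:
            # New grid: evaluate the PDF and remember the result.
            self._grid = grid
            self._values = [self._pdf_func(x) for x in grid]
            self.evaluations += 1
        return self._grid, self._values


cache = GridCache(lambda x: 0.5 if 3 <= x < 4 else 0.0)
cache.gridded([3.0, 3.5, 4.0])
cache.gridded([3.0, 3.5, 4.0])  # same grid: served from the cache
first_count = cache.evaluations
cache.gridded([0.0, 1.0])       # different grid: recomputed
```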

write_to(filename: str) -> None

Write this ensemble to a file.

The file type can be any of those supported by tables_io. File type is indicated by the suffix of the file name given. Allowed formats are: ‘hdf5’, ‘h5’, ‘hf5’, ‘hd5’, ‘fits’, ‘fit’, ‘pq’, ‘parq’, ‘parquet’.

If writing to parquet files, a file will be written for the metadata, the object data, and the ancillary data if it exists, where the identifying key is added to the filename.

Parameters:
filename: str

Examples

>>> import qp
>>> import numpy as np
>>> ens_1 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_1.write_to("hist-ensemble.hdf5")
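Since the file type is taken from the file name suffix, it can help to check a name before writing. The sketch below uses only the suffix list given above; `output_format` is a hypothetical helper, and raising ValueError is this sketch's choice rather than documented qp or tables_io behavior.

```python
from pathlib import Path

# Suffixes listed in the write_to description above.
ALLOWED_SUFFIXES = ("hdf5", "h5", "hf5", "hd5", "fits", "fit", "pq", "parq", "parquet")


def output_format(filename):
    """Return the file-type suffix of filename, or raise if it is not in the list.

    Hypothetical helper; the ValueError is an assumption of this sketch.
    """
    suffix = Path(filename).suffix.lstrip(".")
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file suffix: {suffix!r}")
    return suffix


print(output_format("hist-ensemble.hdf5"))  # hdf5
```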
to_json() -> dict[str, str]

Convert this ensemble to a json string

pdf(x: ArrayLike) -> ArrayLike

Evaluates the probability density function (PDF) for each of the distributions in the ensemble

Parameters:
x: ArrayLike

Location(s) at which to evaluate the PDF for each distribution.

Returns:
pdf: ArrayLike

The PDF value(s) at the given location(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.pdf(np.linspace(3,6,6))
array([[0.5       , 0.5       , 0.25      , 0.25      , 0.        ,
        0.        ],
       [0.37974684, 0.37974684, 0.18987342, 0.18987342, 0.        ,
        0.        ]])
logpdf(x: ArrayLike) -> ArrayLike

Evaluates the log of the probability density function (PDF) for each of the distributions in the ensemble.

Parameters:
x: ArrayLike

Location(s) at which to do the evaluations

Returns:
logpdf: ArrayLike

The log of the PDF at the given location(s)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.logpdf(np.linspace(3,6,6))
array([[-0.69314718, -0.69314718, -1.38629436, -1.38629436,        -inf,
       -inf],
      [-0.96825047, -0.96825047, -1.66139765, -1.66139765,        -inf,
       -inf]])
cdf(x: ArrayLike) -> ArrayLike

Evaluates the cumulative distribution function (CDF) for each of the distributions in the ensemble.

Parameters:
x: ArrayLike

Location(s) at which to do the evaluations

Returns:
cdf: ArrayLike

The CDF at the given location(s)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.cdf(np.linspace(3,6,6))
array([[0.25      , 0.55      , 0.8       , 0.95      , 1.        ,
        1.        ],
       [0.43037975, 0.65822785, 0.84810127, 0.96202532, 1.        ,
        1.        ]])
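To see where the first row of this output comes from, the CDF of a histogram can be computed by hand: it rises linearly inside each bin by that bin's share of the total area. The following is a plain-Python sketch under that assumption; `hist_cdf` is a hypothetical helper, not qp's implementation.

```python
from itertools import accumulate


def hist_cdf(x, bins, pdfs):
    """CDF of a piecewise-constant histogram at x (illustrative, not qp code)."""
    widths = [hi - lo for lo, hi in zip(bins[:-1], bins[1:])]
    masses = [h * w for h, w in zip(pdfs, widths)]
    total = sum(masses)
    # CDF value at each bin edge, after normalizing the total area to 1.
    cum = [0.0] + [m / total for m in accumulate(masses)]
    if x <= bins[0]:
        return 0.0
    if x >= bins[-1]:
        return 1.0
    for i, (lo, hi) in enumerate(zip(bins[:-1], bins[1:])):
        if lo <= x < hi:
            frac = (x - lo) / (hi - lo)  # position of x inside the bin
            return cum[i] + frac * (cum[i + 1] - cum[i])


bins = [0, 1, 2, 3, 4, 5]
pdfs = [0, 0.1, 0.1, 0.4, 0.2]
xs = [3.0, 3.6, 4.2, 4.8, 5.4, 6.0]  # same points as np.linspace(3, 6, 6)
cdf_values = [round(hist_cdf(x, bins, pdfs), 2) for x in xs]
print(cdf_values)  # [0.25, 0.55, 0.8, 0.95, 1.0, 1.0]
```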
logcdf(x: ArrayLike) -> ArrayLike

Evaluates the log of the cumulative distribution function (CDF) for each of the distributions in the ensemble.

Parameters:
x: ArrayLike

Location(s) at which to do the evaluations

Returns:
cdf: ArrayLike

The log of the CDF at the given location(s)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.logcdf(np.linspace(3,6,6))
array([[-1.38629436, -0.597837  , -0.22314355, -0.05129329,  0.        ,
        0.        ],
       [-0.84308733, -0.41820413, -0.16475523, -0.03871451,  0.        ,
        0.        ]])
ppf(q: ArrayLike) -> ArrayLike

Evaluates the percentage point function (PPF) for each of the distributions in the ensemble.

Parameters:
q: ArrayLike

Location(s) at which to do the evaluations

Returns:
ppf: ArrayLike

The PPF at the given location(s)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.ppf(0.5)
array([[3.5       ],
       [3.18333333]])
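The PPF is the inverse of the CDF, so for a histogram it can be found by locating the bin in which the cumulative probability passes q and interpolating linearly inside it. The sketch below is plain Python under that assumption; `hist_ppf` is a hypothetical helper and not qp's implementation, but it gives the same two values as the example above.

```python
from itertools import accumulate


def hist_ppf(q, bins, pdfs):
    """Invert the piecewise-linear CDF of a histogram (illustrative, not qp code)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    widths = [hi - lo for lo, hi in zip(bins[:-1], bins[1:])]
    masses = [h * w for h, w in zip(pdfs, widths)]
    total = sum(masses)
    cum = [0.0] + [m / total for m in accumulate(masses)]  # CDF at each bin edge
    for i, width in enumerate(widths):
        lo_c, hi_c = cum[i], cum[i + 1]
        # Skip empty bins; stop at the first bin whose CDF range contains q.
        if hi_c > lo_c and q <= hi_c:
            return bins[i] + (q - lo_c) / (hi_c - lo_c) * width
    return bins[-1]


bins = [0, 1, 2, 3, 4, 5]
median_1 = hist_ppf(0.5, bins, [0, 0.1, 0.1, 0.4, 0.2])
median_2 = hist_ppf(0.5, bins, [0.05, 0.09, 0.2, 0.3, 0.15])
print(round(median_1, 8), round(median_2, 8))  # 3.5 3.18333333
```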
sf(q: ArrayLike) -> ArrayLike

Evaluates the survival function (SF) for each of the distributions in the ensemble.

Parameters:
q: ArrayLike

Location(s) at which to evaluate the distributions

Returns:
sf: ArrayLike

The SF at the given location(s)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.sf(0.5)
array([[1.        ],
       [0.96835443]])
logsf(q: ArrayLike) -> ArrayLike

Evaluates the log of the survival function (SF) for each of the distributions in the ensemble.

Parameters:
q: ArrayLike

Location(s) at which to evaluate the distributions

Returns:
sf: ArrayLike

The log of the SF at the given location(s)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.logsf(0.5)
array([[ 0.        ],
       [-0.03215711]])
isf(q: ArrayLike) -> ArrayLike

Evaluates the inverse of the survival function (SF) for each of the distributions in the ensemble.

Parameters:
q: ArrayLike

Location(s) at which to evaluate the distributions

Returns:
sf: ArrayLike

The inverse of the survival function at the given location(s)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.isf(0.5)
array([[3.5       ],
       [3.18333333]])
rvs(size: int = 1, random_state: None | int | Generator = None) -> ArrayLike

Generate samples from the distributions in this ensemble.

The returned samples are of shape (npdf, size), where size is the number of samples per distribution.

Parameters:
size: int, optional

Number of samples to return, by default 1.

random_state: int, numpy.random.Generator, None, optional

The random state to use. Can be provided with a random seed for consistency. By default None.

Returns:
samples: ArrayLike

The array of samples for each distribution in the ensemble, shape (npdf,size)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.rvs(size=2)
array([[3.12956247, 3.72090937],
       [4.96783836, 3.24016123]])
stats(moments: str = 'mv') -> tuple[ArrayLike, ...]

Return some statistics for each of the distributions in this ensemble.

The moments to be returned are determined by the string given to moments, where each letter represents a specific moment. The options are: “m” = mean, “v” = variance, “s” = (Fisher’s) skew, “k” = (Fisher’s) kurtosis.

Parameters:
moments: str, optional

Which moments to include, by default “mv”

Returns:
stats: tuple[ArrayLike, …]

A sequence of arrays of the moments requested, where the shape of the arrays is (npdf, 1)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.stats()
(array([[3.375     ],
        [3.01898734]]),
 array([[0.859375  ],
        [1.23698125]]))
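The mean and variance printed in this example are consistent with placing each bin's probability at the bin center. The plain-Python sketch below reproduces them under that assumption; `bin_center_moments` is a hypothetical helper, and this is an observation about the example values rather than a statement of how qp computes moments.

```python
def bin_center_moments(bins, pdfs):
    """Mean and variance with each bin's probability placed at the bin center.

    Illustrative assumption only; not taken from qp's source.
    """
    widths = [hi - lo for lo, hi in zip(bins[:-1], bins[1:])]
    centers = [(lo + hi) / 2 for lo, hi in zip(bins[:-1], bins[1:])]
    masses = [h * w for h, w in zip(pdfs, widths)]
    total = sum(masses)
    probs = [m / total for m in masses]  # normalized probability per bin
    mean = sum(p * c for p, c in zip(probs, centers))
    second = sum(p * c**2 for p, c in zip(probs, centers))
    return mean, second - mean**2


bins = [0, 1, 2, 3, 4, 5]
mean_1, var_1 = bin_center_moments(bins, [0, 0.1, 0.1, 0.4, 0.2])
mean_2, var_2 = bin_center_moments(bins, [0.05, 0.09, 0.2, 0.3, 0.15])
print(round(mean_1, 6), round(var_1, 6))
print(round(mean_2, 8), round(var_2, 8))
```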
median() -> ArrayLike

Return the median for each of the distributions in this ensemble.

Returns:
medians: ArrayLike

The median for each distribution: a float if there is only one distribution, otherwise an array of shape (npdf, 1)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.median()
array([[3.5       ],
       [3.18333333]])
mean() -> ArrayLike

Return the mean for each of the distributions in this ensemble.

Returns:
means: ArrayLike

The mean for each distribution: a float if there is only one distribution, otherwise an array of shape (npdf, 1)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.mean()
array([[3.375     ],
       [3.01898734]])
var() -> ArrayLike

Return the variance for each of the distributions in this ensemble.

Returns:
variances: ArrayLike

The variance for each distribution: a float if there is only one distribution, otherwise an array of shape (npdf, 1)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.var()
array([[0.859375  ],
       [1.23698125]])
std() -> ArrayLike

Return the standard deviation for each of the distributions in this ensemble.

Returns:
stds: ArrayLike

The standard deviations for each distribution, the shape of the array is (npdf, 1)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.std()
array([[0.92702481],
       [1.11219659]])
moment(n: int) -> ArrayLike

Return the nth moment for each of the distributions in this ensemble.

Parameters:
n: int

The order of the moment

Returns:
moments: ArrayLike

The nth moment for each distribution, the shape of the array is (npdf, 1)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.moment(2)
array([[12.25      ],
       [10.35126582]])
entropy() -> ArrayLike

Return the differential entropy for each of the distributions in this ensemble.

Returns:
entropy: ArrayLike

The entropy for each distribution, the shape of the array is (npdf, 1)

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.entropy()
array([[1.21300757],
       [1.45307405]])
interval(alpha: ArrayLike) -> tuple[ArrayLike, ...]

Return the intervals corresponding to a confidence level of alpha for each of the distributions in this ensemble.

Parameters:
alpha: ArrayLike

The array of values to return intervals for. These should be the probability that a random variable will be drawn from the returned range. Each value should be in the range [0,1].

Returns:
interval: tuple[ArrayLike, …]

A tuple of the arrays containing the intervals for each distribution, where the shape of the arrays is (npdf, len(alpha))

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.interval(alpha=[0,0.5,0.9])
(array([[1.4       , 3.        , 3.5       ],
        [0.79      , 2.2875    , 3.18333333]]),
 array([[3.5       , 4.        , 4.8       ],
        [3.18333333, 3.84166667, 4.73666667]]))
histogramize(bins: ArrayLike) -> tuple[ArrayLike]

Computes integrated histogram bin values for all distributions in the ensemble.

Parameters:
bins: ArrayLike

Array of N+1 endpoints of N bins

Returns:
histogram: tuple[ArrayLike, ArrayLike]

The first array in the tuple is the bin edges that were input. The second array in the tuple is an (npdf, N) array of the values in the bins.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.histogramize(bins=np.array([1,2,3,4,5]))
(array([1, 2, 3, 4, 5]),
 array([[0.125     , 0.125     , 0.5       , 0.25      ],
        [0.11392405, 0.25316456, 0.37974684, 0.18987342]]))
integrate(limits: tuple[float | ArrayLike, float | ArrayLike]) -> ArrayLike

Computes the integral under the probability distribution functions (PDFs) of the distributions in the ensemble between the given limits.

Parameters:
limits: tuple[Union[float, ArrayLike], Union[float, ArrayLike]]

A tuple with the limits of integration, where the first object in the tuple is the lower limit, and the second object is the upper limit. The limit objects can be floats or arrays. If they are arrays, nlimits is the length of those arrays.

Returns:
integral: ArrayLike

Value of the integral(s), with the shape (npdf, nlimits)
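For reference, the integral of a PDF between two limits equals the difference of its CDF at those limits. The notation below is introduced for this note only: p_i and F_i denote the PDF and CDF of the i-th distribution, and (a, b) are the lower and upper limits. Whether qp evaluates the integral this way internally is not stated here.

```latex
\int_{a}^{b} p_i(x)\,\mathrm{d}x = F_i(b) - F_i(a)
```

For the first histogram used in the examples in this section, the limits (3, 4) give F(4) - F(3) = 0.75 - 0.25 = 0.5, which is the value histogramize reports for that bin.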

mix_mod_fit(comps=5)

Fits the parameters of a given functional form to an approximation

Parameters:
comps: int, optional

Number of components to consider

using: str, optional

Which existing approximation to use, defaults to first approximation

vb: bool

Report progress

Returns:
self.mix_mod: list [ qp.Composite ]

List of qp.Composite objects approximating the PDFs

Notes

Currently only supports mixture of Gaussians

moment_partial(n: int, limits: tuple, dx: float = 0.01) -> ArrayLike

Return the nth moment over a particular range for each of the distributions in this ensemble.

Parameters:
n: int

The order of the moment to return

limits: tuple

The range over which to calculate the moment, where the second number is the upper limit.

dx: float, optional

The distance between grid points when calculating, by default 0.01

Returns:
ArrayLike

Array of the moments for each of the distributions, with shape (npdf,)

plot(key: int | slice = 0, **kwargs: str)

Plot the selected distribution as a curve.

Parameters:
key: int or slice, optional

The index or slice of the distribution or distributions from this ensemble to plot, by default 0.

Returns:
axes: Axes

The plot axes

Other Parameters:
axes: Axes

The axes to plot on. Either this or xlim must be provided.

xlim: (float, float)

The x-axis limits. Either this or axes must be provided.

kwargs

Any keyword arguments to pass to matplotlib’s axes.plot() method.

plot_native(key: int | slice = 0, **kwargs: str)

Plot the selected distribution in the default format for this parameterization. To find what arguments are required for specific parameterizations, you can check the docstrings of qp.[parameterization].plot_native, where [parameterization] is the parameterization class for the current ensemble.

Parameters:
key: int or slice, optional

The index or slice of the distribution or distributions from this ensemble to plot, by default 0.

kwargs

The keyword arguments to pass to the parameterization’s plot_native method.

Returns:
axes: Axes

The plot axes

initializeHdf5Write(filename: str, npdf: int, comm=None) -> tuple[dict[str, File | Group], File]

Set up the output file for writing an ensemble in chunks. The file is sized for npdf distributions rather than for the size of this ensemble, because the “initial chunk” will not contain the full data.

Parameters:
filename: str

Name of the file to create

npdf: int

Total number of distributions that the file will contain, usually larger than the size of the current ensemble

comm: MPI communicator

Optional MPI communicator to allow parallel writing

Returns:
group: dict[str, h5py.File | h5py.Group]

A dictionary of the groups to write to.

fout: h5py.File

The output file object that has been created.

writeHdf5Chunk(fname: h5py.File | h5py.Group, start: int, end: int) -> None

Write a chunk of the ensemble data to file. This will write the data for the distributions in the slice from [start:end] to the file. This includes the ancillary data table.

Parameters:
fname: h5py.File | h5py.Group

The file or group object to write to

start: int

Starting index of data to write in the h5py file

end: int

Ending index of data to write in the h5py file

finalizeHdf5Write(filename: h5py.File | h5py.Group) -> None

Write ensemble metadata to the output file and close the file.

Parameters:
filename: h5py.File | h5py.Group

The file or group object to complete writing and close.

Factory

class qp.factory.Factory

Factory that creates and manages Ensembles of distributions.

add_class(the_class: Pdf_gen) -> None

Add a parameterization class to the factory dictionary, so that it is included in the set of known parameterization classes. It includes an entry both for the actual class name, which ends in _gen, and the parameterization name that is also aliased to the class.

Parameters:
the_class: Pdf_gen subclass

The parameterization class we are adding, which must inherit from Pdf_gen.

create(class_name: str | Pdf_gen, data: Mapping, method: str | None = None, ancil: Mapping | None = None) -> Ensemble

Make an Ensemble of a particular type of distribution. The data dictionary will need different keys depending on what parameterization you have chosen.

If you are unsure which keys are required, try qp.[parameterization].create_ensemble?, where [parameterization] is the class of ensemble you wish to create. This will output a docstring with the necessary inputs (and this function can also be used to create an Ensemble).

Parameters:
class_name: str or class

The name of the parameterization to make a distribution from.

data: Mapping

Dictionary of values passed to the parameterization create function.

method: str | None, optional

Used to select which creation method to invoke if there are multiple.

ancil: Mapping, optional

Dictionary with ancillary data, by default None

Returns:
ens: Ensemble

The newly created Ensemble

Examples

>>> import qp
>>> import numpy as np
>>> data = {'bins': [0,1,2,3,4,5],
...         'pdfs': np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]])}
>>> ens_h = qp.create('hist', data=data)
>>> ens_h.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}
from_tables(tables: Mapping, decode: bool = False, ext: str | None = None) -> Ensemble

Build this Ensemble from a dictionary of tables, where the metadata has key meta, and the data has key data. If there is an ancillary data table, it should have the key ancil.

The function will create the ensemble with the parameterization given in the meta table, and will use any other information in the meta table necessary to figure out how to construct the ensemble (i.e. construction method).

Parameters:
tables: Mapping

The dictionary of tables to turn into an Ensemble.

decode: bool

If True and ext is ‘hdf5’, will decode any string type columns in ancil, by default False.

ext: str, optional

If ‘hdf5’ and decode is True, will decode any string type columns in ancil, by default None.

Returns:
ens: Ensemble

The ensemble constructed from the data in the tables.

Examples

>>> import qp
>>> import numpy as np
>>> meta = {'pdf_name': np.array(['hist'.encode()]), 'pdf_version': np.array([0]),
... 'bins':np.array([0,1,2,3,4,5])}
>>> data = {'pdfs': np.array([[0.  , 0.1 , 0.1 , 0.4 , 0.2 ],
... [0.05, 0.09, 0.2 , 0.3 , 0.15]])}
>>> ens = qp.from_tables({'meta': meta, 'data': data})
>>> ens.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}
read_metadata(filename: str) -> Mapping

Read an ensemble’s metadata from a file, without loading the full data. The file must have multiple tables, one of which is called ‘meta’.

Parameters:
filename: str

The full path to the file.

Returns:
meta: Mapping

Returns the metadata table as a dictionary of numpy arrays.

Examples

>>> import qp
>>> qp.read_metadata("hist-ensemble.hdf5")
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}
is_qp_file(filename: str) -> bool

Test if a file is a qp file. Must have at least a table called ‘meta’ in the file, and that ‘meta’ table must have a property ‘pdf_name’.

Parameters:
filename: str

Path to file to test.

Returns:
value: bool

True if the file is a qp file

Examples

>>> import qp
>>> qp.is_qp_file("test-qpfile.hdf5")
True
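The two conditions in the description (a table called ‘meta’, and a ‘pdf_name’ entry inside it) can be expressed on an in-memory dictionary of tables. `looks_like_qp_tables` below is a hypothetical helper for illustration; it works on plain dictionaries and does not read files, so it is not a substitute for is_qp_file.

```python
def looks_like_qp_tables(tables):
    """Return True if tables has a 'meta' table that contains 'pdf_name'.

    Hypothetical helper; mirrors the documented conditions on a plain dict.
    """
    meta = tables.get("meta")
    return meta is not None and "pdf_name" in meta


good = {"meta": {"pdf_name": [b"hist"], "pdf_version": [0]}, "data": {"pdfs": []}}
no_name = {"meta": {"pdf_version": [0]}, "data": {"pdfs": []}}
no_meta = {"data": {"pdfs": []}}
print(looks_like_qp_tables(good), looks_like_qp_tables(no_name), looks_like_qp_tables(no_meta))
# True False False
```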
from_json(json_data: dict[str, str]) -> Ensemble

Build an Ensemble from json

Parameters:
json_data: dict[str, str]

The JSON data to build the Ensemble from

Returns:
ens: Ensemble

The ensemble constructed from the JSON data.

read(filename: str, fmt: str | None = None, read_slice: slice | None = None) -> Ensemble

Read this ensemble from a file. The file must be a qp file.

The function will create the ensemble with the parameterization given in the metadata table, and will use any other information in the metadata table necessary to figure out how to construct the ensemble (i.e. construction method).

Parameters:
filename: str

Path to the file.

fmt: Optional[str], optional

File format. If None, it will be taken from the file extension. Allowed formats are: ‘hdf5’, ‘h5’, ‘hf5’, ‘hd5’, ‘fits’, ‘fit’, ‘pq’, ‘parq’, ‘parquet’.

read_slice: slice, optional

If provided, read only a slice of the data and ancil from the file.

Returns:
ens: Ensemble

The ensemble constructed from the data in the file.

Examples

>>> import qp
>>> ens = qp.read("test-qpfile.hdf5")
data_length(filename: str) -> int

Get the size of data in a file. The file must be a qp file, which means it must contain an Ensemble with a metadata table.

Parameters:
filename : str

The path to the file with the data.

Returns:
nrows : int

The length of the data, or the number of distributions in the data.

Examples

>>> import qp
>>> qp.data_length("hist-ensemble.hdf5")
2
iterator(filename: str, chunk_size: int = 100000, rank: int = 0, parallel_size: int = 1) -> Iterator[int, int, Ensemble]

Iterates through a given Ensemble file and yields a chunk of the ensemble data at a time. This means that the returned Ensemble contains the distributions from the returned start index to the returned stop index. If there is an ancillary data table, the Ensemble will also contain any ancillary data for those distributions.

Parameters:
filename : str

The path to the file to iterate through.

chunk_size : int, optional

The size of chunks to yield, by default 100_000

rank : int, optional

The process rank, if run in MPI, by default 0

parallel_size : int, optional

The number of processes, if run in MPI, by default 1

Yields:
Iterator[int, int, Ensemble]

The start index, the end index, and an Ensemble containing the distributions between those two indices.

Raises:
TypeError

Raised if this function is run with files that are not hdf5 files.

KeyError

Raised if the pdf_name in the file is not one of the available parameterizations.

Examples

To iterate through an HDF5 Ensemble file, we can use the following code:

>>> data_file = "./test.hdf5"
>>> for start, end, ens_chunk in qp.iterator(data_file, chunk_size=11):
...     print(f"Indices are: ({start}, {end})")
...     print(ens_chunk)
Indices are: (0, 11)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (11, 22)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (22, 33)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (33, 44)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (44, 55)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (55, 66)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (66, 77)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (77, 88)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (88, 99)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (99, 100)
Ensemble(the_class=mixmod,shape=(1, 3))
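The index pairs printed above follow a simple chunking pattern in which the final chunk is truncated at the end of the data. A minimal sketch of that pattern for a single process, assuming a file with 100 distributions as the output suggests (`chunk_bounds` is a hypothetical helper, and the split across MPI ranks is not modelled here):

```python
from typing import Iterator, Tuple


def chunk_bounds(n_rows: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper: yield (start, end) pairs that cover n_rows rows."""
    for start in range(0, n_rows, chunk_size):
        # The last chunk stops at n_rows instead of overshooting.
        yield start, min(start + chunk_size, n_rows)


bounds = list(chunk_bounds(100, 11))
print(bounds[0], bounds[-1], len(bounds))  # (0, 11) (99, 100) 10
```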
convert(in_dist: Ensemble, class_name: str, **kwds) -> Ensemble

Convert an ensemble to a different parameterization. Keyword arguments are required for the conversion, but the specific arguments vary by target parameterization. To check the available conversion methods and their arguments, refer to the docstring of the class you are converting to (qp.class_name). If that class does not have a conversion methods table, it is not possible to convert to that parameterization.

Parameters:
in_dist : Ensemble

The input Ensemble object to convert.

class_name : str

Name of the parameterization to convert to, as a string.

kwds : Mapping

The keyword arguments required to convert to the given parameterization.

Returns:
ens : Ensemble

The converted ensemble, in the new parameterization.

Examples

The following example demonstrates converting from a histogram parameterization to an interpolation parameterization. The arguments given will not be the same when converting between other parameterizations.

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_i = qp.convert(ens_h, "interp", xvals=np.linspace(0,5,10))
>>> ens_i.metadata
{'pdf_name': array([b'interp'], dtype='|S6'),
'pdf_version': array([0]),
'xvals': array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])}
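To build intuition for what a histogram-to-interpolation conversion involves, the sketch below evaluates a piecewise-constant histogram at a set of `xvals`. This is an assumption about the general idea, not a description of qp's conversion code: `hist_pdf_at` is a hypothetical helper, it skips any normalization of the histogram, and its handling of the outer bin edges may differ from qp.

```python
from bisect import bisect_right
from typing import List, Sequence


def hist_pdf_at(bins: Sequence[float], pdfs: Sequence[float],
                xvals: Sequence[float]) -> List[float]:
    """Hypothetical helper: evaluate a piecewise-constant histogram at xvals."""
    yvals = []
    for x in xvals:
        if x < bins[0] or x >= bins[-1]:
            # Assumption: the density is zero outside [bins[0], bins[-1]).
            yvals.append(0.0)
        else:
            # bisect_right finds the bin whose left edge is <= x.
            yvals.append(pdfs[bisect_right(bins, x) - 1])
    return yvals


bins = [0, 1, 2, 3, 4, 5]
pdfs = [0.0, 0.1, 0.1, 0.4, 0.2]
print(hist_pdf_at(bins, pdfs, [0.5, 1.5, 3.5, 5.0]))  # [0.0, 0.1, 0.4, 0.0]
```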
pretty_print(stream=sys.stdout) -> None

Print a level of the conversion dictionary in a human-readable format

Parameters:
stream : stream

The stream to print to

static concatenate(ensembles: list[Ensemble]) -> Ensemble

Concatenate a list of Ensembles into one Ensemble. The Ensembles must be of the same parameterization and have the same metadata.

Parameters:
ensembles : list[Ensemble]

The list of ensembles we are concatenating

Returns:
ens : Ensemble

A single Ensemble containing the distributions from all of the input ensembles.

Examples

>>> import qp
>>> import numpy as np
>>> ens_1 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_1.npdf
1
>>> ens_2 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_2.npdf
1
>>> ens_all = qp.concatenate([ens_1, ens_2])
>>> ens_all.npdf
2
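The requirement that all inputs share the same metadata can be sketched with plain dictionaries. `concatenate_tables` is a hypothetical helper, each input is a stand-in of the form `{"meta": ..., "rows": ...}` rather than a real Ensemble, and raising `ValueError` on a mismatch is an assumption made for the illustration.

```python
from typing import Any, Dict, List


def concatenate_tables(tables: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: stack rows of tables that share the same metadata."""
    first_meta = tables[0]["meta"]
    for table in tables[1:]:
        if table["meta"] != first_meta:
            # Assumption: mismatched metadata is treated as an error.
            raise ValueError("All inputs must share the same metadata")
    # One row per distribution, kept in input order.
    rows = [row for table in tables for row in table["rows"]]
    return {"meta": first_meta, "rows": rows}


meta = {"pdf_name": "hist", "bins": [0, 1, 2, 3, 4, 5]}
t1 = {"meta": meta, "rows": [[0, 0.1, 0.1, 0.4, 0.2]]}
t2 = {"meta": meta, "rows": [[0.05, 0.09, 0.2, 0.3, 0.15]]}
combined = concatenate_tables([t1, t2])
print(len(combined["rows"]))  # 2
```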
static write_dict(filename: str, ensemble_dict: Mapping[str, Ensemble], **kwargs)

Writes out a dictionary of Ensembles to an HDF5 file. Each Ensemble in the dictionary will be written to a group, and within each Ensemble group there will be subgroups for the metadata, data, and (optional) ancillary data tables.

Parameters:
filename : str

The file path to write to.

ensemble_dict : Mapping[str, Ensemble]

The dictionary of Ensembles to write.

kwargs

Keyword arguments that are passed to the tables_io write_dicts_to_HDF5 function

Raises:
ValueError

Raised if the dictionary contains any values that are not Ensembles.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_i = qp.interp.create_ensemble(xvals= np.array([0,1,2,3,4]),
... yvals = np.array([[0.05,0.09,0.2,0.3,0.15]]))
>>> qp.write_dict("qp-ensembles.hdf5",{"ens_h": ens_h, "ens_i": ens_i})
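The group layout described above can be pictured as a nested dictionary with one top-level entry per Ensemble. The sketch below is only a picture of that structure: `ensemble_group` is a hypothetical helper, the subgroup names "data" and "ancil" are assumptions (only ‘meta’ is named elsewhere in this reference), and the values are illustrative.

```python
from typing import Any, Dict, Optional


def ensemble_group(meta: Dict[str, Any], data: Dict[str, Any],
                   ancil: Optional[Dict[str, Any]] = None) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: one group with metadata, data and optional ancil."""
    group = {"meta": meta, "data": data}
    if ancil is not None:
        # The ancillary table is optional, so it only appears when provided.
        group["ancil"] = ancil
    return group


layout = {
    "ens_h": ensemble_group({"pdf_name": "hist"}, {"pdfs": [[0, 0.1, 0.1, 0.4, 0.2]]}),
    "ens_i": ensemble_group({"pdf_name": "interp"},
                            {"yvals": [[0.05, 0.09, 0.2, 0.3, 0.15]]},
                            ancil={"ids": [105]}),
}
print(sorted(layout["ens_h"]), sorted(layout["ens_i"]))
```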
static read_dict(filename: str) -> Mapping[str, Ensemble]

Reads in one or more Ensembles from an HDF5 file to a dictionary of Ensembles. The file should contain one top-level group per ensemble. Each Ensemble group should have subgroups that are the metadata, data, and (optional) ancillary data tables.

Parameters:
filename : str

The path to the HDF5 file to read in.

Returns:
Mapping[str, Ensemble]

A dictionary with the Ensembles contained in the file.

Examples

>>> import qp
>>> ens_dict = qp.read_dict("qp-ensembles.hdf5")