Core functionality

Ensemble

class qp.Ensemble(the_class: Pdf_gen, data: Mapping, ancil: Mapping | None = None, method: str | None = None)[source]

An object composed of one or more distributions that share the same parameterization.

The Ensemble allows you to perform operations on the group of distributions as a whole. An Ensemble has three main data components, the last of which is optional:

The metadata: information about the parameterization, such as its name and version, and the coordinates of the parameterization.
The object data: the data that is unique to each distribution, for example the values that correspond to the coordinates.
The ancillary data (optional): data with exactly one entry for each distribution in the ensemble. The ancillary data table can contain many such columns or arrays.

Parameters:

the_class (Pdf_gen subclass): The class to use to parameterize the distributions.
data (Mapping): Dictionary with data used to construct the ensemble. The keys required vary for different parameterizations.
ancil (Optional[Mapping]): Dictionary with ancillary data, by default None.
method (Optional[str]): The key for the creation method to use, by default None.

Examples

>>> import qp
>>> import numpy as np
>>> data = {'bins': [0,1,2,3,4,5],
...         'pdfs': np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]])}
>>> ancil = {'ids': [105, 108]}
>>> ens = qp.Ensemble(qp.hist, data, ancil)
>>> ens.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}
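To make the three data components concrete, here is a plain-dictionary sketch of what they hold for the two histograms in the example above. It is illustrative only: the metadata layout mirrors the `ens.metadata` output, the object data are the raw input values (qp may normalize them), and this is not how qp stores the data internally.

```python
import numpy as np

# Metadata: shared by every distribution (name, version, and the bin coordinates).
metadata = {
    "pdf_name": np.array([b"hist"]),
    "pdf_version": np.array([0]),
    "bins": np.array([[0, 1, 2, 3, 4, 5]]),
}
# Object data: one row per distribution, one value per bin (raw inputs here).
objdata = {"pdfs": np.array([[0, 0.1, 0.1, 0.4, 0.2], [0.05, 0.09, 0.2, 0.3, 0.15]])}
# Ancillary data (optional): every column has one entry per distribution.
ancil = {"ids": np.array([105, 108])}

npdf = objdata["pdfs"].shape[0]
n_bins = metadata["bins"].shape[1] - 1
assert objdata["pdfs"].shape == (npdf, n_bins)
assert all(len(column) == npdf for column in ancil.values())
print(npdf, n_bins)  # 2 5
```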

property gen_func: Return the function used to create the distribution object for this ensemble

property gen_class: Return the class used to generate distributions for this ensemble

property dist: Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property kwds: Return the kwds associated to the frozen object for this ensemble

property gen_obj: Return the scipy.stats.rv_continuous object that generates distributions for this ensemble

property frozen: Return the scipy.stats.rv_frozen object that encapsulates the distributions for this ensemble

property ndim: int: Return the number of dimensions of distributions in this ensemble.

property shape: tuple: Return the shape of distributions in this ensemble.

property npdf: int: Return the number of distributions in this ensemble.

property ancil: Mapping: Return the ancillary data dictionary for this ensemble.

x_samples(min: float = 0.0, max: float = 5.0, n: int | None = 1000) → ndarray[float][source]

Return an array of x values that can be used to plot all the distributions in the Ensemble.

This is meant to plot the characteristic distribution for an Ensemble of discrete data. For example, for an ensemble of histograms that would be the PDF, and for an ensemble of quantiles that would be the CDF.

Analytic parameterizations like mixmod or scipy.stats.norm simply return np.linspace(min, max, n). Because these defaults are the same for all analytic distributions, it is recommended that you pass values suited to your data.

Parameters:

min (float, optional): The minimum x value to use if the parameterization doesn’t have an x_samples method or is analytic, by default 0.
max (float, optional): The maximum x value to use if the parameterization doesn’t have an x_samples method or is analytic, by default 5.
n (Optional[int], optional): The number of points to use if the parameterization doesn’t have an x_samples method or is analytic, by default 1000.

Returns:

xs (np.ndarray[float]): The array of points to use.

convert_to(to_class: Pdf_gen, **kwargs: str) → Ensemble[source]

Convert this ensemble to the given parameterization class. To see the available conversion methods for your chosen parameterization and their required arguments, check the docstrings for qp.[to_class], where [to_class] is the target parameterization class. If the parameterization class doesn’t have a conversion method table, it is not possible to convert to that class.

Parameters:

to_class (Pdf_gen subclass): Parameterization class to convert to.
**kwargs: Keyword arguments that are passed to the output class constructor.

Returns:

ens (Ensemble): Ensemble of distributions of type to_class using the data from this object.

Other Parameters:

method (str): Optional argument to specify a non-default conversion algorithm.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_i = ens_h.convert_to(qp.interp, xvals=np.linspace(0,5,10))
>>> ens_i.metadata
{'pdf_name': array([b'interp'], dtype='|S6'),
'pdf_version': array([0]),
'xvals': array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])}

update(data: Mapping, ancil: Mapping | None = None) → None[source]

Update the frozen distribution object with the given data, and set the ancillary data table with ancil if given.

Parameters:

data (Mapping): Dictionary with data used to construct the ensemble, including metadata.
ancil (Optional[Mapping], optional): Optional dictionary that contains data for each of the distributions in the ensemble, by default None.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_h.update(data={'bins': np.array([1,2,3,4,5]), 'pdfs': np.array([0.1,0.1,0.4,0.2])})
>>> ens_h.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[1, 2, 3, 4, 5]])}

update_objdata(data: Mapping, ancil: Mapping | None = None) → None[source]

Updates the objdata in the frozen distribution, and sets the ancillary data table if given.

Parameters:

data (Mapping): Dictionary with the object data that will be used to reconstruct the ensemble.
ancil (Optional[Mapping], optional): Optional dictionary that contains data for each of the distributions in the ensemble, by default None.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_h.objdata
{'pdfs': array([0.   , 0.125, 0.125, 0.5  , 0.25 ])}
>>> ens_h.update_objdata(data={'pdfs': np.array([0.05,0.09,0.2,0.3,0.15])})
>>> ens_h.objdata
{'pdfs': array([[0.06329114, 0.11392405, 0.25316456, 0.37974684, 0.18987342]])}

property metadata: Mapping

Return the metadata for this ensemble. Metadata are elements that are the same for all the distributions in the ensemble. These include the name and version of the distribution generation class, as well as the coordinates of the parameterization (for example, the bins of a histogram).

Returns:

metadata (Mapping): The dictionary of the metadata.

property objdata: Mapping

Return the data for this ensemble. These are the elements that differ for each distribution in the ensemble. For example, the data points that correspond to each of the coordinates given in the metadata.

Returns:

objdata (Mapping): The object data.

Notes

If the parameterization normalizes the data (which many do by default), this returns the normalized data rather than the original input data.
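For a histogram, the normalized values in the update_objdata example above can be reproduced by dividing each bin value by the total integral (the sum of value times bin width). This is a minimal NumPy sketch of that arithmetic, with a hypothetical helper name; it is not qp's own normalization code, and other parameterizations normalize differently.

```python
import numpy as np

def normalize_hist(bins, pdfs):
    """Scale histogram values so that each row integrates to 1 over the bins."""
    bins = np.asarray(bins, dtype=float)
    pdfs = np.atleast_2d(np.asarray(pdfs, dtype=float))
    widths = np.diff(bins)                     # bin widths, shape (nbins,)
    integrals = (pdfs * widths).sum(axis=1)    # area under each histogram
    return pdfs / integrals[:, np.newaxis]

norm = normalize_hist([0, 1, 2, 3, 4, 5], [0.05, 0.09, 0.2, 0.3, 0.15])
print(norm)
```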

set_ancil(ancil: Mapping) → None[source]

Set the ancillary data dictionary. The arrays in this dictionary must have one row for each of the distributions, which means that the length of these arrays (or the first dimension) must be the same as the number of distributions in the ensemble.

Parameters:

ancil (Mapping): The ancillary data dictionary.

Raises:

IndexError: If the length of the arrays in ancil does not match the number of distributions in the Ensemble.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ancil = {'ids': np.array([5,7])}
>>> ens_h.set_ancil(ancil)
>>> ens_h.ancil
{'ids': array([5, 7])}
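The length check described above can be sketched as follows. The helper check_ancil is hypothetical and only mirrors the documented contract (the first dimension of every column must equal the number of distributions, otherwise IndexError is raised); it is not qp's code.

```python
import numpy as np

def check_ancil(ancil, npdf):
    """Raise IndexError unless every ancil column has one row per distribution.

    Hypothetical helper mirroring the documented set_ancil contract.
    """
    for key, values in ancil.items():
        nrows = np.shape(values)[0]   # first dimension of the column
        if nrows != npdf:
            raise IndexError(f"ancil column {key!r} has {nrows} rows, expected {npdf}")

check_ancil({"ids": np.array([5, 7])}, npdf=2)   # passes silently
try:
    check_ancil({"ids": np.array([5, 7, 9])}, npdf=2)
except IndexError as err:
    print(err)  # ancil column 'ids' has 3 rows, expected 2
```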

add_to_ancil(to_add: Mapping) → None[source]

Add additional columns to the ancillary data dictionary. The ancil dictionary must already exist. If it does not, use set_ancil.

If any of these columns have the same name as already existing ancillary data columns, the new columns will overwrite the old ones.

Parameters:

to_add (Mapping): The columns to add to the ancillary data dictionary.

Raises:

IndexError: If the length of the arrays in to_add does not match the number of distributions in the Ensemble.

Examples

>>> import qp
>>> import numpy as np
>>> ancil = {'ids': np.array([5,7])}
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]), ancil=ancil)
>>> ens_h.add_to_ancil({'means':np.array([0.2,0.25])})
>>> ens_h.ancil
{'ids': array([5, 7]), 'means': array([0.2 , 0.25])}

append(other_ens: Ensemble) → None[source]

Append another ensemble to this ensemble. The ensembles must have the same parameterization and the same metadata; for example, two histogram ensembles must also have the same bins.

The ancillary data are only appended if both ensembles have an ancillary data dictionary. If only one of them has one, the ancillary data dictionary of the result is set to None.

Parameters:

other_ens (Ensemble): The ensemble to append to this one.

Raises:

KeyError: Raised if the two ensembles do not have matching metadata.

Examples

>>> import qp
>>> import numpy as np
>>> ens_1 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_2 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0.5,0.15,0.25,0.45,0.1]))
>>> ens_1.append(ens_2)
>>> ens_1.npdf
2
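The rule for ancillary data when appending can be written as a small function. This is a hypothetical sketch of the documented behaviour only, assuming both tables share the same column names; it does not check metadata or touch the distributions themselves.

```python
import numpy as np

def append_ancil(ancil_a, ancil_b):
    """Combine two ancillary tables following the documented append rule.

    Hypothetical helper: if either table is missing, the result is None;
    otherwise each column is concatenated so rows stay aligned with the PDFs.
    Assumes both tables have the same keys.
    """
    if ancil_a is None or ancil_b is None:
        return None
    return {key: np.concatenate([ancil_a[key], ancil_b[key]]) for key in ancil_a}

print(append_ancil({"ids": np.array([5, 7])}, {"ids": np.array([9])}))
# {'ids': array([5, 7, 9])}
print(append_ancil({"ids": np.array([5, 7])}, None))  # None
```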

build_tables(encode: bool = False, ext: str | None = None) → Mapping[source]

Returns a dictionary of dictionaries of numpy arrays for the metadata, object data, and ancillary data (if it exists) of this ensemble.

Parameters:

encode (bool): If True and ext is ‘hdf5’, encodes any string columns in the ancil table, by default False.
ext (str, optional): If set to ‘hdf5’ when encode is True, encodes any string columns in the ancil table, by default None.

Returns:

data (Mapping, tables_io.TableDict-like): The dictionary with the data. It has the keys meta for the metadata, data for the object data, and optionally ancil for the ancillary data.

norm()[source]

Normalizes the input distribution data if it represents a PDF and can be normalized.

Raises:

AttributeError: Raised if the parameterization doesn’t have a normalization method.

mode(grid: ArrayLike) → ArrayLike[source]

Return the mode of each ensemble distribution, evaluated on the given grid.

Parameters:

grid (ArrayLike): Grid on which to evaluate the distributions.

Returns:

mode (ArrayLike): The modes of the distributions evaluated on grid, with shape (npdf, 1).

gridded(grid: ArrayLike) → tuple[ArrayLike, ArrayLike][source]

Build, cache and return the PDF values at the given grid points. If the given grid matches the already cached grid, then this just returns the cached value.

Parameters:

grid (ArrayLike): The grid points at which to evaluate the PDF.

Returns:

gridded (tuple[ArrayLike, ArrayLike]): (grid, pdf_values)
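The caching behaviour described above follows a common memoization pattern: keep the last grid and its values, and recompute only when a different grid arrives. The class below is a hypothetical, self-contained sketch of that pattern, with a Gaussian standing in for an ensemble PDF; qp's own cache may differ in detail.

```python
import numpy as np

class GridCache:
    """Recompute PDF values only when the requested grid changes (hypothetical)."""

    def __init__(self, pdf_func):
        self._pdf_func = pdf_func
        self._grid = None
        self._values = None
        self.n_evaluations = 0  # counts real evaluations, for illustration

    def gridded(self, grid):
        grid = np.asarray(grid)
        # Evaluate only if nothing is cached yet or the grid differs from the cached one.
        if self._grid is None or not np.array_equal(grid, self._grid):
            self._grid = grid
            self._values = self._pdf_func(grid)
            self.n_evaluations += 1
        return self._grid, self._values

cache = GridCache(lambda x: np.exp(-0.5 * x**2))
grid = np.linspace(-3, 3, 7)
cache.gridded(grid)
cache.gridded(grid.copy())   # equal values, so this is served from the cache
print(cache.n_evaluations)   # 1
```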

write_to(filename: str) → None[source]

Write this ensemble to a file.

The file type can be any of those supported by tables_io and is determined by the suffix of the given file name. Allowed formats are: ‘hdf5’, ‘h5’, ‘hf5’, ‘hd5’, ‘fits’, ‘fit’, ‘pq’, ‘parq’, ‘parquet’.

If writing to parquet files, a separate file is written for the metadata, the object data, and the ancillary data (if it exists), with the identifying key added to each filename.

Parameters:

filename (str): The name of the file to write. Its suffix determines the file format.

Examples

>>> import qp
>>> import numpy as np
>>> ens_1 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_1.write_to("hist-ensemble.hdf5")
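Because the format is chosen from the file-name suffix, the dispatch amounts to a lookup table. The sketch below groups the suffixes listed above into file families; the grouping and the helper format_family are illustrative assumptions, since tables_io performs the real dispatch.

```python
from pathlib import Path

# Suffixes accepted by write_to, grouped by file family (grouping is illustrative).
SUFFIX_FAMILIES = {
    "hdf5": {"hdf5", "h5", "hf5", "hd5"},
    "fits": {"fits", "fit"},
    "parquet": {"pq", "parq", "parquet"},
}

def format_family(filename):
    """Return the file family implied by a filename suffix (hypothetical helper)."""
    suffix = Path(filename).suffix.lstrip(".")
    for family, suffixes in SUFFIX_FAMILIES.items():
        if suffix in suffixes:
            return family
    raise ValueError(f"unsupported suffix {suffix!r} in {filename!r}")

print(format_family("hist-ensemble.hdf5"))  # hdf5
```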

to_json() → dict[str, str][source]: Convert this ensemble to a json string

pdf(x: ArrayLike) → ArrayLike[source]

Evaluates the probability density function (PDF) for each of the distributions in the ensemble.

Parameters:

x (ArrayLike): Location(s) at which to evaluate the PDF for each distribution.

Returns:

pdf (ArrayLike): The PDF value(s) at the given location(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.pdf(np.linspace(3,6,6))
array([[0.5       , 0.5       , 0.25      , 0.25      , 0.        ,
        0.        ],
       [0.37974684, 0.37974684, 0.18987342, 0.18987342, 0.        ,
        0.        ]])
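Because a histogram PDF is piecewise constant, the values in the example above can be reproduced with a short NumPy sketch. This is an independent illustration of the math, not qp's implementation. The helper hist_pdf is hypothetical, and it treats x at or beyond the last bin edge as outside the support, which agrees with the zeros at x = 5.4 and x = 6 in the example.

```python
import numpy as np

def hist_pdf(bins, pdfs, x):
    """Evaluate normalized histograms at x: piecewise constant, 0 outside the bins."""
    bins = np.asarray(bins, dtype=float)
    pdfs = np.atleast_2d(np.asarray(pdfs, dtype=float))
    # Normalize each row so that sum(value * bin width) == 1.
    pdfs = pdfs / (pdfs * np.diff(bins)).sum(axis=1, keepdims=True)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    idx = np.searchsorted(bins, x, side="right") - 1   # index of the bin holding each x
    inside = (idx >= 0) & (idx < len(bins) - 1)        # x in [bins[0], bins[-1])
    out = np.zeros((pdfs.shape[0], x.size))
    out[:, inside] = pdfs[:, idx[inside]]
    return out

bins = [0, 1, 2, 3, 4, 5]
pdfs = [[0, 0.1, 0.1, 0.4, 0.2], [0.05, 0.09, 0.2, 0.3, 0.15]]
values = hist_pdf(bins, pdfs, np.linspace(3, 6, 6))
print(values)
```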

logpdf(x: ArrayLike) → ArrayLike[source]

Evaluates the log of the probability density function (PDF) for each of the distributions in the ensemble.

Parameters:

x (ArrayLike): Location(s) at which to do the evaluations.

Returns:

logpdf (ArrayLike): The log of the PDF at the given location(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.logpdf(np.linspace(3,6,6))
array([[-0.69314718, -0.69314718, -1.38629436, -1.38629436,        -inf,
       -inf],
      [-0.96825047, -0.96825047, -1.66139765, -1.66139765,        -inf,
       -inf]])

cdf(x: ArrayLike) → ArrayLike[source]

Evaluates the cumulative distribution function (CDF) for each of the distributions in the ensemble.

Parameters:

x (ArrayLike): Location(s) at which to do the evaluations.

Returns:

cdf (ArrayLike): The CDF at the given location(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.cdf(np.linspace(3,6,6))
array([[0.25      , 0.55      , 0.8       , 0.95      , 1.        ,
        1.        ],
       [0.43037975, 0.65822785, 0.84810127, 0.96202532, 1.        ,
        1.        ]])
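For a histogram the CDF is piecewise linear between the bin edges, which is why the example values rise linearly within each bin. The NumPy sketch below reproduces the example (the helper hist_cdf is hypothetical, not part of qp) and also computes the survival function as 1 - CDF, the standard relationship between the two.

```python
import numpy as np

def hist_cdf(bins, pdfs, x):
    """CDF of histograms: linear interpolation between the cumulative mass at each edge."""
    bins = np.asarray(bins, dtype=float)
    pdfs = np.atleast_2d(np.asarray(pdfs, dtype=float))
    cum = np.cumsum(pdfs * np.diff(bins), axis=1)                        # mass up to each upper edge
    edge_cdf = np.hstack([np.zeros((len(cum), 1)), cum / cum[:, -1:]])  # 0 ... 1 at the edges
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # np.interp returns 0 below the first edge and 1 above the last edge.
    return np.array([np.interp(x, bins, row) for row in edge_cdf])

bins = [0, 1, 2, 3, 4, 5]
pdfs = [[0, 0.1, 0.1, 0.4, 0.2], [0.05, 0.09, 0.2, 0.3, 0.15]]
cdf = hist_cdf(bins, pdfs, np.linspace(3, 6, 6))
sf = 1.0 - hist_cdf(bins, pdfs, 0.5)   # survival function, shape (npdf, 1)
print(cdf)
print(sf)
```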

logcdf(x: ArrayLike) → ArrayLike[source]

Evaluates the log of the cumulative distribution function (CDF) for each of the distributions in the ensemble.

Parameters:

x (ArrayLike): Location(s) at which to do the evaluations.

Returns:

cdf (ArrayLike): The log of the CDF at the given location(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.logcdf(np.linspace(3,6,6))
array([[-1.38629436, -0.597837  , -0.22314355, -0.05129329,  0.        ,
        0.        ],
       [-0.84308733, -0.41820413, -0.16475523, -0.03871451,  0.        ,
        0.        ]])

ppf(q: ArrayLike) → ArrayLike[source]

Evaluates the percent point function (PPF), i.e. the inverse of the CDF, for each of the distributions in the ensemble.

Parameters:

q (ArrayLike): Quantile(s), i.e. probabilities between 0 and 1, at which to evaluate the PPF.

Returns:

ppf (ArrayLike): The PPF at the given quantile(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.ppf(0.5)
array([[3.5       ],
       [3.18333333]])
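The PPF inverts the CDF. For a histogram, whose CDF is piecewise linear, that inverse can be computed by interpolating with the roles of x and CDF swapped. This NumPy sketch reproduces the ppf(0.5) example above; it is an illustration, not qp's implementation, and it assumes q falls where the CDF is strictly increasing (inside an empty bin the inverse is not unique).

```python
import numpy as np

bins = np.array([0, 1, 2, 3, 4, 5], dtype=float)
pdfs = np.array([[0, 0.1, 0.1, 0.4, 0.2], [0.05, 0.09, 0.2, 0.3, 0.15]])

# CDF value at every bin edge, from 0 at the first edge to 1 at the last.
cum = np.cumsum(pdfs * np.diff(bins), axis=1)
edge_cdf = np.hstack([np.zeros((len(cum), 1)), cum / cum[:, -1:]])

# Invert the piecewise-linear CDF: interpolate x as a function of the CDF.
q = 0.5
ppf = np.array([[np.interp(q, row, bins)] for row in edge_cdf])   # shape (npdf, 1)
print(ppf)
```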

sf(q: ArrayLike) → ArrayLike[source]

Evaluates the survival function (SF), i.e. 1 - CDF, for each of the distributions in the ensemble.

Parameters:

q (ArrayLike): Location(s) at which to evaluate the distributions.

Returns:

sf (ArrayLike): The SF at the given location(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.sf(0.5)
array([[1.        ],
       [0.96835443]])

logsf(q: ArrayLike) → ArrayLike[source]

Evaluates the log of the survival function (SF) for each of the distributions in the ensemble.

Parameters:

q (ArrayLike): Location(s) at which to evaluate the distributions.

Returns:

sf (ArrayLike): The log of the SF at the given location(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.logsf(0.5)
array([[ 0.        ],
       [-0.03215711]])

isf(q: ArrayLike) → ArrayLike[source]

Evaluates the inverse survival function (ISF) for each of the distributions in the ensemble.

Parameters:

q (ArrayLike): Probability value(s) at which to evaluate the inverse survival function.

Returns:

isf (ArrayLike): The inverse survival function at the given value(s).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.isf(0.5)
array([[3.5       ],
       [3.18333333]])

rvs(size: int = 1, random_state: None | int | Generator = None) → ArrayLike[source]

Generate samples from the distributions in this ensemble.

The returned samples are of shape (npdf, size), where size is the number of samples per distribution.

Parameters:

size (int, optional): Number of samples to return per distribution, by default 1.
random_state (int, numpy.random.Generator, or None, optional): The random state to use. Provide a random seed for reproducible results. By default None.

Returns:

samples (ArrayLike): The array of samples for each distribution in the ensemble, with shape (npdf, size).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.rvs(size=2)
array([[3.12956247, 3.72090937],
       [4.96783836, 3.24016123]])
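One standard way to draw such samples is inverse-transform sampling: draw uniform values in [0, 1) and pass them through the inverse CDF. The sketch below applies that technique to the example histograms using NumPy's Generator API. It is an illustration under that assumption, not a description of how qp or scipy.stats generate samples, so its numbers will not match the output above.

```python
import numpy as np

bins = np.array([0, 1, 2, 3, 4, 5], dtype=float)
pdfs = np.array([[0, 0.1, 0.1, 0.4, 0.2], [0.05, 0.09, 0.2, 0.3, 0.15]])

# Piecewise-linear CDF at the bin edges (0 at the first edge, 1 at the last).
cum = np.cumsum(pdfs * np.diff(bins), axis=1)
edge_cdf = np.hstack([np.zeros((len(cum), 1)), cum / cum[:, -1:]])

rng = np.random.default_rng(42)           # fixed seed for reproducible draws
size = 1000
u = rng.uniform(size=(len(pdfs), size))   # uniform draws in [0, 1)
# Map each uniform draw through its distribution's inverse CDF.
samples = np.array([np.interp(u_row, cdf_row, bins) for u_row, cdf_row in zip(u, edge_cdf)])
print(samples.shape)  # (2, 1000)
```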

stats(moments: str = 'mv') → tuple[ArrayLike, ...][source]

Return some statistics for each of the distributions in this ensemble.

The moments to be returned are determined by the string given to moments, where each letter represents a specific moment. The options are: “m” = mean, “v” = variance, “s” = (Fisher’s) skew, “k” = (Fisher’s) kurtosis.

Parameters:

moments (str, optional): Which moments to include, by default “mv”.

Returns:

stats (tuple[ArrayLike, ...]): A sequence of arrays of the requested moments, where the shape of each array is (npdf, 1).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.stats()
(array([[3.375     ],
        [3.01898734]]),
 array([[0.859375  ],
        [1.23698125]]))
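The mean and variance in the example above are consistent with placing each bin's probability mass at the bin midpoint (the extra within-bin spread of a uniform bin is not included), and the same midpoint sum also gives the second raw moments 12.25 and 10.35126582 shown for moment(2). The NumPy sketch below reproduces these numbers; treat it as an illustration of the arithmetic rather than a statement of how qp computes moments internally.

```python
import numpy as np

bins = np.array([0, 1, 2, 3, 4, 5], dtype=float)
pdfs = np.array([[0, 0.1, 0.1, 0.4, 0.2], [0.05, 0.09, 0.2, 0.3, 0.15]])

mids = 0.5 * (bins[:-1] + bins[1:])                  # bin midpoints
probs = pdfs * np.diff(bins)
probs /= probs.sum(axis=1, keepdims=True)            # probability mass per bin
mean = (probs * mids).sum(axis=1, keepdims=True)     # shape (npdf, 1)
var = (probs * (mids - mean) ** 2).sum(axis=1, keepdims=True)
second_moment = (probs * mids**2).sum(axis=1, keepdims=True)
print(mean.ravel(), var.ravel(), second_moment.ravel())
```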

median() → ArrayLike[source]

Return the median for each of the distributions in this ensemble.

Returns:

medians (ArrayLike): The median for each distribution. This is a float if there is only one distribution, otherwise an array of shape (npdf, 1).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.median()
array([[3.5       ],
       [3.18333333]])

mean() → ArrayLike[source]

Return the mean for each of the distributions in this ensemble.

Returns:

means (ArrayLike): The mean for each distribution. This is a float if there is only one distribution, otherwise an array of shape (npdf, 1).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.mean()
array([[3.375     ],
       [3.01898734]])

var() → ArrayLike[source]

Return the variance for each of the distributions in this ensemble.

Returns:

variances (ArrayLike): The variance for each distribution. This is a float if there is only one distribution, otherwise an array of shape (npdf, 1).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.var()
array([[0.859375  ],
       [1.23698125]])

std() → ArrayLike[source]

Return the standard deviation for each of the distributions in this ensemble.

Returns:

stds (ArrayLike): The standard deviation for each distribution, with shape (npdf, 1).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.std()
array([[0.92702481],
       [1.11219659]])

moment(n: int) → ArrayLike[source]

Return the nth moment for each of the distributions in this ensemble.

Parameters:

n (int): The order of the moment.

Returns:

moments (ArrayLike): The nth moment for each distribution, with shape (npdf, 1).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.moment(2)
array([[12.25      ],
       [10.35126582]])

entropy() → ArrayLike[source]

Return the differential entropy for each of the distributions in this ensemble.

Returns:

entropy (ArrayLike): The entropy for each distribution, with shape (npdf, 1).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.entropy()
array([[1.21300757],
       [1.45307405]])

interval(alpha: ArrayLike) → tuple[ArrayLike, ...][source]

Return the intervals corresponding to a confidence level of alpha for each of the distributions in this ensemble.

Parameters:

alpha (ArrayLike): The confidence level(s) to return intervals for. Each value is the probability that a random variable drawn from the distribution falls within the returned range, and must be in the range [0, 1].

Returns:

interval (tuple[ArrayLike, ...]): A tuple of arrays containing the lower and upper bounds of the intervals for each distribution, where the shape of each array is (npdf, len(alpha)).

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.interval(alpha=[0,0.5,0.9])
(array([[3.5       , 3.        , 1.4       ],
        [3.18333333, 2.2875    , 0.79      ]]),
 array([[3.5       , 4.        , 4.8       ],
        [3.18333333, 3.84166667, 4.73666667]]))
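Assuming the scipy.stats convention (the dist and frozen properties above expose scipy.stats objects), the interval for a level alpha runs from the (1 - alpha)/2 quantile to the (1 + alpha)/2 quantile. The sketch below applies that formula to a standard normal from Python's statistics module, used here only as a stand-in distribution; central_interval is a hypothetical helper. Setting alpha = 0 collapses the interval to the median.

```python
from statistics import NormalDist

def central_interval(inv_cdf, alpha):
    """Bounds that enclose probability alpha, centered on the median."""
    return inv_cdf((1 - alpha) / 2), inv_cdf((1 + alpha) / 2)

std_normal = NormalDist(mu=0.0, sigma=1.0)   # stand-in for one distribution
for alpha in (0.0, 0.5, 0.9):
    lower, upper = central_interval(std_normal.inv_cdf, alpha)
    print(f"alpha={alpha}: ({lower:.4f}, {upper:.4f})")
```

alpha = 1 would ask for the 0 and 1 quantiles, which are infinite for a normal distribution, so it is left out of the loop.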

histogramize(bins: ArrayLike) → tuple[ArrayLike][source]

Computes integrated histogram bin values for all distributions in the ensemble.

Parameters:

bins (ArrayLike): Array of N+1 endpoints of N bins.

Returns:

histogram (tuple[ArrayLike, ArrayLike]): The first array in the tuple is the input bin edges. The second array is an (npdf, N) array of the values in the bins.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_h.histogramize(bins=np.array([1,2,3,4,5]))
(array([1, 2, 3, 4, 5]),
 array([[0.125     , 0.125     , 0.5       , 0.25      ],
        [0.11392405, 0.25316456, 0.37974684, 0.18987342]]))
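Conceptually, the value in each bin is the probability mass between its two edges, i.e. the difference of the CDF at consecutive edges. The sketch below shows that relationship for a single distribution, using the standard-library statistics.NormalDist as a stand-in; the helper bin_masses is hypothetical and is not the qp method.

```python
from statistics import NormalDist

def bin_masses(cdf, bins):
    """Return (bins, masses): the probability in each of the N bins given N+1 edges."""
    edge_cdf = [cdf(edge) for edge in bins]
    masses = [hi - lo for lo, hi in zip(edge_cdf[:-1], edge_cdf[1:])]
    return bins, masses

edges = [-3.0, -1.0, 0.0, 1.0, 3.0]
_, masses = bin_masses(NormalDist().cdf, edges)
print([round(m, 4) for m in masses])
```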

integrate(limits: tuple[float | ArrayLike, float | ArrayLike]) → ArrayLike[source]

Computes the integral under the probability density functions (PDFs) of the distributions in the ensemble between the given limits.

Parameters:

limits (tuple[Union[float, ArrayLike], Union[float, ArrayLike]]): A tuple with the limits of integration, where the first object is the lower limit and the second object is the upper limit. Each limit can be a float or an array; when arrays are given, their length is the number of integration ranges, nlimits.

Returns:

integral (ArrayLike): Value of the integral(s), with shape (npdf, nlimits).
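The integral of a PDF between two limits equals the difference of the CDF at those limits. Assuming that relationship, the sketch below shows how array-valued limits produce an (npdf, nlimits) result. It uses statistics.NormalDist objects as stand-in distributions and a hypothetical helper integrate_between, not the qp method.

```python
import numpy as np
from statistics import NormalDist

def integrate_between(dists, limits):
    """Integral of each PDF between each (lower, upper) pair; shape (npdf, nlimits)."""
    lower, upper = np.broadcast_arrays(np.atleast_1d(np.asarray(limits[0], dtype=float)),
                                       np.atleast_1d(np.asarray(limits[1], dtype=float)))
    # The integral of a PDF from a to b equals CDF(b) - CDF(a).
    return np.array([[d.cdf(b) - d.cdf(a) for a, b in zip(lower, upper)] for d in dists])

dists = [NormalDist(0.0, 1.0), NormalDist(1.0, 2.0)]   # npdf = 2
result = integrate_between(dists, ([-1.0, -np.inf, 0.0], [1.0, np.inf, 0.0]))
print(result.shape)  # (2, 3)
```

A float limit is broadcast against an array limit, so integrate_between(dists, (0.0, [1.0, 2.0])) gives two integration ranges that share the lower limit 0.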

mix_mod_fit(comps=5)[source]

Fits the parameters of a given functional form to an approximation

Parameters:

comps (int, optional): Number of components to consider.
using (str, optional): Which existing approximation to use, defaults to the first approximation.
vb (bool): Report progress.

Returns:

self.mix_mod (list[qp.Composite]): List of qp.Composite objects approximating the PDFs.

Notes

Currently only supports mixture of Gaussians

moment_partial(n: int, limits: tuple, dx: float = 0.01) → ArrayLike[source]

Return the nth moment over a particular range for each of the distributions in this ensemble.

Parameters:

n (int): The order of the moment to return.
limits (tuple): The (lower, upper) range over which to calculate the moment.
dx (float, optional): The spacing between grid points used in the calculation, by default 0.01.

Returns:

ArrayLike: Array of the moments for each of the distributions, with shape (npdf,)
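A partial moment is the integral of x**n times the PDF over the chosen range, which a grid with spacing dx can approximate. The sketch below uses the midpoint rule for one distribution; the helper moment_partial_sketch, the midpoint rule, and the Gaussian test PDF are illustrative assumptions, and qp's own grid construction may differ.

```python
import numpy as np

def moment_partial_sketch(pdf, n, limits, dx=0.01):
    """Approximate the integral of x**n * pdf(x) over limits=(lower, upper).

    Midpoint rule on cells of width close to dx (a sketch, not qp's grid).
    """
    lower, upper = limits
    ncells = max(1, int(round((upper - lower) / dx)))
    edges = np.linspace(lower, upper, ncells + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(mids**n * pdf(mids) * np.diff(edges))

def std_normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# First moment of a standard normal restricted to x >= 0: analytically 1/sqrt(2*pi).
print(moment_partial_sketch(std_normal_pdf, 1, (0.0, 8.0)))
```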

plot(key: int | slice = 0, **kwargs: str)[source]

Plot the selected distribution as a curve.

Parameters:

key (int or slice, optional): The index or slice of the distribution or distributions from this ensemble to plot, by default 0.

Returns:

axes (Axes): The plot axes.

Other Parameters:

axes (Axes): The axes to plot on. Either this or xlim must be provided.
xlim ((float, float)): The x-axis limits. Either this or axes must be provided.
kwargs: Any keyword arguments to pass to matplotlib’s axes.plot() method.

plot_native(key: int | slice = 0, **kwargs: str)[source]

Plot the selected distribution in the default format for this parameterization. To find which arguments are required for a specific parameterization, check the docstring of qp.[parameterization].plot_native, where [parameterization] is the parameterization class of the current ensemble.

Parameters:

key (int or slice, optional): The index or slice of the distribution or distributions from this ensemble to plot, by default 0.
kwargs: The keyword arguments to pass to the parameterization’s plot_native method.

Returns:

axes (Axes): The plot axes.

initializeHdf5Write(filename: str, npdf: int, comm=None) → tuple[dict[str, File | Group], File][source]

Set up the output file for writing an ensemble in chunks. The output is sized for npdf distributions rather than the size of this ensemble, because the “initial chunk” will not contain the full data.

Parameters:

filename (str): Name of the file to create.
npdf (int): Total number of distributions that the file will contain, usually larger than the size of the current ensemble.
comm (MPI communicator): Optional MPI communicator to allow parallel writing.

Returns:

group (dict[str, h5py.File | h5py.Group]): A dictionary of the groups to write to.
fout (h5py.File): The output file object that has been created.

writeHdf5Chunk(fname: h5py.File | h5py.Group, start: int, end: int) → None[source]

Write a chunk of the ensemble data to file. This writes the data for the distributions in the slice [start:end] to the file, including the ancillary data table.

Parameters:

fname (h5py.File | h5py.Group): The file or group object to write to.
start (int): Starting index of the data to write in the h5py file.
end (int): Ending index of the data to write in the h5py file.

finalizeHdf5Write(filename: h5py.File | h5py.Group) → None[source]

Write ensemble metadata to the output file and close the file.

Parameters:

filename (h5py.File | h5py.Group): The file or group object to finish writing and close.

Factory

class qp.factory.Factory

Factory that creates and manages Ensembles of distributions.

add_class(the_class: Pdf_gen) → None

Add a parameterization class to the factory dictionary, so that it is included in the set of known parameterization classes. This adds an entry for both the class name, which ends in _gen, and the parameterization name, which is aliased to the same class.

Parameters:

the_class (Pdf_gen subclass): The parameterization class to add, which must inherit from Pdf_gen.
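The two-key registration described above is a simple registry pattern. In this hypothetical sketch, Registry and the stand-in class hist_gen are illustrative only, and the short name is passed explicitly because how qp derives it is not described here.

```python
class Registry:
    """Toy registry that stores each class under two keys (hypothetical)."""

    def __init__(self):
        self._classes = {}

    def add_class(self, the_class, name):
        # Key 1: the class name itself, e.g. "hist_gen".
        self._classes[the_class.__name__] = the_class
        # Key 2: the short parameterization name aliased to the same class, e.g. "hist".
        self._classes[name] = the_class

    def __getitem__(self, key):
        return self._classes[key]

class hist_gen:  # stand-in for a Pdf_gen subclass
    name = "hist"

registry = Registry()
registry.add_class(hist_gen, hist_gen.name)
print(registry["hist"] is registry["hist_gen"])  # True
```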

create(class_name: str | Pdf_gen, data: Mapping, method: str | None = None, ancil: Mapping | None = None) → Ensemble

Make an Ensemble of a particular type of distribution. The data dictionary will need different keys depending on what parameterization you have chosen.

If you are unsure which keys are required, try qp.[parameterization].create_ensemble?, where [parameterization] is the class of ensemble you wish to create. This will output a docstring with the necessary inputs (and this function can also be used to create an Ensemble).

Parameters:

class_name (str or class): The name of the parameterization to make a distribution from.
data (Mapping): Dictionary of values passed to the parameterization create function.
method (str | None, optional): Used to select which creation method to invoke if there are multiple.
ancil (Mapping, optional): Dictionary with ancillary data, by default None.

Returns:

ens (Ensemble): The newly created Ensemble.

Examples

>>> import qp
>>> import numpy as np
>>> data = {'bins': [0,1,2,3,4,5],
...         'pdfs': np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]])}
>>> ens_h = qp.create('hist', data=data)
>>> ens_h.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}

from_tables(tables: Mapping, decode: bool = False, ext: str | None = None) → Ensemble

Build this Ensemble from a dictionary of tables, where the metadata has key meta, and the data has key data. If there is an ancillary data table, it should have the key ancil.

The function will create the ensemble with the parameterization given in the meta table, and will use any other information in the meta table necessary to figure out how to construct the ensemble (such as the construction method).

Parameters:

tables (Mapping): The dictionary of tables to turn into an Ensemble.
decode (bool): If True and ext is ‘hdf5’, decodes any string type columns in ancil, by default False.
ext (str, optional): If ‘hdf5’ and decode is True, decodes any string type columns in ancil, by default None.

Returns:

ens (Ensemble): The ensemble constructed from the data in the tables.

Examples

>>> import qp
>>> import numpy as np
>>> meta = {'pdf_name': np.array(['hist'.encode()]), 'pdf_version': np.array([0]),
... 'bins':np.array([0,1,2,3,4,5])}
>>> data = {'pdfs': np.array([[0.  , 0.1 , 0.1 , 0.4 , 0.2 ],
... [0.05, 0.09, 0.2 , 0.3 , 0.15]])}
>>> ens = qp.from_tables({'meta': meta, 'data': data})
>>> ens.metadata
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}

read_metadata(filename: str) → Mapping

Read an ensemble’s metadata from a file, without loading the full data. The file must have multiple tables, one of which is called ‘meta’.

Parameters:

filename (str): The full path to the file.

Returns:

meta (Mapping): The metadata table as a dictionary of numpy arrays.

Examples

>>> import qp
>>> qp.read_metadata("hist-ensemble.hdf5")
{'pdf_name': array([b'hist'], dtype='|S4'),
'pdf_version': array([0]),
'bins': array([[0, 1, 2, 3, 4, 5]])}

is_qp_file(filename: str) → bool

Test if a file is a qp file. Must have at least a table called ‘meta’ in the file, and that ‘meta’ table must have a property ‘pdf_name’.

Parameters:

filename (str): Path to the file to test.

Returns:

value (bool): True if the file is a qp file.

Examples

>>> import qp
>>> qp.is_qp_file("test-qpfile.hdf5")
True
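The test above reduces to two lookups. The hypothetical helper below applies it to an in-memory dictionary of tables rather than a file, treating the ‘pdf_name’ property as a key of the ‘meta’ table (an assumption about how the property is stored).

```python
def looks_like_qp_tables(tables):
    """Apply the documented qp-file test to an in-memory dict of tables (hypothetical)."""
    meta = tables.get("meta")
    # A 'meta' table must exist and it must carry a 'pdf_name' entry.
    return meta is not None and "pdf_name" in meta

print(looks_like_qp_tables({"meta": {"pdf_name": [b"hist"]}, "data": {}}))  # True
```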

from_json(json_data: dict[str, str]) → Ensemble

Build an Ensemble from JSON data.

Parameters:

json_data (dict[str, str]): The JSON data to build the Ensemble from.

Returns:

ens (Ensemble): The ensemble constructed from the JSON data.

read(filename: str, fmt: str | None = None, read_slice: slice | None = None) → Ensemble

Read this ensemble from a file. The file must be a qp file.

The function will create the ensemble with the parameterization given in the metadata table, and will use any other information in the metadata table necessary to figure out how to construct the ensemble (i.e. construction method).

Parameters:

filenamestr: Path to the file.
fmtOptional[str], optional: File format, if None it will be taken from the file extension. Allowed formats are: ‘hdf5’,’h5’,’hf5’,’hd5’,’fits’,’fit’,’pq’, ‘parq’,’parquet’
read_sliceslice, optional: If provided, read only a slice of the data and ancil from the file.

Returns:

ens (Ensemble): The ensemble constructed from the data in the file.

Examples

>>> import qp
>>> ens = qp.read("test-qpfile.hdf5")
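How the fmt and read_slice arguments are described to behave can be sketched in plain Python. This is a hypothetical illustration, not qp's code: `infer_format` and `slice_rows` are made-up helper names, and the sketch assumes the extension is matched exactly as written (no case folding) against the allowed formats listed for fmt, and that one slice is applied to every column of a table.

```python
import os
from typing import Dict, List, Optional

# The formats listed as allowed for the `fmt` argument of qp.read.
ALLOWED_FORMATS = {"hdf5", "h5", "hf5", "hd5", "fits", "fit", "pq", "parq", "parquet"}


def infer_format(filename: str, fmt: Optional[str] = None) -> str:
    """Hypothetical helper: resolve the format, falling back to the extension."""
    if fmt is None:
        # "test-qpfile.hdf5" -> ".hdf5" -> "hdf5"
        fmt = os.path.splitext(filename)[1].lstrip(".")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r} for file {filename!r}")
    return fmt


def slice_rows(table: Dict[str, List], read_slice: Optional[slice] = None) -> Dict[str, List]:
    """Hypothetical helper: keep only the rows selected by `read_slice` in every column."""
    if read_slice is None:
        return table
    return {column: values[read_slice] for column, values in table.items()}


print(infer_format("test-qpfile.hdf5"))          # hdf5
print(infer_format("ensemble.dat", fmt="fits"))  # fits
ancil = {"ids": [105, 108, 110, 112]}
print(slice_rows(ancil, slice(1, 3)))             # {'ids': [108, 110]}
```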

data_length(filename: str) → int

Get the size of data in a file. The file must be a qp file, which means it must contain an Ensemble with a metadata table.

Parameters:

filename (str): The path to the file with the data.

Returns:

nrows (int): The length of the data, i.e. the number of distributions in the file.

Examples

>>> import qp
>>> qp.data_length("hist-ensemble.hdf5")
2

iterator(filename: str, chunk_size: int = 100000, rank: int = 0, parallel_size: int = 1) → Iterator[int, int, Ensemble]

Iterate through an Ensemble file, yielding one chunk of the ensemble at a time. Each yielded Ensemble contains the distributions from the yielded start index up to, but not including, the yielded end index. If the file has an ancillary data table, each Ensemble also contains the ancillary data for those distributions.

Parameters:

filename (str): The path to the file to iterate through.
chunk_size (int, optional): The size of chunks to yield, by default 100_000.
rank (int, optional): The process rank, if run with MPI, by default 0.
parallel_size (int, optional): The number of processes, if run with MPI, by default 1.

Yields:

Iterator[int, int, Ensemble]: the start index, ending index, and an Ensemble with distributions between those two indices

Raises:

TypeError: Raised if this function is run with files that are not hdf5 files.
KeyError: Raised if the pdf_name in the file is not one of the available parameterizations.

Examples

To iterate through an HDF5 Ensemble file, we can use the following code:

>>> import qp
>>> data_file = "./test.hdf5"
>>> for start, end, ens_chunk in qp.iterator(data_file, chunk_size=11):
...     print(f"Indices are: ({start}, {end})")
...     print(ens_chunk)
Indices are: (0, 11)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (11, 22)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (22, 33)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (33, 44)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (44, 55)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (55, 66)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (66, 77)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (77, 88)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (88, 99)
Ensemble(the_class=mixmod,shape=(11, 3))
Indices are: (99, 100)
Ensemble(the_class=mixmod,shape=(1, 3))
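The (start, end) pairs printed by this example follow simple arithmetic: end indices are exclusive and the last chunk is cut off at the number of rows (100 here). The sketch below reproduces those pairs. How qp divides rows between MPI ranks is not described in this section, so the rank split used here (contiguous, near-equal blocks) is an assumption, and `chunk_bounds` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def chunk_bounds(nrows: int, chunk_size: int = 100_000, rank: int = 0,
                 parallel_size: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) pairs, end exclusive, for the rows handled by `rank`.

    Assumption for this sketch: the rows are first split into `parallel_size`
    contiguous blocks of near-equal size, and each rank walks its own block
    in steps of `chunk_size`.
    """
    base, extra = divmod(nrows, parallel_size)
    # The first `extra` ranks each take one additional row.
    block_start = rank * base + min(rank, extra)
    block_end = block_start + base + (1 if rank < extra else 0)
    for start in range(block_start, block_end, chunk_size):
        # The final chunk stops at the end of the block.
        yield start, min(start + chunk_size, block_end)


# Single process, 100 rows, chunk_size=11: pairs run from (0, 11) to (99, 100).
print(list(chunk_bounds(100, chunk_size=11)))

# Three processes sharing 10 rows as blocks of 4, 3 and 3 rows.
for r in range(3):
    print(r, list(chunk_bounds(10, chunk_size=3, rank=r, parallel_size=3)))
# 0 [(0, 3), (3, 4)]
# 1 [(4, 7)]
# 2 [(7, 10)]
```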

convert(in_dist: Ensemble, class_name: str, **kwds) → Ensemble

Convert an ensemble to a different parameterization. Converting requires keyword arguments, and which ones are needed depends on the target parameterization. To see the available conversion methods and their arguments, refer to the docstring of qp.<class_name> for the parameterization you are converting to. If that class does not have a conversion methods table, it is not possible to convert to that parameterization.

Parameters:

in_dist (Ensemble): The input Ensemble object to convert.
class_name (str): Name of the parameterization to convert to, as a string.
kwds (Mapping): The arguments required to convert to a parameterization of the given type.

Returns:

ens (Ensemble): The converted ensemble.

Examples

The following example demonstrates converting from a histogram parameterization to an interpolation parameterization. The arguments given will not be the same when converting between other parameterizations.

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0,0.1,0.1,0.4,0.2],[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_i = qp.convert(ens_h, "interp", xvals=np.linspace(0,5,10))
>>> ens_i.metadata
{'pdf_name': array([b'interp'], dtype='|S6'),
'pdf_version': array([0]),
'xvals': array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])}
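One way to picture this hist-to-interp conversion is as sampling each histogram at the requested xvals. The sketch below shows that idea for a single distribution in plain Python; it is not qp's conversion code, and `hist_values_at` is a hypothetical name. Its conventions are assumptions chosen for the sketch: bin i covers [bins[i], bins[i+1]), the last edge belongs to the last bin, points outside the bins evaluate to 0, and no normalization is applied (qp may normalize histogram heights, so its stored values can differ).

```python
from bisect import bisect_right
from typing import List, Sequence


def hist_values_at(bins: Sequence[float], heights: Sequence[float],
                   xvals: Sequence[float]) -> List[float]:
    """Sample one piecewise-constant histogram at each x in `xvals`.

    Conventions assumed for this sketch: bin i covers [bins[i], bins[i+1]),
    the right-most edge belongs to the last bin, points outside the edges
    give 0.0, and heights are used as given (no normalization).
    """
    values = []
    for x in xvals:
        if x < bins[0] or x > bins[-1]:
            values.append(0.0)
            continue
        # bisect_right counts the edges <= x, so subtracting 1 gives the bin index;
        # clamp so that x == bins[-1] lands in the last bin.
        i = min(bisect_right(bins, x) - 1, len(heights) - 1)
        values.append(float(heights[i]))
    return values


bins = [0, 1, 2, 3, 4, 5]
heights = [0, 0.1, 0.1, 0.4, 0.2]       # first histogram from the example
xvals = [5 * k / 9 for k in range(10)]  # the points of np.linspace(0, 5, 10), up to rounding
print(hist_values_at(bins, heights, xvals))
# [0.0, 0.0, 0.1, 0.1, 0.1, 0.1, 0.4, 0.4, 0.2, 0.2]
```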

pretty_print(stream=sys.stdout) → None

Print a level of the conversion dictionary in a human-readable format.

Parameters:

stream (stream): The stream to print to, by default sys.stdout.

static concatenate(ensembles: list[Ensemble]) → Ensemble

Concatenate a list of Ensembles into one Ensemble. The Ensembles must be of the same parameterization and have the same metadata.

Parameters:

ensembles (list[Ensemble]): The list of ensembles to concatenate.

Returns:

ens (Ensemble): The concatenated ensemble.

Examples

>>> import qp
>>> import numpy as np
>>> ens_1 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_1.npdf
1
>>> ens_2 = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([[0.05,0.09,0.2,0.3,0.15]]))
>>> ens_2.npdf
1
>>> ens_all = qp.concatenate([ens_1, ens_2])
>>> ens_all.npdf
2
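The requirement that concatenated Ensembles share a parameterization and metadata can be sketched with plain dictionaries. This is a hypothetical stand-in, not qp's implementation: each "ensemble" is a dict with 'meta' and 'data' tables stored as Python lists (one data row per distribution), and `concatenate_tables` is a made-up name.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]


def concatenate_tables(ensembles: List[Dict[str, Table]]) -> Dict[str, Table]:
    """Hypothetical stand-in for concatenation on plain dictionaries.

    Each 'ensemble' is {"meta": {...}, "data": {...}}; every data column holds
    one entry per distribution. Lists are used instead of numpy arrays so that
    the metadata can be compared with ==.
    """
    first_meta = ensembles[0]["meta"]
    for ens in ensembles[1:]:
        # The metadata (including pdf_name) must match exactly.
        if ens["meta"] != first_meta:
            raise ValueError("Ensembles must share parameterization and metadata")
    data: Table = {}
    for ens in ensembles:
        for column, rows in ens["data"].items():
            # Append this ensemble's rows after those already collected.
            data.setdefault(column, []).extend(rows)
    return {"meta": first_meta, "data": data}


meta = {"pdf_name": ["hist"], "bins": [[0, 1, 2, 3, 4, 5]]}
ens_1 = {"meta": meta, "data": {"pdfs": [[0, 0.1, 0.1, 0.4, 0.2]]}}
ens_2 = {"meta": meta, "data": {"pdfs": [[0.05, 0.09, 0.2, 0.3, 0.15]]}}
ens_all = concatenate_tables([ens_1, ens_2])
print(len(ens_all["data"]["pdfs"]))  # 2
```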

static write_dict(filename: str, ensemble_dict: Mapping[str, Ensemble], **kwargs)

Writes out a dictionary of Ensembles to an HDF5 file. Each Ensemble in the dictionary will be written to a group, and within each Ensemble group there will be subgroups for the metadata, data, and (optional) ancillary data tables.

Parameters:

filename (str): The file path to write to.
ensemble_dict (Mapping[str, Ensemble]): The dictionary of Ensembles to write.
kwargs: Keyword arguments passed on to the tables_io write_dicts_to_HDF5 function.

Raises:

ValueError: Raised if the dictionary contains any values that are not Ensembles.

Examples

>>> import qp
>>> import numpy as np
>>> ens_h = qp.hist.create_ensemble(bins= np.array([0,1,2,3,4,5]),
... pdfs = np.array([0,0.1,0.1,0.4,0.2]))
>>> ens_i = qp.interp.create_ensemble(xvals= np.array([0,1,2,3,4]),
... yvals = np.array([[0.05,0.09,0.2,0.3,0.15]]))
>>> qp.write_dict("qp-ensembles.hdf5",{"ens_h": ens_h, "ens_i": ens_i})
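The group layout described for write_dict, and its ValueError check, can be mimicked in memory. In this sketch `FakeEnsemble` and `build_group_layout` are hypothetical names, nothing is written to HDF5, and the nested dictionary only mirrors the described structure: one group per key, with 'meta', 'data' and, when present, 'ancil' subgroups.

```python
from typing import Any, Dict, Mapping, Optional


class FakeEnsemble:
    """Hypothetical placeholder holding an Ensemble's three tables."""

    def __init__(self, meta: Dict[str, Any], data: Dict[str, Any],
                 ancil: Optional[Dict[str, Any]] = None) -> None:
        self.meta = meta
        self.data = data
        self.ancil = ancil


def build_group_layout(ensemble_dict: Mapping[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Mirror the described file layout: one group per key, with table subgroups."""
    layout: Dict[str, Dict[str, Any]] = {}
    for name, ens in ensemble_dict.items():
        # Every value must be an ensemble; anything else is rejected.
        if not isinstance(ens, FakeEnsemble):
            raise ValueError(f"Value for key {name!r} is not an Ensemble")
        group = {"meta": ens.meta, "data": ens.data}
        if ens.ancil is not None:
            # The ancillary table is optional and only added when present.
            group["ancil"] = ens.ancil
        layout[name] = group
    return layout


ens_h = FakeEnsemble({"pdf_name": ["hist"]}, {"pdfs": [[0, 0.1, 0.1, 0.4, 0.2]]},
                     ancil={"ids": [105]})
ens_i = FakeEnsemble({"pdf_name": ["interp"]}, {"yvals": [[0.05, 0.09, 0.2, 0.3, 0.15]]})
layout = build_group_layout({"ens_h": ens_h, "ens_i": ens_i})
print({name: sorted(group) for name, group in layout.items()})
# {'ens_h': ['ancil', 'data', 'meta'], 'ens_i': ['data', 'meta']}
```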

static read_dict(filename: str) → Mapping[str, Ensemble]

Read one or more Ensembles from an HDF5 file into a dictionary of Ensembles. The file should contain one top-level group per ensemble. Each Ensemble group should have subgroups that are the metadata, data, and (optional) ancillary data tables.

Parameters:

filename (str): The path to the HDF5 file to read in.

Returns:

Mapping[str, Ensemble]: A dictionary with the Ensembles contained in the file.

Examples

>>> import qp
>>> ens_dict = qp.read_dict("qp-ensembles.hdf5")