Data manipulation

This notebook covers use cases of accessing and manipulating the data contained in Ensembles and Ensemble files.

Exploring the structure of an Ensemble file

This tutorial notebook takes an in-depth look at the actual data structure of an Ensemble file and shows how to create one from data tables: Exploring a qp file (download here).

Accessing Ensemble data

Below are examples of how to access the relevant metadata and data coordinates for each of the supported parameterizations, as well as the normal parameterization.

Accessing the bins and pdf values of a histogram Ensemble

The bin edges are common to all distributions, so they are found in the qp.Ensemble.metadata dictionary. The bin values (‘pdfs’) are unique to each distribution, and so they are found in the qp.Ensemble.objdata dictionary.

>>> ens_h.metadata["bins"]
array([-1.  , -0.76, -0.52, -0.28, -0.04,  0.2 ,  0.44,  0.68,  0.92,
        1.16,  1.4 ,  1.64,  1.88,  2.12,  2.36,  2.6 ,  2.84,  3.08,
        3.32,  3.56,  3.8 ,  4.04,  4.28,  4.52,  4.76,  5.  ])
>>> ens_h.objdata["pdfs"]
array([[9.32923960e-18, 1.97897967e-13, 9.45298950e-10, 1.02855210e-06,
        2.60941589e-04, 1.61237560e-02, 2.60412136e-01, 1.20404896e+00,
        1.73585762e+00, 8.19029064e-01, 1.25107478e-01, 5.75357860e-03,
        7.18755383e-05, 2.23188835e-07, 1.62884983e-10, 2.68303898e-14,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00],
       [1.14354160e-03, 4.21698632e-03, 1.20328160e-02, 3.02614760e-02,
        6.71100840e-02, 1.31303762e-01, 2.26760990e-01, 3.45825540e-01,
        4.65925752e-01, 5.54741107e-01, 5.83826084e-01, 5.43198413e-01,
        4.46818319e-01, 3.24914337e-01, 2.08830620e-01, 1.18600647e-01,
        5.94965882e-02, 2.63527299e-02, 1.03011111e-02, 3.55182656e-03,
        1.07971268e-03, 2.89224138e-04, 6.82367667e-05, 1.41728752e-05,
        2.59038348e-06],
       [1.25202752e-02, 2.55094001e-02, 4.02000834e-02, 6.06476803e-02,
        8.75985090e-02, 1.21145507e-01, 1.60426590e-01, 2.03437637e-01,
        2.47056497e-01, 2.87337054e-01, 3.20061620e-01, 3.41455402e-01,
        3.48899207e-01, 3.41455402e-01, 3.20061620e-01, 2.87337054e-01,
        2.47056497e-01, 2.03437637e-01, 1.60426590e-01, 1.21145507e-01,
        8.75985090e-02, 6.06476803e-02, 4.02000834e-02, 2.55094001e-02,
        1.54952237e-02]])
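
To make the split between shared and per-distribution data concrete, here is a minimal sketch that uses plain NumPy arrays as stand-ins for the two dictionaries. The array values are illustrative and are not taken from ens_h. It checks that there is one more bin edge than there are bin values per row, and that each row integrates to 1 over the bins, as a normalized histogram should.

```python
import numpy as np

# Illustrative stand-ins for ens_h.metadata["bins"] and ens_h.objdata["pdfs"].
bins = np.array([0.0, 1.0, 2.0, 4.0])      # 4 edges -> 3 bins, shared by every row
pdfs = np.array([[0.2, 0.4, 0.2],          # one row of bin heights per distribution
                 [0.5, 0.1, 0.2]])

widths = np.diff(bins)                     # width of each bin
n_pdf, n_bins = pdfs.shape                 # number of distributions, number of bins
integrals = (pdfs * widths).sum(axis=1)    # area under each histogram
```

Because the bin edges are stored once, the same widths array applies to every row of bin heights.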

Accessing the x and y values of an interpolation Ensemble

The \(x\) values are shared across all distributions, so they are found in the qp.Ensemble.metadata dictionary. The \(y\) values are unique to each distribution, so they are found in the qp.Ensemble.objdata dictionary.

>>> ens.metadata["xvals"]
array([-1.        , -0.87755102, -0.75510204, -0.63265306, -0.51020408,
       -0.3877551 , -0.26530612, -0.14285714, -0.02040816,  0.10204082,
        0.2244898 ,  0.34693878,  0.46938776,  0.59183673,  0.71428571,
        0.83673469,  0.95918367,  1.08163265,  1.20408163,  1.32653061,
        1.44897959,  1.57142857,  1.69387755,  1.81632653,  1.93877551,
        2.06122449,  2.18367347,  2.30612245,  2.42857143,  2.55102041,
        2.67346939,  2.79591837,  2.91836735,  3.04081633,  3.16326531,
        3.28571429,  3.40816327,  3.53061224,  3.65306122,  3.7755102 ,
        3.89795918,  4.02040816,  4.14285714,  4.26530612,  4.3877551 ,
        4.51020408,  4.63265306,  4.75510204,  4.87755102,  5.        ])
>>> ens.objdata["yvals"]
array([[3.84729931e-22, 1.45447239e-19, 3.77974403e-17, 6.75191079e-15,
        8.29083747e-13, 6.99805751e-11, 4.06035506e-09, 1.61941410e-07,
        4.43975721e-06, 8.36696446e-05, 1.08388705e-03, 9.65178258e-03,
        5.90797206e-02, 2.48586039e-01, 7.18989314e-01, 1.42947162e+00,
        1.95360176e+00, 1.83528678e+00, 1.18516614e+00, 5.26092288e-01,
        1.60528458e-01, 3.36704974e-02, 4.85461093e-03, 4.81134752e-04,
        3.27782998e-05, 1.53501817e-06, 4.94137729e-08, 1.09342729e-09,
        1.66317981e-11, 1.73898526e-13, 1.24985604e-15, 6.17492215e-18,
        2.09705770e-20, 4.89549569e-23, 7.85579903e-26, 8.66545669e-29,
        6.57052311e-32, 3.42464720e-35, 1.22698465e-38, 3.02182853e-42,
        5.11573344e-46, 5.95324006e-50, 4.76218523e-54, 2.61858437e-58,
        9.89769939e-63, 2.57163511e-67, 4.59295122e-72, 5.63873544e-77,
...
        2.54521656e-01, 2.32452633e-01, 2.09903856e-01, 1.87405606e-01,
        1.65432547e-01, 1.44389480e-01, 1.24602387e-01, 1.06314718e-01,
        8.96884739e-02, 7.48093867e-02, 6.16952559e-02, 5.03064490e-02,
        4.05575555e-02, 3.23292847e-02, 2.54798365e-02, 1.98551602e-02,
        1.52977080e-02, 1.16534779e-02]])

Accessing the x and y values of an irregular interpolation Ensemble

The \(x\) and \(y\) values are unique to each distribution in an irregular interpolation Ensemble, so they are both found in the qp.Ensemble.objdata dictionary.

>>> ens_irr.objdata["xvals"]
array([[0.  , 0.25, 0.5 , 0.75, 1.  ],
       [1.  , 1.25, 1.5 , 1.75, 2.  ]])
>>> ens_irr.objdata["yvals"]
array([[1.93480833, 0.17750059, 1.88439427, 0.68718763, 1.25091751],
       [0.62741694, 1.04245176, 0.81553133, 0.76474132, 1.37727559]])
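
Because every row carries its own grid, evaluating the distributions at a common point means pairing each row of \(y\) values with its own row of \(x\) values. The sketch below does this with NumPy on the arrays printed above. The helper name eval_rows, the linear interpolation, and the choice to return 0 outside a row's grid are assumptions made for illustration, not a description of how qp evaluates these Ensembles.

```python
import numpy as np

# Values copied from the ens_irr output above.
xvals = np.array([[0.0, 0.25, 0.5, 0.75, 1.0],
                  [1.0, 1.25, 1.5, 1.75, 2.0]])
yvals = np.array([[1.93480833, 0.17750059, 1.88439427, 0.68718763, 1.25091751],
                  [0.62741694, 1.04245176, 0.81553133, 0.76474132, 1.37727559]])

def eval_rows(x, xvals, yvals):
    """Hypothetical helper: interpolate each row on its own grid, 0 outside it."""
    return np.array([np.interp(x, xr, yr, left=0.0, right=0.0)
                     for xr, yr in zip(xvals, yvals)])

# x = 0.5 lies on the first row's grid but to the left of the second row's grid.
vals = eval_rows(0.5, xvals, yvals)
```

Here the first entry of vals is the tabulated value 1.88439427, while the second is 0 because 0.5 falls outside the second distribution's grid.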

Accessing the quantiles and locations of a quantile Ensemble

The quantiles are common to all distributions, so they are found in the qp.Ensemble.metadata dictionary. The locations are specific to each distribution, so they are found in the qp.Ensemble.objdata dictionary.

>>> ens_q.metadata["quants"]
array([0.   , 0.111, 0.222, 0.333, 0.444, 0.555, 0.666, 0.777, 0.888,
       0.999, 1.   ])
>>> ens_q.objdata["locs"]
array([2.        , 2.24254695, 2.35428008, 2.44998068, 2.5417504 ,
       2.63627077, 2.74047765, 2.86619383, 3.04624481, 3.85846109,
       3.86577836])
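
The quantiles and locations pair up element by element: each location is the value below which the matching fraction of the probability lies. As a rough sketch, the arrays printed above can be used to estimate the median by interpolating between the two nearest tabulated quantiles. The linear interpolation here is an assumption for illustration only and may differ from what qp does internally.

```python
import numpy as np

# Values copied from the ens_q output above.
quants = np.array([0.0, 0.111, 0.222, 0.333, 0.444, 0.555,
                   0.666, 0.777, 0.888, 0.999, 1.0])
locs = np.array([2.0, 2.24254695, 2.35428008, 2.44998068, 2.5417504,
                 2.63627077, 2.74047765, 2.86619383, 3.04624481, 3.85846109,
                 3.86577836])

# Estimate the location of the 0.5 quantile (the median) by linear interpolation.
median = np.interp(0.5, quants, locs)

# A tabulated quantile returns its stored location unchanged.
at_tabulated = np.interp(0.444, quants, locs)
```

The estimated median falls between the locations stored for the 0.444 and 0.555 quantiles.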

Accessing the means, standard deviations, and weights of a Gaussian mixture model Ensemble

Since each distribution has its own mean, standard deviation and weight, these values are all found in the qp.Ensemble.objdata dictionary:

>>> ens_m.objdata["means"]
array([-1. , -0.5,  0. ,  0.5,  1. ])
>>> ens_m.objdata["stds"]
array([0.1  , 0.275, 0.45 , 0.625, 0.8  ])
>>> ens_m.objdata["weights"]
array([0.1, 0.3, 0.2, 0.2, 0.2])
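
To show how the three arrays combine, the following sketch evaluates the textbook Gaussian mixture density, the weighted sum of normal densities, directly in NumPy using the values printed above. The function name mixture_pdf is hypothetical and this is not qp's own implementation.

```python
import numpy as np

# Values copied from the ens_m output above.
means = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
stds = np.array([0.1, 0.275, 0.45, 0.625, 0.8])
weights = np.array([0.1, 0.3, 0.2, 0.2, 0.2])

def mixture_pdf(x, means, stds, weights):
    """Hypothetical helper: weighted sum of Gaussian densities evaluated at x."""
    x = np.asarray(x, dtype=float)[..., np.newaxis]   # broadcast x against components
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return (weights * comps).sum(axis=-1)

grid = np.linspace(-6.0, 6.0, 4001)
density = mixture_pdf(grid, means, stds, weights)

# Trapezoid-rule integral of the density over the grid.
area = float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid)))
```

Since the weights sum to 1, the area under the mixture density should come out very close to 1.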

Accessing the mean and standard deviation of a normal Ensemble

Since each distribution has its own mean and standard deviation, these values are all found in the qp.Ensemble.objdata dictionary:

>>> ens_n.objdata["loc"] # gives the means
array([[0],
       [1]])
>>> ens_n.objdata["scale"] # gives the standard deviations
array([[0.5 ],
       [0.25]])

This is true of all the qp.stats distributions, though some will have different variables you can access. To find out what variables exist for a specific qp.stats distribution, take a look at the Parameterizations page.
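
The column shape of these arrays, one row per distribution and a single column, is convenient for broadcasting against a grid of \(x\) values. The sketch below illustrates this with the standard normal density formula written out in NumPy. It mirrors the values printed above but does not call qp.

```python
import numpy as np

# Same values and shapes as the ens_n output above: one row per distribution.
loc = np.array([[0.0], [1.0]])       # means, shape (2, 1)
scale = np.array([[0.5], [0.25]])    # standard deviations, shape (2, 1)

x = np.linspace(-1.0, 2.0, 7)        # evaluation grid, shape (7,)

# Broadcasting (2, 1) against (7,) gives one row of density values per distribution.
pdf = np.exp(-0.5 * ((x - loc) / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))
```

The result has shape (2, 7): each row holds one distribution's density on the shared grid.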

Updating the data in an Ensemble

To update the data in an Ensemble without changing its metadata, you can use the qp.Ensemble.update_objdata() method. This recreates the Ensemble in place with the existing metadata and the new data you’ve provided. If you’d like to preserve your old Ensemble and make a new Ensemble with the updated data, first make a copy of the old Ensemble and store it in a different variable, as in the example below:

>>> import qp
>>> import numpy as np
>>> # create a histogram Ensemble
>>> ens_h = qp.hist.create_ensemble(bins=np.array([0, 1, 2, 3, 4, 5]),
...                                 pdfs=np.array([0, 0.1, 0.1, 0.4, 0.2]))
>>> ens_h.objdata # values before updating
{'pdfs': array([0.   , 0.125, 0.125, 0.5  , 0.25 ])}
>>> import copy
>>> ens_h_old = copy.deepcopy(ens_h) # copy to keep the old version (plain assignment would only alias ens_h)
>>> # update Ensemble with new data
>>> ens_h.update_objdata(data={'pdfs': np.array([0.05,0.09,0.2,0.3,0.15])})
>>> ens_h.objdata # values after updating
{'pdfs': array([[0.06329114, 0.11392405, 0.25316456, 0.37974684, 0.18987342]])}
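
The values printed before and after the update differ from the inputs because the bin heights were rescaled. The printed numbers are consistent with dividing each row by its total area (bin height times bin width, summed over bins). The sketch below reproduces them with NumPy. The helper normalize_hist is hypothetical and written only to illustrate that arithmetic, not taken from qp.

```python
import numpy as np

def normalize_hist(bins, pdfs):
    """Hypothetical helper: rescale bin heights so each row integrates to 1."""
    pdfs = np.atleast_2d(np.asarray(pdfs, dtype=float))
    area = (pdfs * np.diff(bins)).sum(axis=1, keepdims=True)  # area of each histogram
    return pdfs / area

bins = np.array([0, 1, 2, 3, 4, 5])
before = normalize_hist(bins, [0, 0.1, 0.1, 0.4, 0.2])       # original bin heights
after = normalize_hist(bins, [0.05, 0.09, 0.2, 0.3, 0.15])   # updated bin heights
```

With unit-width bins the area is just the sum of the heights, 0.8 for the original input and 0.79 for the update, which gives the values shown in the output above.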

If you’d like to change not only the data but also the metadata of the Ensemble, you can use the qp.Ensemble.update() method. Let’s say we want a new version of our histogram Ensemble with one fewer bin:

>>> ens_h.update(data={'bins': np.array([0,1,2,3,4]),'pdfs': np.array([0.05,0.09,0.2,0.3])})
>>> ens_h.objdata
{'pdfs': array([0.078125, 0.140625, 0.3125  , 0.46875 ])}
>>> ens_h.shape
(1, 4)

Normalizing an Ensemble

If you have an Ensemble and want to ensure it’s normalized, you can use the qp.Ensemble.norm() method. This method will only work for interpolation, irregular interpolation, and histogram Ensembles.

Let’s say you created an Ensemble without normalizing, but now you’ve changed your mind and want it normalized:

>>> import qp
>>> import numpy as np
>>> # create interpolated Ensemble
>>> xvals = np.array([0, 0.5, 1, 1.5, 2])
>>> yvals = np.array([[0.01, 0.2, 0.3, 0.2, 0.01], [0.09, 0.25, 0.2, 0.1, 0.01]])
>>> ens_i = qp.interp.create_ensemble(xvals=xvals, yvals=yvals, norm=False)
>>> ens_i.objdata["yvals"] # values before normalizing
array([[0.01, 0.2 , 0.3 , 0.2 , 0.01],
       [0.09, 0.25, 0.2 , 0.1 , 0.01]])

>>> # normalize the Ensemble
>>> ens_i.norm()
>>> ens_i.objdata["yvals"] # values after normalizing
array([[0.02816901, 0.56338028, 0.84507042, 0.56338028, 0.02816901],
       [0.3       , 0.83333333, 0.66666667, 0.33333333, 0.03333333]])
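
For an interpolation Ensemble, the normalized values printed above are consistent with dividing each row of \(y\) values by its trapezoid-rule integral over the \(x\) values. The following sketch reproduces that arithmetic with NumPy. The helper normalize_interp is hypothetical and the use of the trapezoid rule is inferred from the printed numbers rather than from qp's source.

```python
import numpy as np

def normalize_interp(xvals, yvals):
    """Hypothetical helper: divide each row by its trapezoid-rule integral."""
    yvals = np.atleast_2d(np.asarray(yvals, dtype=float))
    segment_areas = 0.5 * (yvals[:, 1:] + yvals[:, :-1]) * np.diff(xvals)
    return yvals / segment_areas.sum(axis=1, keepdims=True)

xvals = np.array([0, 0.5, 1, 1.5, 2])
yvals = np.array([[0.01, 0.2, 0.3, 0.2, 0.01],
                  [0.09, 0.25, 0.2, 0.1, 0.01]])

normed = normalize_interp(xvals, yvals)
```

The two rows integrate to 0.355 and 0.3 before normalization, so dividing by those areas yields the values shown after calling norm().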