Contributing to qp

Making a new distribution type

Here is a checklist of things that you will need to include in a class that implements a new type of distribution.

  1. What type of distribution are you making?

    1. A “simple” distribution, i.e., a distribution that is defined by a fixed set of parameters. In that case you should implement your class as a sub-class of scipy.stats.rv_continuous and then use qp.factory._make_scipy_wrapped_class to extend it to a qp pdf class (a sketch of this case follows this list).

    2. A “row-based” distribution, i.e., a distribution that is configured by providing a variable-sized set of parameters, where each row corresponds to one PDF, such as a histogram or a grid. In that case you should inherit from the qp.Pdf_rows_gen class.
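
For the “simple” case, a minimal sketch might look like the following. The triangle_gen class is purely illustrative, and the exact signature of qp.factory._make_scipy_wrapped_class is an assumption here, so check the factory module for the real wrapping call.

import numpy as np
from scipy.stats import rv_continuous

class triangle_gen(rv_continuous):
    """A hypothetical 'simple' distribution: a symmetric triangle on [0, 1]"""

    def _pdf(self, x):
        # Peak of 2.0 at x = 0.5, falling linearly to zero at the edges
        return np.where((x >= 0.0) & (x <= 1.0), 2.0 - 4.0 * np.abs(x - 0.5), 0.0)

# The scipy-style class would then be wrapped into a qp pdf class,
# for example (assumed signature):
# triangle = qp.factory._make_scipy_wrapped_class('triangle', triangle_gen)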

  2. In the static block before the first class method you should define the name that the class will go by, the version of the class (used to convert older versions when reading them back from disk), and the mask that defines where the distribution is supported.

name = 'hist'
version = 0

_support_mask = rv_continuous._support_mask
  3. In the constructor of the class you should store whatever information you will need to evaluate the PDF and make sure that it is consistent. Here is an example from the histogram implementation. Note how the check_input keyword is used to allow you to skip the normalization step if you know that the PDFs are already normalized.

self._hbins = np.asarray(bins)
self._nbins = self._hbins.size - 1
self._hbin_widths = self._hbins[1:] - self._hbins[:-1]
if np.shape(pdfs)[-1] != self._nbins: # pragma: no cover
    raise ValueError("Number of bins (%i) != number of values (%i)" % (self._nbins, np.shape(pdfs)[-1]))
check_input = kwargs.pop('check_input', True)
if check_input:
    sums = np.sum(pdfs*self._hbin_widths, axis=1)
    self._hpdfs = (pdfs.T / sums).T
else: #pragma: no cover
    self._hpdfs = pdfs
  4. In the constructor of the class you should extract the number of PDFs and pass it to the base class constructor.

kwargs['npdf'] = pdfs.shape[0]
super(hist_rows_gen, self).__init__(*args, **kwargs)
  5. In the constructor you should define which data members of the class are “data” and “metadata”. In this context, “data” means quantities that are defined for each PDF, and “metadata” means quantities that are shared between all the PDFs. This should be the minimal set of data needed to reconstruct the class instance.

self._addmetadata('bins', self._hbins)
self._addobjdata('pdfs', self._hpdfs)
  6. You should provide properties to access each of the “data” and “metadata” fields.

@property
def bins(self):
    """Return the histogram bin edges"""
    return self._hbins

@property
def pdfs(self):
    """Return the histogram bin values"""
    return self._hpdfs
  7. At a minimum you need to implement either the _pdf or the _cdf scipy hook function to evaluate the PDF. Optionally you can implement the _sf, _ppf, _isf, and _rvs functions as well, for faster evaluation. See below for some comments on how to make these evaluation functions fast.

def _pdf(self, x, row):
    # pylint: disable=arguments-differ
    return evaluate_unfactored_hist_x_multi_y(x, row, self._hbins, self._hpdfs)

def _cdf(self, x, row):
    # pylint: disable=arguments-differ
    if self._hcdfs is None: #pragma: no cover
        self._compute_cdfs()
    if np.shape(x)[:-1] == np.shape(row)[:-1]:
        return interpolate_unfactored_x_multi_y(x, row, self._hbins, self._hcdfs, bounds_error=False, fill_value=(0.,1.))
    return interp1d(self._hbins, self._hcdfs[np.squeeze(row)], bounds_error=False, fill_value=(0.,1.))(x)  # pragma: no cover
  8. You should implement the _updated_ctor_param function that scipy needs in order to copy distributions. This should return a dictionary of all the constructor parameters.

def _updated_ctor_param(self):
    """
    Set the bins as additional constructor argument
    """
    dct = super(hist_rows_gen, self)._updated_ctor_param()
    dct['bins'] = self._hbins
    dct['pdfs'] = self._hpdfs
    return dct
  9. You should define functions to convert other ensembles to this representation. Doing that requires two things: 1) a function to extract values from the original representation, and 2) a function to use those values to create a new ensemble. Finally, you have to add those mappings to the dictionaries that the class carries with it, which control how the conversions happen. None is used as a wildcard to catch any values that are not explicitly defined. A hypothetical sketch of an extraction function follows the example below.

@classmethod
def add_mappings(cls, conv_dict):
    """
    Add this classes mappings to the conversion dictionary
    """
    cls._add_creation_method(cls.create, None)
    cls._add_extraction_method(convert_using_hist_values, None)
    cls._add_extraction_method(convert_using_hist_samples, "samples")
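
As an illustration, an extraction function typically evaluates the input ensemble and packages the results as the constructor arguments of the new representation. The sketch below is hypothetical: the function name, its keyword arguments, and the assumption that the input ensemble's cdf() returns an array of shape (npdf, len(bins)) are illustrative only; see the real conversion functions in qp (e.g. convert_using_hist_values) for the actual interface.

import numpy as np

def extract_hist_values(in_dist, **kwargs):
    """Hypothetical extraction function: compute bin densities for each input
    PDF and return the constructor data for the histogram representation"""
    bins = kwargs.pop('bins', None)
    if bins is None:
        raise ValueError("To convert to a histogram representation you must specify the bins")
    bins = np.asarray(bins)
    # Bin probabilities from differences of the CDF, assuming cdf() returns
    # an array of shape (npdf, len(bins)); divide by widths to get densities
    cdf_vals = in_dist.cdf(bins)
    pdfs = (cdf_vals[:, 1:] - cdf_vals[:, :-1]) / (bins[1:] - bins[:-1])
    return dict(bins=bins, pdfs=pdfs)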
  10. If you want, you can define a particular method for plotting distributions of the class that better captures the representation of the PDF by adding a plot_native method to the class.

@classmethod
def plot_native(cls, pdf, **kwargs):
    """Plot the PDF in a way that is particular to this type of distribution

    For a histogram this shows the bin edges
    """
    axes, _, kw = get_axes_and_xlims(**kwargs)
    vals = pdf.dist.pdfs[pdf.kwds['row']]
    return plot_pdf_histogram_on_axes(axes, hist=(pdf.dist.bins, vals), **kw)
  11. After the class definition, you need to register the class with the factory and make the creation function available.

hist = hist_gen.create
add_class(hist_gen)
  12. After the class definition, you can also add test data to the class so that it will be tested by the automatically generated tests. The test data takes the form of a multi-level dictionary. At the top level, each key-value pair will be used for four tests:

    1. Creating a distribution and making sure that the pdf functions are well-behaved.

    2. Writing the distribution to disk, reading it back, and making sure it is the same.

    3. Converting a normal distribution to a distribution of this type and making sure it is reasonably close to the original.

    4. Testing the plotting functions.

@classmethod
def make_test_data(cls):
    """ Make data for unit tests """
    hist_gen.test_data = dict(hist=dict(gen_func=hist, ctor_data=dict(bins=XBINS, pdfs=HIST_DATA),\
                                        convert_data=dict(bins=XBINS), test_xvals=TEST_XVALS),
                              hist_samples=dict(gen_func=hist, ctor_data=dict(bins=XBINS, pdfs=HIST_DATA),\
                                                convert_data=dict(bins=XBINS, method='samples',\
                                                                              size=NSAMPLES),\
                                                atol_diff=1e-1, atol_diff2=1e-1,\
                                                test_xvals=TEST_XVALS, do_samples=True))

Checks for new code

There are a number of checks that will need to pass before a pull request adding new code will be accepted. These are all run as part of the Travis automated testing, but it can also be useful to run them yourself before you make the pull request.

Running pylint

There is a .pylintrc file defining the style that we want. You can check any changes against it by running:

pylint qp

Please correct any and all messages. In a very few cases you can disable specific warnings in specific functions, for example by adding

# pylint: disable=arguments-differ

to the function in question.

Adding unit tests for your class

If you have implemented the make_test_data classmethod, then up to four sets of unit tests will be automatically generated for your class. These are built by the PDFTestCase.auto_add_class function in qp/tests/test_auto.py. The actual test functions are in qp/test_funcs.py; they are listed below, followed by a note on running just these tests:

  1. pdf functionality tests, which run a set of consistency checks to make sure that the pdf is well defined and that the relationships between pdf(), cdf(), sf(), ppf(), etc., are consistent.

  2. persistence tests, which run a loopback test that writes the class to disk in various formats, reads it back, and verifies that the result is identical to the original.

  3. conversion tests, which verify that converting to the class works by comparing pdf() values computed on a grid from an input ensemble in a different representation to the values in your class's representation.

  4. plotting tests, which verify that the plotting function doesn’t crash. Making sure the output is sensible is up to you.
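
If you want to run just these automatically generated tests while developing, something like the following should work, assuming pytest is installed (this invocation is a suggestion rather than part of the qp tooling):

python -m pytest qp/tests/test_auto.py -v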

Running unit tests

You can use the do_cover.sh script to run the unit tests and check their coverage. We require 100% coverage, but it is OK to use #pragma: no cover statements to skip error-handling blocks.

./do_cover.sh

Running demo notebooks

There are some demo notebooks in qp; you can verify that they all work by rendering them to HTML:

./render_nb.sh