.. _contributing: Contributing to qp ================== Making a new distribution type ------------------------------ Here is a checklist of things that you will need to include in a class that implements a new type of distritubion. 1. What type of distribution are you making? 1. A "simple" distribution, i.e., a distribution that is defined by a fixed set of parameters. In that case you should implement your class as a sub-class of `scipy.stats.rv_continuous` and then use the `qp.factory._make_scipy_wrapped_class` class to extend it to qp pdf class. 2. A "row-based" distribution i.e., distributions that are configured by providing a variable sized set of paramters, where each row corresponds to one PDF, such as a histogram or grid. In that case you should inherit from the `qp.Pdf_rows_gen` class. 2. In the static block before the first class method you should define the name the class will go by, the version of the class (used to convert older version when reading them back from disk), and the mask across which the distribution is supported. .. code-block:: python name = 'hist' version = 0 _support_mask = rv_continuous._support_mask 3. In the constuctor of the class you should store whatever information you will need to evaluate the PDF and make sure that it is consistent. Here is an example from the histogram implementation. Note how the `check_input` keyword is used to allow you to skip the normalization step of the PDF, if you know that they are normalized. .. code-block:: python self._hbins = np.asarray(bins) self._nbins = self._hbins.size - 1 self._hbin_widths = self._hbins[1:] - self._hbins[:-1] if np.shape(pdfs)[-1] != self._nbins: # pragma: no cover raise ValueError("Number of bins (%i) != number of values (%i)" % (self._nbins, np.shape(pdfs)[-1])) check_input = kwargs.pop('check_input', True) if check_input: sums = np.sum(pdfs*self._hbin_widths, axis=1) self._hpdfs = (pdfs.T / sums).T else: #pragma: no cover self._hpdfs = pdfs 4. In the constructor of the class you should extract the number of PDF and pass them to the base class constructor. .. code-block:: python kwargs['npdf'] = pdfs.shape[0] super(hist_rows_gen, self).__init__(*args, **kwargs) 5. In the constructor you should define which data members of the class are "data" and "metadata". In this context, "data" means quantites that are defined for each PDF, and "metadata" means quantities that are shared between all the PDFs. This should be the minimal set of data need to reconstruct the class instance. .. code-block:: python self._addmetadata('bins', self._hbins) self._addobjdata('pdfs', self._hpdfs) 6. You should provide properties to access each of the "data" and "metadata" fields. .. code-block:: python @property def bins(self): """Return the histogram bin edges""" return self._hbins @property def pdfs(self): """Return the histogram bin values""" return self._hpdfs 7. At a minimum you need to implement either the `_pdf` `_cdf` scipy hook functions to evaluate the PDF. Optionally you can implement the `_sf`, `_ppf`, `_isf`, `_rvs` functions as well, for faster evaluate. See below for some comments on how to make these evaluation functions fast. .. code-block:: python def _pdf(self, x, row): # pylint: disable=arguments-differ return evaluate_unfactored_hist_x_multi_y(x, row, self._hbins, self._hpdfs) def _cdf(self, x, row): # pylint: disable=arguments-differ if self._hcdfs is None: #pragma: no cover self._compute_cdfs() if np.shape(x)[:-1] == np.shape(row)[:-1]: return interpolate_unfactored_x_multi_y(x, row, self._hbins, self._hcdfs, bounds_error=False, fill_value=(0.,1.)) return interp1d(self._hbins, self._hcdfs[np.squeeze(row)], bounds_error=False, fill_value=(0.,1.))(x) # pragma: no cover 8. You should implement the `_updated_ctor_param` function that scipy needs in order to copy distributions. This should make a dictionary of all the constructor parameters. .. code-block:: python def _updated_ctor_param(self): """ Set the bins as additional constructor argument """ dct = super(hist_rows_gen, self)._updated_ctor_param() dct['bins'] = self._hbins dct['pdfs'] = self._hpdfs return dct 9. You should define functions to convert other ensembles to this representation. Doing that requires two things: 1) a function to extract values for the orignal representation, and 2) a function to to use those values to create a new ensemble. Finally, you have to add those mappings to the dictionaries that the class carries with it. conversions happen. `None` is used as a wildcard to catch any values that are not explicitly defined. .. code-block:: python @classmethod def add_mappings(cls, conv_dict): """ Add this classes mappings to the conversion dictionary """ cls._add_creation_method(cls.create, None) cls._add_extraction_method(convert_using_hist_values, None) cls._add_extraction_method(convert_using_hist_samples, "samples") 10. If you want, you can define a particular method for plotting distributions of the class that better capture the representation of the PDF by adding a `plot_native` method to the class. .. code-block:: python @classmethod def plot_native(cls, pdf, **kwargs): """Plot the PDF in a way that is particular to this type of distibution For a histogram this shows the bin edges """ axes, _, kw = get_axes_and_xlims(**kwargs) vals = pdf.dist.pdfs[pdf.kwds['row']] return plot_pdf_histogram_on_axes(axes, hist=(pdf.dist.bins, vals), **kw) 11. After the class definiton, you need to register the class with the factory, and make the creation function available. .. code-block:: python hist = hist_gen.create add_class(hist_gen) 12. After the class definition, you can also add test data to the class so that it will be tested in the automatically generated tests. The test data takes the form of a multi-level dictionary. At the top level each key-value pair will be used for four tests: 1. Creating a distribution and making sure that the pdf functions are well-behaved. 2. Writing the distribution to disk and reading it back and making sure it is the same, 3. Converting a normal distribution to a distribution of this type and making sure it is reasonably close to the original. 4. Testing the plotting functions. .. code-block:: python @classmethod def make_test_data(cls): """ Make data for unit tests """ hist_gen.test_data = dict(hist=dict(gen_func=hist, ctor_data=dict(bins=XBINS, pdfs=HIST_DATA),\ convert_data=dict(bins=XBINS), test_xvals=TEST_XVALS), hist_samples=dict(gen_func=hist, ctor_data=dict(bins=XBINS, pdfs=HIST_DATA),\ convert_data=dict(bins=XBINS, method='samples',\ size=NSAMPLES),\ atol_diff=1e-1, atol_diff2=1e-1,\ test_xvals=TEST_XVALS, do_samples=True)) Checks for new code ------------------- There are a number of checks that will need to pass before a pull request adding new code will be accepted. These should all be implemented in the travis automated testing, but it can also be useful to run them yourself before you make the pull request. Running pylint -------------- There is a .pylintrc file defining the style that we want. You can run any changes against that by doing: .. code-block:: bash pylint qp Please correct any and all messages. It a very few cases you can disable specific warnings in specific functions, for example by adding .. code-block:: python # pylint: disable=arguments-differ To the function in question. Adding unit tests for your class -------------------------------- If you have implemented the `make_test_data` classmethod, then up to four sets unit tests will be automatcially generated for your class. These are built by the `PDFTestCase.auto_add_class` function in `qp/tests/test_auto.py`. The actual functions are in `qp/test_funcs.py`; they are: 1. pdf functionality tests, which runs a set of consistency checks to make sure that the pdf is well defined and to test that the relationships between `pdf()`, `cdf()`, `sf()`, `ppf()`, etc.. are consistent. 2. persistence tests, which runs a loopback test that write the class to disk in various formats and reads it back and verifies that the result is identical to the original. 3. conversion tests, which verifies that converting to the class works by comparing the `pdf()` values computed on a grid from an input ensemble in a different representation to values in your classes representation. 4. plotting tets, which verifies that the plotting function doesn't crash. Making sure the output is sensible is up to you. Running unit tests ------------------ You can use the `do_cover.sh` script to run the unit test and check their coverage. We will require 100\% coverage, but it is ok to use `#pragma: no cover` statements to skip error blocks. .. code-block:: python ./do_cover.sh #### Running demo notebooks There are some demo notebooks in `qp` you can verify that they all work by rendering them to html. .. code-block:: bash ./render_nb.sh