IMP Tutorial
Introduction to writing IMP code

Introduction

In this tutorial we will cover creating a new IMP module, and writing a new restraint in C++.

For more information, see the section on developing IMP in the manual.

First, we need to build IMP from source code. See the installation instructions in the manual for more details.

Add a new module

First, change into the top-level directory in the IMP source code. You will see a modules directory that contains a subdirectory for each IMP module. To add a new module called 'foo', use the tools/make-module.py script as follows (the $ character here denotes the command prompt):

$ tools/make-module.py foo

This will make a new subdirectory modules/foo; let's take a look at its contents:

$ ls modules/foo
README.md bin examples pyext test
benchmark dependencies.py include src utility

include directory (C++ headers)

The include directory in the new module contains C++ header files that declare the public classes and other functions that are part of the module. Classes that are not intended to be public (e.g. utility classes only used by your module itself) should instead be put in the include/internal subdirectory.

Let’s add a new class to our module, MyRestraint, a simple restraint that restrains a particle to the xy plane (see the ExampleRestraint class in modules/example/ for a similar class).

The convention in IMP is for class names (and the files declaring and defining them) to be CamelCase. See the naming conventions section in the manual for more details.

We do this by creating a file MyRestraint.h in the modules/foo/include/ subdirectory. We'll look at each section of this file in turn (the whole file is also available at GitHub). The first part of the file looks like

#ifndef IMPFOO_MY_RESTRAINT_H
#define IMPFOO_MY_RESTRAINT_H
#include <IMP/foo/foo_config.h>
#include <IMP/Restraint.h>
IMPFOO_BEGIN_NAMESPACE

The ifndef/define is a header guard, which prevents the file from being included multiple times. The convention in IMP for the header guard name is to use upper case IMP<module name>_<file name>.
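
As a rough illustration of the naming rule, the following Python sketch derives a guard name from a module name and a header file name. The helper header_guard is hypothetical (it is not part of IMP), and the rule for splitting CamelCase words with underscores is an assumption inferred from the single example IMPFOO_MY_RESTRAINT_H above.

```python
import re


def header_guard(module: str, header: str) -> str:
    """Build an IMP-style header guard name (hypothetical helper)."""
    stem, _, ext = header.rpartition(".")
    # Assumed rule: insert '_' between CamelCase words,
    # e.g. 'MyRestraint' -> 'My_Restraint'.
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", stem)
    # Upper case IMP<module name>_<file name>, with the extension as suffix.
    return f"IMP{module}_{words}_{ext}".upper()


print(header_guard("foo", "MyRestraint.h"))
```

For the module and file used in this tutorial, the sketch reproduces the guard name shown in the header.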

All of our classes will exist in the IMP::foo namespace. The IMPFOO_BEGIN_NAMESPACE macro ensures this. It is defined in the foo_config.h header file.

We are going to declare a restraint, so the compiler needs the declaration of the IMP::Restraint base class, which is in IMP/Restraint.h.

The next part of the header declares our new class:

class IMPFOOEXPORT MyRestraint : public Restraint {
  ParticleIndex p_;
  double k_;

public:
  MyRestraint(Model *m, ParticleIndex p, double k);
  void do_add_score_and_derivatives(ScoreAccumulator sa) const override;
  ModelObjectsTemp do_get_inputs() const override;
  IMP_OBJECT_METHODS(MyRestraint);
};

IMPFOOEXPORT should be used for any class that has a .cpp implementation, and ensures the class can be used outside of the module (e.g. in Python).

The IMP_OBJECT_METHODS macro adds standard methods that all IMP objects (like IMP::Restraint) are expected to provide.

Our constructor takes an IMP::Model, a particle in that model, and a force constant. We also declare the necessary methods to return the score and inputs for the restraint - we will define these later in the .cpp file.

The final part of the file looks like:

IMPFOO_END_NAMESPACE
#endif /* IMPFOO_MY_RESTRAINT_H */

This just closes the namespace and header guard from the start of the file.

src directory (C++ code)

Next, we need to provide a definition for the class. We do this by making a corresponding file MyRestraint.cpp in the modules/foo/src/ subdirectory. The first part of this file looks like:

#include <IMP/foo/MyRestraint.h>
#include <IMP/core/XYZ.h>
IMPFOO_BEGIN_NAMESPACE

Similarly to the header file, we need to put everything in the IMP::foo namespace and include any needed header files. Here we include the previous declaration of the MyRestraint class. We also need the declaration of the XYZ decorator from the IMP::core module since we are going to be using the particle’s coordinates to calculate the score.

Next, we define the constructor of the class:

MyRestraint::MyRestraint(Model *m, ParticleIndex p, double k)
    : Restraint(m, "MyRestraint%1%"), p_(p), k_(k) {}

The constructor simply calls the IMP::Restraint base class constructor (which takes the IMP::Model and a human-readable name) and stores the p and k arguments in the class attributes p_ and k_ (IMP convention is for class attributes to end in an underscore). %1% is replaced with a unique number, so multiple restraints will be named MyRestraint1, MyRestraint2, etc.
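
The effect of the %1% placeholder can be pictured with a small stand-alone Python sketch. This is only a toy model: the unique_name helper is hypothetical, and both the per-template counter and the starting index of 1 are assumptions chosen to match the names quoted above, not a description of how IMP generates the numbers internally.

```python
from collections import defaultdict
from itertools import count

# One counter per name template; starting at 1 is an assumption.
_counters = defaultdict(lambda: count(1))


def unique_name(template: str) -> str:
    """Replace '%1%' with the next unused number for this template."""
    return template.replace("%1%", str(next(_counters[template])))


names = [unique_name("MyRestraint%1%") for _ in range(3)]
print(names)
```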

Next, we implement the restraint's score and first derivatives:

void MyRestraint::do_add_score_and_derivatives(ScoreAccumulator sa) const {
  core::XYZ d(get_model(), p_);
  double score = .5 * k_ * square(d.get_z());
  if (sa.get_derivative_accumulator()) {
    double deriv = k_ * d.get_z();
    d.add_to_derivative(2, deriv, *sa.get_derivative_accumulator());
  }
  sa.add_score(score);
}

We apply a simple harmonic restraint to the z coordinate to keep the particle in the xy plane; we use the IMP::core::XYZ decorator to treat the particle as a coordinate.

The IMP::ScoreAccumulator class is given the score, and analytic first derivatives as well if requested.
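
The arithmetic of the restraint can be checked independently of IMP. The pure Python sketch below uses the same formulas as the C++ code (score 0.5 k z^2, derivative k z) and compares the analytic derivative with a central finite difference. The function names and the sample values z = 3 and k = 10 are chosen here for illustration only.

```python
def harmonic_score(z: float, k: float) -> float:
    """Score of the harmonic restraint on the z coordinate."""
    return 0.5 * k * z * z


def harmonic_deriv(z: float, k: float) -> float:
    """Analytic first derivative of the score with respect to z."""
    return k * z


def finite_difference(z: float, k: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of the derivative."""
    return (harmonic_score(z + h, k) - harmonic_score(z - h, k)) / (2 * h)


z, k = 3.0, 10.0
score = harmonic_score(z, k)
deriv = harmonic_deriv(z, k)
print(score, deriv)

# The analytic and numerical derivatives should agree closely.
assert abs(finite_difference(z, k) - deriv) < 1e-5
```

A check of this kind is a quick way to catch a sign or factor error in a hand-written derivative before running an optimization.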

We also need to tell IMP which particles our restraint acts on by overriding the do_get_inputs method:

ModelObjectsTemp MyRestraint::do_get_inputs() const {
  return ModelObjectsTemp(1, get_model()->get_particle(p_));
}

Here we just have a single particle, p_.

IMP uses this list of inputs to order the evaluation of restraints and constraints (a constraint which moves particle A must be evaluated before any restraint with A as an input) and for parallelization. See the IMP manual for more details.
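
To see why declared inputs make such an ordering possible, consider the toy Python sketch below. It is not IMP's scheduler: the object names, the (inputs, outputs) table, and the evaluation_order function are all hypothetical, and the sketch only shows that a dependency order can be computed once each object states what it reads and writes.

```python
from graphlib import TopologicalSorter

# Hypothetical model objects: name -> (particles read, particles written).
objects = {
    "restraint_on_A": ({"A"}, set()),
    "constraint_moving_A": ({"B"}, {"A"}),
    "restraint_on_B": ({"B"}, set()),
}


def evaluation_order(objs):
    """Order objects so that writers of a particle precede its readers."""
    sorter = TopologicalSorter()
    for name, (inputs, _) in objs.items():
        # Any other object that writes one of our inputs must run first.
        writers = [other for other, (_, outputs) in objs.items()
                   if other != name and outputs & inputs]
        sorter.add(name, *writers)
    return list(sorter.static_order())


order = evaluation_order(objects)
print(order)
```

In this toy model the constraint that moves A always appears before the restraint that reads A, which is the property described above.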

Finally, the file ends with:

IMPFOO_END_NAMESPACE

As before, we need to close the namespace. (For reference, the whole file is also available at GitHub).

pyext directory (Python interface)

Next, we make the C++ class available in Python. In IMP we use a tool called SWIG to do this. We need to configure the SWIG interface by modifying the swig.i-in file in the modules/foo/pyext/ subdirectory. First, we need to tell SWIG how to wrap the MyRestraint class by adding this line to the file:


IMP_SWIG_OBJECT(IMP::foo, MyRestraint, MyRestraints);

This tells SWIG that MyRestraint is an IMP Object. Most IMP classes are subclasses of IMP::Object. These are heavyweight objects which are always passed by reference-counted pointers, and are generally not copied. Some simple classes (e.g. IMP::algebra::Vector3D) are subclasses of IMP::Value instead. These are lightweight objects which are generally passed by value or reference, and can be trivially copied. See the IMP manual for more details.

Next, we tell SWIG to parse our C++ header file for the class by adding the line:

%include "IMP/foo/MyRestraint.h"

With the SWIG interface complete, we will be able to use our class from Python as IMP.foo.MyRestraint.

(For reference, the whole file is also available at GitHub).

You can also add arbitrary Python code to your module. This is added to the swig.i-in file using the SWIG pythoncode directive. See the PMI module for an example.

You can also add entire Python submodules by adding Python files to the pyext/src subdirectory. For example the file pyext/src/my_python.py can be imported in Python using import IMP.foo.my_python. This is also used in the PMI module.

Documentation

Documentation of our custom class is omitted here for clarity, but all C++ headers and .cpp files should contain comments! All comments are parsed by the doxygen tool, which uses the special comment markers //! and /** */. See the IMP manual for more details.

You should also fill in modules/foo/README.md with a description of the module and the license it is released under. We recommend an open source license such as the LGPL.

test directory (test cases)

Next we should write a test case in the modules/foo/test/ directory, by creating a new file test_restraint.py. Test cases periodically verify that IMP is working correctly. They can be written in C++, but are almost always written in Python for flexibility.

IMP convention is to name a test file starting with test_.

The first part of our test file looks like

from __future__ import print_function, division
import IMP
import IMP.test
import IMP.algebra
import IMP.core
import IMP.foo


class Tests(IMP.test.TestCase):

This imports the IMP kernel, any other IMP modules used in the test, and our own IMP.foo module. The imports from __future__ help to ensure that our test works in the same way in both Python 2 and Python 3.

All tests should be classes that use the IMP.test module, which adds some IMP-specific functionality to the standard Python unittest module.

Next, we add a test method to our class:

    def test_my_restraint(self):
        m = IMP.Model()
        p = m.add_particle("p")
        d = IMP.core.XYZ.setup_particle(m, p, IMP.algebra.Vector3D(1,2,3))
        r = IMP.foo.MyRestraint(m, p, 10.)
        # Score is 0.5 * k * z^2 = 0.5 * 10 * 3^2 = 45
        self.assertAlmostEqual(r.evaluate(True), 45.0, delta=1e-4)
        # Only the z derivative is nonzero: k * z = 10 * 3 = 30
        self.assertLess(IMP.algebra.get_distance(d.get_derivatives(),
                                                 IMP.algebra.Vector3D(0,0,30)),
                        1e-4)
        self.assertEqual(len(r.get_inputs()), 1)

This creates a restraint object, requests its score and derivatives (evaluate), and asks for inputs (get_inputs). Here we simply test by comparing to known good values using the standard unittest methods assertAlmostEqual, assertLess, and assertEqual. (The IMP.test.TestCase class provides some additional methods helpful for IMP tests.)

Always use assertAlmostEqual for floating point comparisons, never assertEqual (two floating point numbers which look identical to a human may not be represented identically by the computer).
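
The classic demonstration of this pitfall needs only the standard unittest module. In the sketch below, the class name FloatComparison and the tolerance of 1e-9 are illustrative choices, and the test is run programmatically only so that the example is self-contained.

```python
import unittest


class FloatComparison(unittest.TestCase):
    def test_sum(self):
        total = 0.1 + 0.2
        # The two values print almost identically but are not equal.
        self.assertNotEqual(total, 0.3)
        # Comparing within a tolerance succeeds.
        self.assertAlmostEqual(total, 0.3, delta=1e-9)


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FloatComparison)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```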

Finally, we end the test script with

if __name__ == '__main__':
    IMP.test.main()

This simply runs all the tests in this file if the script is run directly from the command line with python3 test_restraint.py.

(For reference, the whole file is also available at GitHub).

bin directory (command line tools)

IMP modules can include command line tools. We don't include any such tools in this module, but to add a tool, add a C++ or Python file to the bin directory. The tool will be compiled if needed and then installed with the rest of IMP in the binary directory (e.g. /usr/local/bin). See the FoXS module for an example C++ command line tool and the em module for example Python tools.

Usually command line tools are all installed in the same directory, so take care to give each program a distinctive name so that it does not conflict with other IMP programs or with the operating system itself.

Dependencies

Finally we need to tell the IMP build system which other modules and external code the module depends on. This is done by editing the file modules/foo/dependencies.py to read:

required_modules = 'core:algebra'
required_dependencies = ''
optional_dependencies = ''

Since we use the IMP::core and IMP::algebra modules, we need to declare them as requirements for this module.

required_dependencies and optional_dependencies can also be used to make use of 3rd party libraries. See the IMP manual for more information.
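
The required_modules value above packs several module names into one string. The Python sketch below shows one way such a value could be split into a list, assuming the colon is the separator as 'core:algebra' suggests. The parse_module_list helper is hypothetical, and how the IMP build system actually reads the file is not shown in this tutorial.

```python
def parse_module_list(value: str) -> list:
    """Split a colon-separated list; an empty string gives an empty list."""
    return [name for name in value.split(":") if name]


# Values as they appear in modules/foo/dependencies.py above.
required_modules = 'core:algebra'
required_dependencies = ''

print(parse_module_list(required_modules))
print(parse_module_list(required_dependencies))
```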

(For reference, the whole file is also available at GitHub).

Pure Python modules

If there is no C++ code in your module at all - i.e. it is pure Python - then you can speed up building of your module by marking it as Python only. This is done by adding python_only = True to the dependencies.py file. SWIG is not used in Python-only modules; instead, put any Python code you want in the top-level module in the pyext/src/__init__.py file. For an example, see the IMP.test module. See the IMP manual for more information.

Source control

Now is a good time to store the module in source control so that it can be easily shared with collaborators and users, and changes to it can be tracked. This will also simplify the process of incorporating the module into the main IMP distribution later, if applicable.

Most IMP modules are stored on GitHub. See https://github.com/salilab/pmi/ and https://github.com/salilab/npctransport for examples.

Build and test

To build the custom module, build IMP from source code in the usual way. cmake should detect the new module and configure it, and then your build tool (usually make or ninja) will build it.

Test the new code with something like (in the IMP build directory):

$ ./setup_environment.sh python3 ../imp/modules/foo/test/test_restraint.py

You can also run all of your module's test cases using the ctest tool; see the IMP manual for more details.

Automatic testing

You can automate the building and testing of your module. This is very helpful because any bugs introduced during development may quickly be detected. Provided your module is in a public GitHub repository and is open source, you can use a number of free cloud services to do this:

  • GitHub Actions will build and test your module in a virtual machine every time you git push to GitHub. If the module fails to build, or a test fails, GitHub will send you an email.
  • Codecov will track the code coverage, i.e. which lines of your Python or C++ code were executed when the tests were run. This is a helpful tool for showing where extra tests are needed (to exercise the missing lines of code, which might contain bugs).

If you are using a public GitHub repository in the Sali Lab organization please speak to a Sali Lab sysadmin to set up automatic testing of your module. Otherwise, you will need to sign up for a Codecov account, and add your repository to it. Then create a suitable build.yml in the .github/workflows/ directory of your repository and a tools/setup_ci.sh script. See the build.yml and setup_ci.sh from the IMP.pmi repository for templates. These two files instruct GitHub Actions to:

  1. Set up an environment on a virtual machine in the cloud in which to build your module (they install Anaconda Python, the latest nightly build of IMP, and support packages such as a C++ compiler).
  2. Build your module using cmake and make.
  3. Run the test cases using the pytest Python testing system.
  4. Collect coverage information for both C++ and Python code using special cmake and pytest command line options respectively, and upload it to Codecov.

The entire procedure is duplicated for multiple Python versions (both Python 2 and Python 3).

For example, this tutorial is itself tested in this fashion. See the latest GitHub Actions results and the latest code coverage reports.

RMF support

We can extend our basic class by adding support for RMF (for ease of comparison, we'll do this in a new class MyRestraint2 that is a copy of MyRestraint).

When IMP writes restraints to RMF files, only basic information is included - namely

  • static information (that does not change during the course of a simulation): the name of the restraint, and the particles it acts on; and
  • dynamic information (which generally changes from frame to frame): the total score of the restraint.

We can add extra static or dynamic information to the RMF file, by overriding the get_static_info or get_dynamic_info methods, respectively. Each returns an IMP::RestraintInfo object which is a simple list of key:value pairs. Here we'll add the force constant (which is static information) to the RMF file by declaring the method in the C++ header file, include/MyRestraint2.h:

RestraintInfo *get_static_info() const override;

Next, we provide an implementation of the method in the cpp file, src/MyRestraint2.cpp:

RestraintInfo *MyRestraint2::get_static_info() const {
  IMP_NEW(RestraintInfo, ri, ());
  ri->add_string("type", "IMP.foo.MyRestraint2");
  ri->add_float("force constant", k_);
  return ri.release();
}

The convention in IMP is that if static restraint information is provided, it should include a "type" string which gives the full name of the restraint. All key names should be lower case words separated by spaces, e.g. "force constant" rather than "ForceConstant".
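
These two conventions are easy to check mechanically. The Python sketch below validates a plain dictionary that stands in for the key:value pairs of a RestraintInfo. The check_static_info helper and the regular expression are hypothetical and only encode the rules stated above, not anything IMP itself enforces.

```python
import re

# Lower case words (letters or digits) separated by single spaces.
_KEY = re.compile(r"[a-z0-9]+( [a-z0-9]+)*")


def check_static_info(info: dict) -> list:
    """Return a list of convention problems (empty if none are found)."""
    problems = []
    if "type" not in info:
        problems.append('missing "type" key')
    for key in info:
        if not _KEY.fullmatch(key):
            problems.append(f"key {key!r} is not lower case words "
                            "separated by spaces")
    return problems


good = {"type": "IMP.foo.MyRestraint2", "force constant": 10.0}
bad = {"ForceConstant": 10.0}
print(check_static_info(good))
print(check_static_info(bad))
```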

Of course, the new method can and should be tested - see the test_static_info method in the test file at GitHub, test/test_restraint2.py.

Serialization support

We can further extend our MyRestraint2 class by adding support for serialization. When an IMP object is serialized its internal state - such as the values of its member variables - is written to or read in from a file, string or stream. This allows for individual objects or an entire IMP run to be saved and later restored, or to be sent from one machine to another. IMP uses the cereal library to implement serialization in C++. In Python, the objects can be loaded or saved using the pickle module.

To add basic serialization support to our class, we first must add the cereal headers to our C++ header file, include/MyRestraint2.h:

#include <cereal/access.hpp>
#include <cereal/types/base_class.hpp>

Then we add a default constructor and a new private method serialize to the same C++ header file, which the cereal library will use to both read and write the class state:

  MyRestraint2() {}

private:
  friend class cereal::access;
  template<class Archive> void serialize(Archive &ar) {
    ar(cereal::base_class<Restraint>(this), p_, k_);
  }
  IMP_OBJECT_SERIALIZE_DECL(MyRestraint2);

The default constructor (constructor which takes no arguments) is used to create an empty MyRestraint2 object when deserializing - first the empty object is constructed, and then the class state is filled in from the serialization data.

The class state comprises the state of the base Restraint class (which is handled by cereal::base_class), plus the particle our restraint acts on (p_) and the force constant (k_). (The friend declaration used here allows the cereal library to call our serialize method, which normally it would not be able to do since the method is marked private.)
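
The two-step pattern described here (construct an empty object, then fill in its state) can be sketched in plain Python. The ToyRestraint class below is a hypothetical stand-in rather than an IMP class, and JSON replaces cereal and pickle purely to keep the example self-contained.

```python
import json


class ToyRestraint:
    """Hypothetical stand-in holding a particle index and force constant."""

    def __init__(self, p=None, k=0.0):
        # The default arguments play the role of the default constructor.
        self.p_ = p
        self.k_ = k

    def serialize(self) -> str:
        """Write the class state to a string."""
        return json.dumps({"p": self.p_, "k": self.k_})

    @classmethod
    def deserialize(cls, data: str) -> "ToyRestraint":
        obj = cls()               # step 1: construct an empty object
        state = json.loads(data)  # step 2: fill in the saved state
        obj.p_, obj.k_ = state["p"], state["k"]
        return obj


original = ToyRestraint(p=0, k=10.0)
restored = ToyRestraint.deserialize(original.serialize())
print(restored.p_, restored.k_)
```

A round trip of this kind, comparing the restored state with the original, is also a natural shape for a serialization test.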

The IMP_OBJECT_SERIALIZE_DECL macro is used to handle polymorphic classes, which includes most IMP restraints. It needs to be paired with a similar macro in the cpp file, src/MyRestraint2.cpp, which uses the fully qualified name of the class:

IMP_OBJECT_SERIALIZE_IMPL(IMP::foo::MyRestraint2);

To add support for Python pickle, we replace the IMP_SWIG_OBJECT macro in the SWIG interface file, pyext/swig.i-in, with IMP_SWIG_OBJECT_SERIALIZE:

IMP_SWIG_OBJECT_SERIALIZE(IMP::foo, MyRestraint2, MyRestraint2s);

Serialization support should also be tested - see the test_serialize and test_serialize_polymorphic methods in the test file at GitHub.

CC BY-SA logo