IMP Tutorial
In this tutorial we will cover creating a new IMP module, and writing a new restraint in C++.
For more information, see the section on developing IMP in the manual.
First, we need to build IMP from source code. See the installation instructions in the manual for more details.
First, change into the top-level directory in the IMP source code. You will see a modules directory that contains a subdirectory for each IMP module. To add a new module called 'foo', use the tools/make-module.py script as follows (the $ character here denotes the command prompt):
This will make a new subdirectory modules/foo; let's take a look at its contents:
The include directory in the new module contains C++ header files that declare the public classes and other functions that are part of the module. Classes that are not intended to be public (e.g. utility classes used only by your module itself) should instead go in the include/internal subdirectory.
Let’s add a new class to our module, MyRestraint, a simple restraint that restrains a particle to the xy plane (see the ExampleRestraint class in modules/example/ for a similar class).
The convention in IMP is for class names (and the files declaring and defining them) to be CamelCase. See the naming conventions section in the manual for more details.
We do this by creating a file MyRestraint.h in the modules/foo/include/ subdirectory. We'll look at each section of this file in turn (the whole file is also available at GitHub). The first part of the file looks like:
The ifndef/define is a header guard, which prevents the file from being included multiple times. The convention in IMP for the header guard name is to use upper case IMP<module name>_<file name>.
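As a rough illustration of the naming rule, the following Python sketch derives a guard name from a module name and a header file name. The helper header_guard is hypothetical (it is not part of IMP), and both the underscore between CamelCase words and the trailing file extension are assumptions about how the convention is applied.

```python
import re


def header_guard(module, filename):
    """Build an upper-case guard name of the form IMP<module>_<file name>.

    Assumption: CamelCase words are separated by underscores and the
    extension is appended, e.g. MyRestraint.h -> MY_RESTRAINT_H.
    """
    stem, _, ext = filename.rpartition(".")
    # Insert "_" at each lower-case/digit to upper-case boundary.
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", stem)
    return "IMP{}_{}_{}".format(module.upper(), words.upper(), ext.upper())


guard = header_guard("foo", "MyRestraint.h")
print(guard)
```

Check the result against an existing header in the IMP source tree before relying on it.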
All of our classes will exist in the IMP::foo namespace. The IMPFOO_BEGIN_NAMESPACE macro ensures this. It is defined in the foo_config.h header file.
We are going to declare a restraint, so the compiler needs the declaration of the IMP::Restraint base class, which is in IMP/Restraint.h.
The next part of the header declares our new class:
IMPFOOEXPORT should be used for any class that has a .cpp implementation, and ensures the class can be used outside of the module (e.g. in Python).
The IMP_OBJECT_METHODS macro adds standard methods that all IMP objects (like IMP::Restraint) are expected to provide.
Our constructor takes an IMP::Model, a particle in that model, and a force constant. We also declare the necessary methods to return the score and inputs for the restraint - we will define these later in the .cpp file.
The final part of the file looks like:
This just closes the namespace and header guard from the start of the file.
Next, we need to provide a definition for the class. We do this by making a corresponding file MyRestraint.cpp in the modules/foo/src/ subdirectory. The first part of this file looks like:
Similarly to the header file, we need to put everything in the IMP::foo namespace and include any needed header files. Here we include the previous declaration of the MyRestraint class. We also need the declaration of the XYZ decorator from the IMP::core module since we are going to be using the particle’s coordinates to calculate the score.
Next, we define the constructor of the class:
The constructor simply calls the IMP::Restraint base class constructor (which takes the IMP::Model and a human-readable name) and stores the p and k arguments in the class attributes p_ and k_ (IMP convention is for class attributes to end in an underscore). %1% is replaced with a unique number, so multiple restraints will be named MyRestraint1, MyRestraint2, etc.
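The effect of the %1% placeholder can be pictured with a small stand-alone Python sketch. The NameFormatter class is hypothetical and is not how IMP implements naming internally; numbering from 1 simply follows the names given in the text.

```python
import itertools


class NameFormatter:
    """Replace the %1% placeholder with a running number (illustrative)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def format(self, template):
        # Names without a placeholder are returned unchanged.
        if "%1%" not in template:
            return template
        return template.replace("%1%", str(next(self._counter)))


names = NameFormatter()
first = names.format("MyRestraint%1%")
second = names.format("MyRestraint%1%")
print(first, second)
```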
Next, we implement the restraint's score and first derivatives:
We apply a simple harmonic restraint to the z coordinate to keep the particle in the xy plane; we use the IMP::core::XYZ decorator to treat the particle as a coordinate.
The IMP::ScoreAccumulator class is given the score, and analytic first derivatives as well if requested.
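The arithmetic behind such a restraint can be checked independently of IMP. The plain Python sketch below assumes the common harmonic form 0.5 * k * z^2 (the factor of 0.5 is an assumption, not stated above); the function names are hypothetical. The analytic derivative k * z is cross-checked against a finite difference.

```python
def harmonic_z_score(z, k):
    """Return (score, dscore/dz) for the harmonic term 0.5 * k * z**2."""
    score = 0.5 * k * z * z
    deriv = k * z
    return score, deriv


def finite_difference(z, k, step=1e-6):
    """Central-difference estimate of dscore/dz, used as a cross-check."""
    upper, _ = harmonic_z_score(z + step, k)
    lower, _ = harmonic_z_score(z - step, k)
    return (upper - lower) / (2.0 * step)


score, deriv = harmonic_z_score(3.0, 10.0)
print(score, deriv, finite_difference(3.0, 10.0))
```

Only the z component of the derivative is nonzero, which is why a particle already in the xy plane (z = 0) contributes neither score nor force.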
We also need to tell IMP which particles our restraint acts on by overriding the do_get_inputs method:
Here we just have a single particle, p_.
This is used to order the evaluation of restraints and constraints (a constraint which moves particle A must be evaluated before any restraint with A as an input) and for parallelization. See the IMP manual for more details.
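The ordering rule can be sketched as a topological sort over declared inputs and outputs. This is a stand-alone illustration using the standard graphlib module (Python 3.9 or later), not IMP's actual dependency machinery; the function evaluation_order and the object names are hypothetical.

```python
from graphlib import TopologicalSorter


def evaluation_order(objects):
    """Order objects so that writers of a particle come before its readers.

    objects maps a name to a pair (inputs, outputs), each a set of
    particle names.
    """
    sorter = TopologicalSorter()
    for reader, (inputs, _) in objects.items():
        # Every other object that writes one of our inputs must run first.
        writers = [name for name, (_, outputs) in objects.items()
                   if name != reader and outputs & inputs]
        sorter.add(reader, *writers)
    return list(sorter.static_order())


order = evaluation_order({
    "restraint_on_A": ({"A"}, set()),
    "constraint_moving_A": ({"B"}, {"A"}),
})
print(order)
```

If an object omitted a particle from its inputs, no edge would be created and the ordering guarantee would silently be lost, which is why the inputs must be declared completely.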
Finally, the file ends with:
As before, we need to close the namespace. (For reference, the whole file is also available at GitHub).
Next, we make the C++ class available in Python. In IMP we use a tool called SWIG to do this. We need to configure the SWIG interface by modifying the swig.i-in file in the modules/foo/pyext/ subdirectory. First, we need to tell SWIG how to wrap the MyRestraint class by adding this line to the file:
This tells SWIG that MyRestraint is an IMP Object. Most IMP classes are subclasses of IMP::Object. These are heavyweight objects which are always passed by reference-counted pointers, and are generally not copied. Some simple classes (e.g. IMP::algebra::Vector3D) are subclasses of IMP::Value instead. These are lightweight objects which are generally passed by value or reference, and can be trivially copied. See the IMP manual for more details.
Next, we tell SWIG to parse our C++ header file for the class by adding the line:
With the SWIG interface complete, we will be able to use our class from Python as IMP.foo.MyRestraint.
(For reference, the whole file is also available at GitHub).
You can also add arbitrary Python code to your module. This is added to the swig.i-in file using the SWIG %pythoncode directive. See the PMI module for an example.
You can also add entire Python submodules by adding Python files to the pyext/src subdirectory. For example, the file pyext/src/my_python.py can be imported in Python using import IMP.foo.my_python. This is also used in the PMI module.
Documentation of our custom class is omitted here for clarity, but all C++ headers and .cpp files should contain comments! All comments are parsed by the doxygen tool, which uses the special comment markers //! and /** */. See the IMP manual for more details.
You should also fill in modules/foo/README.md with a description of the module and the license it is released under. We recommend an open source license such as the LGPL.
Next we should write a test case in the modules/foo/test/ directory, by creating a new file test_restraint.py. Test cases periodically verify that IMP is working correctly. They can be written in C++, but are almost always written in Python for flexibility.
IMP convention is to name a test file starting with test_.
The first part of our test file looks like:
This imports the IMP kernel, any other IMP modules used in the test, and our own IMP.foo module. The imports from __future__ help to ensure that our test works in the same way in both Python 2 and Python 3.
All tests should be classes that use the IMP.test module, which adds some IMP-specific functionality to the standard Python unittest module.
Next, we add a test method to our class:
This creates a restraint object, requests its score and derivatives (evaluate), and asks for inputs (get_inputs). Here we simply test by comparing to known good values using the standard unittest methods assertAlmostEqual, assertLess, and assertEqual. (The IMP.test.TestCase class provides some additional methods helpful for IMP tests.)
Always use assertAlmostEqual for floating point comparisons, never assertEqual (two floating point numbers which look identical to a human may not be represented identically by the computer).
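The reason for this rule can be seen without IMP at all. The sketch below uses only the standard unittest module; the class FloatComparison and the helper run_float_checks are made up for illustration.

```python
import io
import unittest


class FloatComparison(unittest.TestCase):
    def test_sum(self):
        total = 0.1 + 0.2
        # The two values print alike but differ in their last bits,
        # so an exact comparison would fail.
        self.assertNotEqual(total, 0.3)
        # A comparison with a tolerance succeeds.
        self.assertAlmostEqual(total, 0.3, delta=1e-9)


def run_float_checks():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FloatComparison)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_float_checks()
print(result.wasSuccessful())
```

The delta argument sets an absolute tolerance; choose one that reflects the precision you expect from the score being tested.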
Finally, we end the test script with:
This simply runs all the tests in this file if the script is run directly from the command line with python3 test_restraint.py.
(For reference, the whole file is also available at GitHub).
IMP modules can include command line tools. We don't include any such tools in this module, but to add a tool, add a C++ or Python file to the bin directory. The tool will be compiled if needed and then installed with the rest of IMP in the binary directory (e.g. /usr/local/bin). See the FoXS module for an example C++ command line tool and the em module for example Python tools.
Usually command line tools are all installed in the same directory, so take care to give each program a fairly unique name so as not to conflict with other IMP programs or the operating system itself.
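One quick way to check a candidate tool name against programs already installed on your own machine is the standard shutil.which function. The helper name_is_free and the tool name used below are hypothetical; this only inspects the local PATH, so it cannot detect clashes on other systems.

```python
import shutil


def name_is_free(tool_name):
    """Return True if no program called tool_name is found on PATH."""
    return shutil.which(tool_name) is None


candidate = "imp_foo_hypothetical_tool_zz9"
print(candidate, "is free:", name_is_free(candidate))
```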
Finally, we need to tell the IMP build system which other modules and external code the module depends on. This is done by editing the file modules/foo/dependencies.py to read:
Since we use the IMP::core and IMP::algebra modules, we need to declare them as requirements for this module.
required_dependencies and optional_dependencies can also be used to make use of 3rd party libraries. See the IMP manual for more information.
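A dependencies.py along these lines might look like the sketch below. The variable name required_modules and the colon-separated string format are assumptions that are not confirmed by the text above; compare with the dependencies.py of an existing module before using it.

```python
# modules/foo/dependencies.py (sketch; names and format partly assumed)

# Other IMP modules this module needs (assumed colon-separated list).
required_modules = 'core:algebra'

# 3rd party libraries; left empty since this module uses none.
required_dependencies = ''
optional_dependencies = ''
```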
(For reference, the whole file is also available at GitHub).
If there is no C++ code in your module at all - i.e. it is pure Python - then you can speed up building of your module by marking it as Python only. This is done by adding python_only = True to the dependencies.py file. SWIG is not used in Python-only modules; instead, put any Python code you want in the top-level module in the pyext/src/__init__.py file. For an example, see the IMP.test module. See the IMP manual for more information.
Now is a good time to store the module in source control so that it can be easily shared with collaborators and users, and changes to it can be tracked. This will also simplify the process of incorporating the module into the main IMP distribution later, if applicable.
Most IMP modules are stored on GitHub. See https://github.com/salilab/pmi/ and https://github.com/salilab/npctransport for examples.
To build the custom module, build IMP from source code in the usual way. cmake should detect the new module and configure it, and then your build tool (usually make or ninja) will build it.
Test the new code with something like (in the IMP build directory):
You can also run all of your module's test cases using the ctest tool; see the IMP manual for more details.
You can automate the building and testing of your module. This is very helpful because any bugs introduced during development can be detected quickly. Provided your module is in a public GitHub repository and is open source, you can use free cloud services such as GitHub Actions and Codecov to do this. GitHub Actions can build and test the module each time you git push to GitHub. If the module fails to build, or a test fails, GitHub will send you an email.
If you are using a public GitHub repository in the Sali Lab organization please speak to a Sali Lab sysadmin to set up automatic testing of your module. Otherwise, you will need to sign up for a Codecov account, and add your repository to it. Then create a suitable build.yml in the .github/workflows/ directory of your repository and a tools/setup_ci.sh script. See the build.yml and setup_ci.sh from the IMP.pmi repository for templates. These two files instruct GitHub Actions to build the module with cmake and make, and to gather code coverage information using cmake and pytest command line options respectively, and upload it to Codecov.
The entire procedure is duplicated for multiple Python versions (both Python 2 and Python 3).
For example, this tutorial is itself tested in this fashion. See the latest GitHub Actions results and the latest code coverage reports.
We can extend our basic class by adding support for RMF (for ease of comparison, we'll do this in a new class MyRestraint2 that is a copy of MyRestraint).
When IMP writes restraints to RMF files, only basic information is included - namely
We can add extra static or dynamic information to the RMF file by overriding the get_static_info or get_dynamic_info methods, respectively. Each returns an IMP::RestraintInfo object which is a simple list of key:value pairs. Here we'll add the force constant (which is static information) to the RMF file by declaring the method in the C++ header file, include/MyRestraint2.h:
Next, we provide an implementation of the method in the cpp file, src/MyRestraint2.cpp:
The convention in IMP is that if static restraint information is provided, it should include a "type" string which gives the full name of the restraint. All key names should be lower case words separated by spaces, e.g. "force constant" rather than "ForceConstant".
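The key naming rule is easy to check mechanically. The following sketch is a hypothetical helper (IMP is not known to provide one) that accepts only lower case words separated by single spaces; allowing digits within a word is an assumption.

```python
import re

# One or more lower-case words (digits allowed, by assumption),
# separated by single spaces.
_KEY_PATTERN = re.compile(r"[a-z0-9]+( [a-z0-9]+)*")


def follows_key_convention(key):
    """Return True if key is lower case words separated by spaces."""
    return _KEY_PATTERN.fullmatch(key) is not None


for key in ("type", "force constant", "ForceConstant"):
    print(repr(key), follows_key_convention(key))
```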
Of course, the new method can and should be tested - see the test_static_info method in the test file at GitHub, test/test_restraint2.py.
We can further extend our MyRestraint2 class by adding support for serialization. When an IMP object is serialized its internal state - such as the values of its member variables - is written to or read in from a file, string or stream. This allows for individual objects or an entire IMP run to be saved and later restored, or to be sent from one machine to another. IMP uses the cereal library to implement serialization in C++. In Python, the objects can be loaded or saved using the pickle module.
To add basic serialization support to our class, we first must add the cereal headers to our C++ header file, include/MyRestraint2.h:
Then we add a default constructor and a new private method serialize to the same C++ header file, which the cereal library will use to both read and write the class state:
The default constructor (constructor which takes no arguments) is used to create an empty MyRestraint2 object when deserializing - first the empty object is constructed, and then the class state is filled in from the serialization data.
The class state comprises the state of the base Restraint class (which is handled by cereal::base_class), plus the particle our restraint acts on (p_) and the force constant (k_). (The friend declaration used here allows the cereal library to call our serialize method, which normally it would not be able to do since the method is marked private.)
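The "construct empty, then fill in the state" pattern can be illustrated in plain Python. The classes below are stand-ins written for this sketch and use JSON rather than cereal or pickle; none of the names are IMP API. The derived class saves the base class state first and then its own two members, mirroring the description above.

```python
import json


class BaseRestraint:
    """Stand-in base class that owns part of the state (the name)."""

    def __init__(self, name=""):
        self.name = name

    def get_state(self):
        return {"name": self.name}

    def set_state(self, state):
        self.name = state["name"]


class StandInRestraint(BaseRestraint):
    def __init__(self, particle=None, k=0.0):
        # With no arguments this acts as the default constructor.
        super().__init__("StandInRestraint")
        self.particle = particle
        self.k = k

    def get_state(self):
        state = super().get_state()          # base class state first
        state.update(particle=self.particle, k=self.k)
        return state

    def set_state(self, state):
        super().set_state(state)
        self.particle = state["particle"]
        self.k = state["k"]


def load(text):
    obj = StandInRestraint()                 # construct an empty object
    obj.set_state(json.loads(text))          # then fill in its state
    return obj


original = StandInRestraint(particle=7, k=10.0)
restored = load(json.dumps(original.get_state()))
print(restored.name, restored.particle, restored.k)
```

Without a no-argument constructor, load would have nothing to build before the saved values are available, which is the role the default constructor plays during deserialization.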
The IMP_OBJECT_SERIALIZE_DECL macro is used to handle polymorphic classes, which includes most IMP restraints. It needs to be paired with a similar macro in the cpp file, src/MyRestraint2.cpp, which uses the fully qualified name of the class:
To add support for Python pickle, we replace the IMP_SWIG_OBJECT macro in the SWIG interface file, pyext/swig.i-in, with IMP_SWIG_OBJECT_SERIALIZE:
Serialization support should also be tested - see the test_serialize and test_serialize_polymorphic methods in the test file at GitHub.