# Overview # {#designexample}

This page walks through an iterative design process to give an
example of what sort of issues are important and what to think about
when choosing how to implement some functionality.

# Original Description # {#design_original}

Hao wants to add ligand/protein scoring to IMP so that he can
take advantage of the existing infrastructure. The details of the scoring
function are currently experimental. The code does the following:

1. Read in the protein pdb and the small ligand mol2. The protein is in
   [...] file which defines its own set of pdb-compatible atom types.
2. He proposed storing the coordinates and atom types in vectors outside
   of the decorators to speed up scoring.
3. Read in the potential of mean force (PMF) table from a file with
   a custom format. The number of dimensions can be treated as constant:
   the two atom types for a pair of atoms and the distance between that
   pair. The values stored in the table will not change during the
   program and need to be looked up quickly given the dimension data.
   The PMF table uses different atom names than the mol2 file.
4. Score a conformation by looping over all ligand-protein atom
   pairs. For each pair, look up the PMF value in the table by the
   two atom types and the distance, then sum up all PMF values.
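
The steps above can be sketched in a few lines of standalone C++. This is only an illustration of the original description, not IMP code: `Atom`, `PMFTable`, `score_conformation` and the uniform distance binning are assumptions made for the sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an atom; not an IMP decorator.
struct Atom {
  std::size_t type;  // index along the table's atom type axis
  double x, y, z;    // Cartesian coordinates
};

// Hypothetical dense table indexed by (protein type, ligand type, distance bin).
struct PMFTable {
  std::size_t n_protein_types, n_ligand_types, n_bins;
  double bin_width;            // assumed uniform bin width
  std::vector<double> values;  // n_protein_types * n_ligand_types * n_bins entries

  // Return the tabulated value, or 0 beyond the last bin.
  double lookup(std::size_t pt, std::size_t lt, double distance) const {
    std::size_t bin = static_cast<std::size_t>(distance / bin_width);
    if (bin >= n_bins) return 0.0;
    return values[(pt * n_ligand_types + lt) * n_bins + bin];
  }
};

// Sum the PMF value over every protein-ligand atom pair.
double score_conformation(const std::vector<Atom>& protein,
                          const std::vector<Atom>& ligand,
                          const PMFTable& table) {
  double total = 0.0;
  for (const Atom& p : protein) {
    for (const Atom& l : ligand) {
      double dx = p.x - l.x, dy = p.y - l.y, dz = p.z - l.z;
      total += table.lookup(p.type, l.type, std::sqrt(dx * dx + dy * dy + dz * dz));
    }
  }
  return total;
}
```

Note that the pair loop and the table lookup are fused in one function here, which is the coupling the rest of this page tries to undo.
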

## Comments on the original description ## {#design_original_comments}

1. mol2 is a standard file format so it makes sense to have a reader
   for it in IMP. We can adopt the mol2 atom names as the standard names
   for ligand atoms in IMP.
2. How the coordinates are stored and accessed is an
   implementation detail, and worrying about it should probably
   be delayed until other considerations are figured out.
3. Loading the PMF table is a natural operation for an initialization
   function. However, since the PMF table is not a standard file format,
   it doesn't make sense for it to go into IMP, at least not until a file
   format for the protein-ligand scoring has been worked out. Also, there is
   little reason to keep the PMF table atom types around, and they probably
   should be converted to more standard atom types on load. Finally, since
   the data in the PMF file is directly the scoring data, there isn't a
   real need to have a special representation for it in memory.
4. There are two different considerations here: which pairs of atoms to
   use and how to score each pair.

# Design Proposal for Reading # {#design_reading}

Since the mol2 reader is quite separate from the scoring, we will consider
it on its own first. In analogy to the pdb reader, it makes sense to
provide a function `read_mol2(std::istream &in, Model *m)` which returns
[...]

The mol2 atom types can either be added at runtime using
[...] similar to the IMP::atom::AT_N. The latter requires editing both
IMP/atom/Atom.h and modules/atom/src/Atom.cpp and so is a bit harder
[...]
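
To give a feel for how little the reading step involves, here is a minimal standalone sketch that collects the records of the `@<TRIPOS>ATOM` section of a mol2 stream. It is not the IMP reader: `Mol2Atom` and `read_mol2_atoms` are hypothetical names, the result is a plain vector rather than anything attached to a `Model`, and only the first six fields of each record (id, name, coordinates, atom type) are kept.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not an IMP class.
struct Mol2Atom {
  int id;
  std::string name;
  double x, y, z;
  std::string type;  // mol2 atom type string, e.g. "C.3"
};

// Collect the records of the @<TRIPOS>ATOM section; other sections are skipped.
std::vector<Mol2Atom> read_mol2_atoms(std::istream& in) {
  std::vector<Mol2Atom> atoms;
  std::string line;
  bool in_atom_section = false;
  while (std::getline(in, line)) {
    if (line.rfind("@<TRIPOS>", 0) == 0) {
      // A new section starts; only the ATOM section is of interest.
      in_atom_section = (line.rfind("@<TRIPOS>ATOM", 0) == 0);
      continue;
    }
    if (!in_atom_section) continue;
    std::istringstream fields(line);
    Mol2Atom a;
    // Fields: atom_id atom_name x y z atom_type; trailing fields are ignored.
    if (fields >> a.id >> a.name >> a.x >> a.y >> a.z >> a.type) {
      atoms.push_back(a);
    }
  }
  return atoms;
}
```

A real reader would also map each `type` string to an atom type and build the molecular hierarchy; the sketch stops at the raw records.
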

# Implementing Scoring as an IMP::Restraint # {#design_restraint}

First, this functionality should probably go in a new module since it
is experimental. One can use the scratch module in a separate `git` branch,
[...]

One could then have a `PMFRestraint` which loads a PMF file from the
module data directory (or from a user-specified path). It would
[...] one for the protein and score all pairs over the two. For each pair of atoms,
it would look at the IMP::atom::Atom::get_type() value and use that
to find the function to use in a stored table.

Such a design requires a reasonable amount of implementation, especially
once one is interested in accelerating the scoring by only scoring nearby
pairs. The `PMFRestraint` could use an IMP::core::ClosePairsScoreState
[...]

# Implementing Scoring as an IMP::PairScore # {#design_score}

One could instead separate the scoring from the pair generation by implementing
[...]
IMP::core::ClosePairsScoreState when experimenting to see what is the fastest
way to implement things.

[...]
IMP::atom::Atom::get_type() value to look up the correct function to use.

If you look around in \imp for similar pair scores (see IMP::PairScore and the
inheritance diagram) you see there is an IMP::core::TypedPairScore which
already does what you need. That is, it takes a pair of particles, looks up
their types, and then applies a particular IMP::PairScore based on their types.
IMP::core::TypedPairScore expects an IMP::IntKey to describe the type. The
appropriate key can be obtained from IMP::atom::Atom::get_type_key().

Then all that needs to be implemented is a function, say
IMP::hao::create_pair_score_from_pmf(), which creates an IMP::core::TypedPairScore,
loads a PMF file and then calls IMP::core::TypedPairScore::set_pair_score() for
each pair stored in the PMF file after translating PMF types to the
appropriate IMP::atom::AtomType.
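
The shape of such a function can be sketched without IMP. In the standalone code below, `TypedScore` is a hypothetical stand-in for the role played by IMP::core::TypedPairScore, `PMFEntry` stands for one already-parsed block of a PMF file, and the `type_ids` map stands for the translation from PMF type names to standard type ids. None of these names or signatures are the IMP API.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dispatcher: maps an ordered pair of type ids to a function
// of the distance. Only mimics the idea of a typed pair score.
class TypedScore {
 public:
  using DistanceScore = std::function<double(double)>;

  void set_pair_score(int type_a, int type_b, DistanceScore f) {
    scores_[std::make_pair(type_a, type_b)] = std::move(f);
  }

  // Pairs with no registered function contribute nothing.
  double evaluate(int type_a, int type_b, double distance) const {
    auto it = scores_.find(std::make_pair(type_a, type_b));
    return it == scores_.end() ? 0.0 : it->second(distance);
  }

 private:
  std::map<std::pair<int, int>, DistanceScore> scores_;
};

// Hypothetical parsed PMF block: two PMF-specific type names and the
// tabulated values on an assumed uniform distance grid.
struct PMFEntry {
  std::string protein_type, ligand_type;
  double bin_width;
  std::vector<double> values;
};

// Build the dispatcher, translating PMF names to standard ids on load so
// that the PMF-specific names do not leak into the rest of the program.
TypedScore create_pair_score_from_pmf(const std::vector<PMFEntry>& entries,
                                      const std::map<std::string, int>& type_ids) {
  TypedScore score;
  for (const PMFEntry& e : entries) {
    int a = type_ids.at(e.protein_type);  // throws if the name is unknown
    int b = type_ids.at(e.ligand_type);
    score.set_pair_score(a, b, [w = e.bin_width, v = e.values](double d) {
      std::size_t bin = static_cast<std::size_t>(d / w);
      return bin < v.size() ? v[bin] : 0.0;
    });
  }
  return score;
}
```

The table data lives directly inside the per-pair functions, so there is no separate table class to maintain, and which pairs get evaluated is left entirely to the caller.
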

This design has the advantage of very little code to write. As a result it
is easy to experiment (move to 3D tables or change the set of close pairs). Also
different, non-overlapping PMFs can be combined by just adding more terms to
the IMP::core::TypedPairScore.

The disadvantages are that the scoring passes through more layers of function
calls, making it hard to use optimizations such as storing all the coordinates
[...]

# Some final thoughts # {#design_final}

1. Figure out orthogonal degrees of freedom and try to split
   functionality into pieces that control each. Here they are the set
   of pairs and how to score each of them. Doing this makes it
   easier to reuse code.
2. Don't create two classes when you only have one set of work. Here,
   all you have is a mapping from a pair of types and a
   distance to a score. Having both a PMFTable and PMFPairScore
   locks you into that aspect of the interface without giving you
   any real flexibility.
3. Implementing things in terms of many small classes makes the
   design much more flexible. You can easily replace a piece
   without touching anything else and, since each part is simple,
   replacing a particular piece doesn't take much work. The added
   complexity can easily be hidden away using helper functions in
   your code (or, if the action is very common, in IMP).
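
The first and third points can be made concrete with a small standalone sketch in which the set of pairs and the per-pair score are independent pieces joined by one helper. All names here (`Point`, `all_pairs`, `close_pairs`, `score_pairs`) are hypothetical and chosen for illustration; in IMP these roles are filled by library classes rather than by free functions.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Point { double x, y, z; };

double separation(const Point& a, const Point& b) {
  double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

using IndexPair = std::pair<std::size_t, std::size_t>;
// First degree of freedom: which pairs to consider.
using PairGenerator = std::function<std::vector<IndexPair>(
    const std::vector<Point>&, const std::vector<Point>&)>;
// Second degree of freedom: how to score one pair from its distance.
using PairScorer = std::function<double(double)>;

// Every pair between the two sets.
std::vector<IndexPair> all_pairs(const std::vector<Point>& a,
                                 const std::vector<Point>& b) {
  std::vector<IndexPair> out;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < b.size(); ++j) out.emplace_back(i, j);
  return out;
}

// Only pairs within the cutoff; a drop-in replacement for all_pairs.
PairGenerator close_pairs(double cutoff) {
  return [cutoff](const std::vector<Point>& a, const std::vector<Point>& b) {
    std::vector<IndexPair> out;
    for (std::size_t i = 0; i < a.size(); ++i)
      for (std::size_t j = 0; j < b.size(); ++j)
        if (separation(a[i], b[j]) <= cutoff) out.emplace_back(i, j);
    return out;
  };
}

// Helper that hides the composition of the two pieces.
double score_pairs(const std::vector<Point>& a, const std::vector<Point>& b,
                   const PairGenerator& generate, const PairScorer& score) {
  double total = 0.0;
  for (const IndexPair& p : generate(a, b))
    total += score(separation(a[p.first], b[p.second]));
  return total;
}
```

Swapping `all_pairs` for `close_pairs(cutoff)` changes which pairs are scored without touching the scoring function, and the reverse holds as well.
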