
[IMP-dev] biological names for things



Since I keep bringing up that it is good to be able to refer to parts of the representation using biological names, I thought I would throw together some code to do that. The goal is to allow one to set up the restraints without ever explicitly referring to particles, just asking for parts of the hierarchy. The advantage of this is that
- one can change the representation pretty much arbitrarily without messing with the restraints code
- the restraints code is much easier to read
- you don't have to worry about whether the representation happens to contain a Fragment referring to the piece of the molecule you are interested in (and hence avoid the problem of overlapping, but not nested, fragments).

For the implementation, there is a class IMP.atom.Named which describes a part of one or more molecules. You create one by specifying the part you want, for example
nm=IMP.atom.Named(hierarchy=h, molecule="my molecule", residue=10)
(assuming there is more than one molecule in h) and can then get the particle or particles it refers to with
nmps=nm.get_particles()
e.g., if h is an all-atom model, nm.get_particles() will return the atoms in the residue; if it is a coarse-grained model, it will return the particle containing the residue.
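
To make the lookup behaviour concrete, here is a toy sketch in plain Python. None of it is IMP code: the Node class and the get_particles function are hypothetical stand-ins, and the way the hierarchy is walked is only an assumption about how such a lookup could work.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Toy hierarchy node (hypothetical, not an IMP class)."""
    name: str
    residues: Tuple[int, int]  # inclusive residue range covered by this node
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Return all leaf nodes at or below `node`."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def get_particles(root, molecule, residue):
    """Return the leaves that best represent `residue` of `molecule`."""
    mol = next(c for c in root.children if c.name == molecule)
    found = []

    def visit(node):
        lo, hi = node.residues
        if not (lo <= residue <= hi):
            return
        if lo == hi or not node.children:
            # Either an exact residue node (take everything below it)
            # or a coarse bead that merely contains the residue.
            found.extend(leaves(node))
            return
        for child in node.children:
            visit(child)

    visit(mol)
    return found


# All-atom representation: residues have atoms as children.
all_atom = Node("root", (1, 20), [Node("my molecule", (1, 20), [
    Node("res9", (9, 9), [Node("N", (9, 9)), Node("CA", (9, 9))]),
    Node("res10", (10, 10),
         [Node("N", (10, 10)), Node("CA", (10, 10)), Node("C", (10, 10))]),
])])

# Coarse-grained representation: one bead per 10 residues.
coarse = Node("root", (1, 20), [Node("my molecule", (1, 20), [
    Node("bead1-10", (1, 10)), Node("bead11-20", (11, 20)),
])])

atom_names = [p.name for p in get_particles(all_atom, "my molecule", 10)]
bead_names = [p.name for p in get_particles(coarse, "my molecule", 10)]
```

The same call returns the three atoms of residue 10 for the all-atom hierarchy and the single bead covering residues 1-10 for the coarse one, which is the point: the calling code does not change when the representation does.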

The names can be used to create a distance restraint with
IMP.atom.create_harmonic_distance_restraint(
    IMP.atom.Named(hierarchy=all, molecule="proteina", residue=10),
    IMP.atom.Named(hierarchy=all, molecule="proteinb", residues=(1,10)),
    10, 1)
which will return a restraint limiting the distance between the 10th residue of proteina and the first 10 residues of proteinb to 10A with a spring constant of 1. The restraint is a SphereDistanceRestraint if each of the two Named objects refers to just one particle, and a ConnectivityRestraint using a ClosePairsPairScore if they refer to more than one (the ClosePairsPairScore internally takes advantage of rigid bodies if the hierarchies are rigid bodies). I haven't wrapped any other restraint types yet. With EM, for example, one might as well just pass nm0.get_particles()+nm1.get_particles() to restrain two molecules with a map.
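
The choice between the two restraint types can be sketched roughly as below. This is a toy model in plain Python, not IMP code: particles are bare coordinate tuples without radii, the function names are made up, the score is assumed to be a plain harmonic 0.5*k*(d - target)^2, and the multi-particle case is simplified to scoring only the closest pair.

```python
import math
from itertools import product


def choose_restraint(ps0, ps1):
    """Name the restraint type that would be built (hypothetical helper)."""
    if len(ps0) == 1 and len(ps1) == 1:
        return "SphereDistanceRestraint"
    return "ConnectivityRestraint with ClosePairsPairScore"


def closest_distance(ps0, ps1):
    """Smallest distance between any point in ps0 and any point in ps1."""
    return min(math.dist(a, b) for a, b in product(ps0, ps1))


def harmonic_score(ps0, ps1, target, k):
    """Assumed harmonic penalty on the closest-pair distance."""
    d = closest_distance(ps0, ps1)
    return 0.5 * k * (d - target) ** 2


residue_10 = [(0.0, 0.0, 0.0)]                      # one particle
first_ten = [(12.0, 0.0, 0.0), (30.0, 0.0, 0.0)]    # several particles

kind = choose_restraint(residue_10, first_ten)
score = harmonic_score(residue_10, first_ten, target=10.0, k=1.0)
```

Here the closest pair is 12A apart, so with a 10A target and a spring constant of 1 the toy score is 0.5 * 1 * 2^2 = 2.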

It is just something I threw together on the train, so I wouldn't trust the code much yet (and the name "Named" isn't great, but I want it to be short).

The Named objects can be created with any combination of
- one or more names of molecules
- a single residue index or list of residue indexes (or ranges)-- later we can add tags for the C- and N- terminus
- one or more atom types
- one or more residue types
- one or more chain identifiers
- one or more domain names
- a target radius (so that you can refer to intermediate levels of the hierarchy)
eg
Named(hierarchy=h, chains=['a', 'b'], target_radius=10) which gets as close to a 10A representation of the two chains as it can.
Named(hierarchy=h, atom_types=[CA]) which gets all C alphas
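
How the criteria might combine can be illustrated with a small plain-Python filter. This is only a sketch: the select function and the dict records are hypothetical, and it assumes that different criteria are ANDed together while the values within one criterion are alternatives. The target_radius option is left out because it depends on the hierarchy levels.

```python
def select(atoms, chains=None, residues=None, atom_types=None):
    """Filter atom records; a criterion left as None matches everything."""
    def matches(a):
        return ((chains is None or a["chain"] in chains) and
                (residues is None
                 or residues[0] <= a["residue"] <= residues[1]) and
                (atom_types is None or a["atom_type"] in atom_types))
    return [a for a in atoms if matches(a)]


# Toy records standing in for the leaves of a hierarchy.
atoms = [
    {"chain": "a", "residue": 1, "atom_type": "N"},
    {"chain": "a", "residue": 1, "atom_type": "CA"},
    {"chain": "b", "residue": 5, "atom_type": "CA"},
    {"chain": "c", "residue": 7, "atom_type": "CA"},
]

all_ca = select(atoms, atom_types=["CA"])
ca_in_ab = select(atoms, chains=["a", "b"], atom_types=["CA"])
first_residue_of_a = select(atoms, chains=["a"], residues=(1, 1))
```

Under these assumptions the first selection gets every C alpha, the second only those in chains a and b, and the third all atoms of residue 1 in chain a.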

This way you can write functions to set up restraints without any direct reference to particles.

Thoughts? Does it seem useful? I quite like using such code when setting things up.