Re: Helper functions
Rather than answer Ben's questions one by one, I'll try to address
things in bulk.
The route we chose to go down for representing particles in IMP is
very non-structured. You just give it a string and get back a value.
This makes things very flexible (for example I can trivially
implement the hierarchy on top of it) but makes it hard to maintain
invariants. The best way to maintain them is to provide helper functions
and beat users into using them. For example an add_child helper
function adds a child to a node and makes sure the parent_index is
correct. A compute_coordinates_from_center_of_mass_of_children
function does exactly that. Unfortunately, there is no way of making
sure that it gets called every time the set of children changes
(although add_child/remove_child could of course be made more clever
and we can use a State object to make sure it gets called after
coordinates are updated by the optimizer).
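A minimal sketch of the helper-function idea, assuming a toy string-keyed particle. The Particle and Model types and the attribute names "num_children", "child_N" and "parent" are made up for illustration and are not the real IMP interface; only add_child and parent_index are named above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a loosely structured particle: integer
// attributes looked up by name. Not the real IMP types.
struct Particle {
  std::map<std::string, int> ints;
};

using Model = std::vector<Particle>;

// Append `child` to `parent` and keep both sides of the link consistent,
// so callers never write the child_N / parent_index attributes by hand.
void add_child(Model& m, int parent, int child) {
  assert(parent != child);
  Particle& p = m.at(parent);
  const int n = p.ints["num_children"];  // operator[] inserts 0 if missing
  p.ints["child_" + std::to_string(n)] = child;
  p.ints["num_children"] = n + 1;

  Particle& c = m.at(child);
  c.ints["parent"] = parent;
  c.ints["parent_index"] = n;  // position of this child within its parent
}
```

Nothing stops a caller from setting "child_0" directly, which is exactly the invariant problem: the helper only works if everyone uses it.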
Another issue is that all lookups involve searching for a string in
a table. This can be expensive. The cost of generating the string
should be trivial as they can easily be cached (I do so in my
get_child helper function).
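Caching the generated strings might look roughly like this. The function name child_key and the "child_N" naming are assumptions for illustration; the actual get_child helper is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Return the attribute name "child_<i>", building each string at most once.
// A deque is used because push_back does not invalidate references to
// existing elements, so returned references stay valid as the cache grows.
// Sketch only: the static cache is not safe for concurrent callers.
const std::string& child_key(std::size_t i) {
  static std::deque<std::string> cache;
  while (cache.size() <= i) {
    cache.push_back("child_" + std::to_string(cache.size()));
  }
  return cache[i];
}
```

This removes the concatenation and formatting cost on repeated calls, but the table lookup on the resulting string still has to happen.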
The alternative would have been to use an object hierarchy and have
the objects manage everything internally. Then we can have all sorts
of types of objects which allow you to get and set attributes
directly (hiding the Model_data object and the indirection provided
by the IntIndex sort of things from users of the various Particle
classes). Then we would have a GeometricParticle which has methods x()
and y() which return floats for the coordinates and a
HierarchyParticle which has child(i) etc. The main disadvantage is
that you have to cast all over the place (but now that C++ has RTTI
this isn't too bad). The other disadvantage is that loading data from
files is more tricky as the mapping between the text string in the
file and the attribute no longer happens for free (you have to know
"X" corresponds to the function set_x()). We can provide macros to
make this mapping easier though.
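A rough sketch of what the class-based alternative could look like, written in present-day C++. Only GeometricParticle, x(), y() and set_x() come from the text above; the Particle base class, the ATTRIBUTE_ENTRY macro, geometric_setters and set_attribute are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

class Particle {
 public:
  virtual ~Particle() = default;  // polymorphic base, so dynamic_cast works
};

class GeometricParticle : public Particle {
 public:
  float x() const { return x_; }
  float y() const { return y_; }
  void set_x(float v) { x_ = v; }
  void set_y(float v) { y_ = v; }

 private:
  float x_ = 0.0f;
  float y_ = 0.0f;
};

using Setter = std::function<void(GeometricParticle&, float)>;

// Hypothetical macro pairing the name used in a file with a setter call.
#define ATTRIBUTE_ENTRY(name, setter) \
  { name, [](GeometricParticle& p, float v) { p.setter(v); } }

const std::map<std::string, Setter>& geometric_setters() {
  static const std::map<std::string, Setter> table = {
      ATTRIBUTE_ENTRY("X", set_x),
      ATTRIBUTE_ENTRY("Y", set_y),
  };
  return table;
}

// Set a named attribute read from a file. Returns false if the particle is
// not geometric (the cast fails) or the name is unknown.
bool set_attribute(Particle& p, const std::string& name, float value) {
  auto* g = dynamic_cast<GeometricParticle*>(&p);
  if (g == nullptr) return false;
  const auto it = geometric_setters().find(name);
  if (it == geometric_setters().end()) return false;
  it->second(*g, value);
  return true;
}
```

The cast and the explicit table are the two costs mentioned above: neither is needed when every attribute is just a string key.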
Personally I think the class based approach is better, but Brett
liked databases and went with the former. The one thing I think we
should not do is mix the two. Either everything is an object and you
get things through C++ calls or everything is as it is currently and
you manipulate things through helper functions. If we mix, it is hard
to keep track of what everything is and to make sure that things like
saving and restoring state happen properly, and the result is just ugly.
On Nov 2, 2007, at 4:54 PM, Ben Webb wrote:
Daniel Russel wrote:
- Is Residue just an example of a member of a hierarchy, or would
chains and proteins be treated differently?
A tree node is a tree node. It can happen to also have some
biological function, but that is orthogonal to being a hierarchy
node.
I think you misunderstood my question.
Quite likely :-)
The wiki page has a description of what attributes a Residue has,
but nothing about chains or proteins, so I was just trying to
ascertain whether you just put in Residue as an example (and just
haven't done chains/proteins yet) or whether you think they should
be treated specially. I think your answer means the former, yes?
Yes, the former. I just haven't had any reason to add more fields to
chains or proteins other than what they have from being in the
hierarchy and being a generic object (i.e. they have a name, a type
and children and parents).
Well, sure, but let's say I have a rigid body containing 500 atoms.
It has 7 attributes - the xyz of its center of mass, and an
orientation quaternion. These would both have to be updated if
particles were added to or removed from the rigid body. By making
these 'dumb' attributes, the only way to do that is to do the
update every time you want to use the rigid body, which seems
inefficient to me. In contrast, a ParticleContainer object could
have a method to add/remove particles, so that it could do the
update when necessary.
To not answer your question, for updates to locations caused by the
optimizer, a State object would handle things quite nicely.
I see your point that we need somewhere to put the code that triggers
the update when you add or remove a point. Personally I would prefer a
free floating function that you call passing a particle in the
hierarchy (like my hierarchy helper functions for getting the ith
child). Then you could easily provide your own function if you want
to do something slightly different or could apply the "compute center
of mass of all children" function to a body which didn't happen to be
rigid.
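As a sketch of such a free function, assuming a toy string-keyed particle with "x", "y", "z" and "mass" attributes and children stored as "child_N" (all hypothetical names, not the real IMP types), and covering only the center of mass, not the orientation:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a string-keyed particle; not the real IMP types.
struct Particle {
  std::map<std::string, int> ints;       // e.g. "num_children", "child_0"
  std::map<std::string, double> floats;  // e.g. "x", "y", "z", "mass"
};

using Model = std::vector<Particle>;

// Free function: set the parent's coordinates to the mass-weighted mean of
// its children's coordinates. It only looks at the hierarchy attributes, so
// it applies to any node with children, rigid body or not.
void compute_coordinates_from_center_of_mass_of_children(Model& m,
                                                         int parent) {
  const Particle& node = m.at(parent);
  const int n = node.ints.at("num_children");
  double total = 0.0, cx = 0.0, cy = 0.0, cz = 0.0;
  for (int i = 0; i < n; ++i) {
    const Particle& c = m.at(node.ints.at("child_" + std::to_string(i)));
    const double w = c.floats.at("mass");
    total += w;
    cx += w * c.floats.at("x");
    cy += w * c.floats.at("y");
    cz += w * c.floats.at("z");
  }
  assert(total > 0.0);  // needs at least one child with positive mass
  Particle& out = m.at(parent);
  out.floats["x"] = cx / total;
  out.floats["y"] = cy / total;
  out.floats["z"] = cz / total;
}
```

Someone who wants slightly different behavior (an unweighted centroid, say) can write their own function with the same signature without touching any class.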
- If I wanted to pull out every atom in residue 1, I'd really
have to scan through every single particle to figure out which
ones a) have a residue attribute and b) have it = 1 ? That seems
inefficient.
You would find the particle for residue 1 and get "child_0",
"child_1"...
I don't think you should ever have to scan through all particles
(and, personally, I don't think you should be able to as it would
encourage bad habits).
Ah, I see - it wasn't clear to me from the wiki page. Then my
concerns here are 1) you have the information in two locations, so
you will need to do consistency checks to make sure that the child/
parent pointers all point to the right thing;
Yes, that is true.
2) that seems grossly inefficient - imagine a container with 10000
atoms, doing the string concatenation and formatting to get child_0
through child_9999, then the hashtable lookup, as opposed to just
iterating through a std::vector<int>.
Well, you wouldn't actually do the string concatenation since that
can be trivially cached (in fact I currently do it in my helper
function). You would have to do the table lookup though. This is a
general problem with our architecture which may prove to be a problem
in the long run. Even if we special case the children in the
hierarchy, you still have the same problem when you want to do
anything other than look at the children/parents of a hierarchy node
(such as the coordinates).