Re: [IMP-dev] on documentation

On Mon, Aug 6, 2012 at 11:03 AM, Daniel Russel <drussel@gmail.com> wrote:

To add my two cents:
- the search in doxygen kind of sucks, and Google definitely isn't better on this. There is no good way that I can think of to prioritize search results, so I'm not sure where to go to make this aspect better. And, unfortunately, as one adds more to the API and documentation, you just get more hits, in more or less random order, that you have to look through. Anyone have any good ideas on this? We can try going back to the doxygen live search, as that may allow one to experiment more interactively with search terms (it had severe limitations before, but these may have been fixed).

- in general, you need to read the documentation of all the base classes of a class, and of the module, before you will understand the class. I think this cannot be reasonably avoided. Otherwise content would have to be duplicated in many places, which invariably results in it having more errors/being even less complete (or requiring a great deal more time for the same amount of content). Hopefully something like ConfigurationSetRMSDMetric would make more sense in light of understanding statistics::Metric. For example, it has no get_rmsd() method since it is a specialization of the Metric base class, which defines a get_distance() virtual method, so a get_rmsd() method would be useless in the places where the class is supposed to be used.

- What I would really like to see is that when someone spends the time to figure something out like this, they add an example/patch the comments in the files and then send the patch off to someone to integrate :-)

- I'd like to move to a more structured commit model for IMP, with some more review of things that go in, so that we can prod people (and me) more to improve docs/merge redundant things. I typed up some thoughts on modifying the commit model here <https://github.com/salilab/imp/wiki/A-proposed-commit-model-for-IMP>. Feel free to edit (or request permissions to edit, I'm a bit unclear on how those are regulated :-) The main idea would be that if things, in general, have two people look at them before going into most modules in IMP, they should be a bit more coherent and documented. And, if one is able to share things prior to committing them to the SVN repository, they can stay in purgatory a bit longer (and will hopefully be worked on a bit longer) before they are considered good enough and work on them ceases (as tends to happen). Not sure if this will work :-)
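
To make the base class point above a bit more concrete, here is a minimal sketch in plain Python. It is not IMP's actual API: the names Metric, RMSDMetric, get_number_of_items and get_closest_pair are made up for illustration, and only the get_distance() name comes from the discussion. The idea is that generic code is written against the base class, so it can only ever call get_distance(), and a get_rmsd() on the specialization would never be reached from there.

```python
import math
from abc import ABC, abstractmethod


class Metric(ABC):
    """Hypothetical stand-in for a metric base class (not IMP's real class)."""

    @abstractmethod
    def get_number_of_items(self):
        """Return how many items the metric knows about."""

    @abstractmethod
    def get_distance(self, i, j):
        """Return the distance between item i and item j."""


class RMSDMetric(Metric):
    """Hypothetical specialization: distance = RMSD between two configurations."""

    def __init__(self, configurations):
        # Each configuration is a list of (x, y, z) tuples, same length and order.
        self.configurations = configurations

    def get_number_of_items(self):
        return len(self.configurations)

    def get_distance(self, i, j):
        a, b = self.configurations[i], self.configurations[j]
        squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
        return math.sqrt(squared / len(a))


def get_closest_pair(metric):
    """Generic code: it only sees the Metric interface, never a get_rmsd()."""
    n = metric.get_number_of_items()
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda ij: metric.get_distance(*ij))


configs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
closest = get_closest_pair(RMSDMetric(configs))
```

Any other metric that implements get_distance() could be passed to get_closest_pair() unchanged, which is why the specialization exposes that name rather than a more specific one.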

On Mon, Aug 6, 2012 at 9:56 AM, Daniel Russel <drussel@gmail.com> wrote:

---------- Forwarded message ----------
From: Riccardo Pellarin <pellarin.riccardo@gmail.com>
Date: Mon, Aug 6, 2012 at 12:05 AM
Subject: Re: on documentation
To: Daniel Russel <drussel@gmail.com>

Hi Guys,

I would like to share my thoughts on IMP documentation, maybe repeating
what we've already said. I think it is important, though, to share our experience.

Let's suppose I want to fit two structures and calculate the Calpha-rmsd,
a very simple task.

I typed RMSD in the IMP manual search field and got 36 entries.

1st problem: the entry titles are uninformative, unless you know exactly
what each module is supposed to do (statistics, atom, multifit, etc.).

Knowing a little bit of IMP I could filter the entries and remove all classes belonging
to multifit and em modules, for instance. Let's take the first seven entries which
might do what I want to do:

Member IMP::statistics::ConfigurationSetRMSDMetric::ConfigurationSetRMSDMetric
class IMP::atom::RMSDCalculator
Member IMP::atom::RMSDCalculator::RMSDCalculator
Member IMP::atom::RMSDCalculator::RMSDCalculators
class IMP::statistics::ConfigurationSetRMSDMetric
Member IMP::atom::get_pairwise_rmsd_score
IMP::atom::get_rmsd

2nd problem: I see a lot of redundancy in the list, and a lot of confusion:
classes and members are mixed together... why is that? Wouldn't it be cleaner
to separate them into two different lists?

Now, let's clean up the list a little; my eyes go to these candidates:

class IMP::atom::RMSDCalculator
class IMP::statistics::ConfigurationSetRMSDMetric
Member IMP::atom::get_pairwise_rmsd_score
IMP::atom::get_rmsd

3rd problem: there is not one single function for a task as simple
as an RMSD calculation, but many, in different flavors...
Probably many people implemented the same thing many times
because they didn't understand what was implemented before?

Let's have a look at the functions and see if they do what I want...
(as a side note, reading the documentation of IMP functions,
I really would like to leave notes on many of them....)

Let's start with IMP::atom::RMSDCalculator

Detailed Description

Fast rmsd calculation. Used to calculate rmsd between multiple
transformation that operate on the same particles

Well, that is not detailed.

What is a "fast rmsd"? No structural fitting I guess? What is the "rmsd between multiple
transformations"? Maybe rigid body transformations? I start to doubt that this rmsd function is
calculated between particles at all...
Let's try to rewrite it. This is what I would like to read:

Short Description: Calculates the rmsd of a list of particles.

Detailed Description: Calculates the root mean square displacement (rmsd)
of particles subjected to rigid-body transformations. The rmsd calculation
does not perform structural best-fit alignment.

Usage:

1) construct the class using a list of particles:
RMSDCalculator(particles)

2) get the rmsd using the method get_rmsd(trans3D1, trans3D2),
where trans3D1 and trans3D2 are rigid body transformations of the
reference and displaced configurations, respectively.

Simple Example: ....

It would be cool if the short description appeared in the search
page, along with the class name.
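
Just to make concrete what I think the rewritten description means, here is a sketch in plain Python. It does not use IMP at all, the function names are made up, and it assumes my reading is right: the same particles are placed by two rigid transformations and the two placements are compared as they are, with no best-fit superposition.

```python
import math


def apply_transformation(rotation, translation, point):
    """Apply a rigid transformation x -> R*x + t to one (x, y, z) point."""
    return tuple(
        sum(rotation[row][k] * point[k] for k in range(3)) + translation[row]
        for row in range(3)
    )


def get_rmsd_between_transformations(points, trans1, trans2):
    """RMSD between the same points placed by two rigid transformations.

    No best-fit superposition is done: the two placements are compared as is.
    Each transformation is a (rotation, translation) pair.
    """
    total = 0.0
    for p in points:
        a = apply_transformation(*trans1, p)
        b = apply_transformation(*trans2, p)
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.sqrt(total / len(points))


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
# 90 degree rotation about the z axis.
rot_z_90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
reference = (identity, (0.0, 0.0, 0.0))
shifted = (identity, (3.0, 4.0, 0.0))
rotated = (rot_z_90, (0.0, 0.0, 0.0))

rmsd_shift = get_rmsd_between_transformations(points, reference, shifted)
rmsd_rot = get_rmsd_between_transformations(points, reference, rotated)
```

Both values are non-zero here, whereas an RMSD computed after best-fit alignment would be zero in both cases, since the particles only move rigidly. That difference is exactly what the description should spell out.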

Let's go to the second function: IMP::statistics::ConfigurationSetRMSDMetric

Detailed Description

Compute the RMSD between specified sets of particles in pairs of configurations, within a configuration set

This is even more cryptic. Maybe:

Calculates the RMSD of a list of particles between all possible pairs of configurations in a "configuration set", which is....

Strangely, this class has no get_rmsd() method, only a get_distance() method....
Is that the same?

Let's go to another example: IMP::atom::get_pairwise_rmsd_score

The measure quantifies the RMSD between the relative placements of two components compared to a reference relative placement. First, the two compared structures are brought into the same frame of reference by superposing the first pair of equivalent domains (ref1 and mdl1). Next, the RMSD is calculated for the second component

What are the components? Maybe subunits? What are the domains? Why is the function called rmsd_score? Is that different from the rmsd?

OK, I could go on like this for almost every function and method in IMP.

In the end, I'm completely unsure which function I should use
for my task.... they all look the same.

Here's my proposal: the documentation of every function must have these entries:

Short Description: (appears in the search page)
Detailed Description:
[Algorithm Description: in some cases]
Usage:
Simple Example:

The developer might leave these fields empty, of course.
When I search for something, the first entries should be the
ones that are most relevant and best documented.
Or maybe the search page should separate Documented and Undocumented results
(where Undocumented means a function that lacks a long documentation page).
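
As a rough sketch of how such a search page could order its results (plain Python, nothing to do with how doxygen actually works; the SearchHit class, its field names and the example entries are all made up):

```python
from dataclasses import dataclass

# The optional "Algorithm Description" field is deliberately not counted.
DOC_FIELDS = ("short", "detailed", "usage", "example")


@dataclass
class SearchHit:
    """Hypothetical search result carrying the proposed documentation fields."""

    name: str
    short: str = ""
    detailed: str = ""
    usage: str = ""
    example: str = ""

    def get_documentation_score(self):
        # Number of proposed fields that the developer actually filled in.
        return sum(1 for field in DOC_FIELDS if getattr(self, field).strip())

    def is_documented(self):
        # "Undocumented" means that the long description is missing.
        return bool(self.detailed.strip())


def split_hits(hits):
    """Return (documented, undocumented), best documented entries first."""
    ranked = sorted(hits, key=lambda h: h.get_documentation_score(), reverse=True)
    documented = [h for h in ranked if h.is_documented()]
    undocumented = [h for h in ranked if not h.is_documented()]
    return documented, undocumented


# Placeholder entries, not real IMP functions.
hits = [
    SearchHit("example::func_a", short="Computes something."),
    SearchHit(
        "example::func_b",
        short="Computes something else.",
        detailed="A longer explanation.",
        usage="Call it with a list of particles.",
        example="func_b(particles)",
    ),
    SearchHit("example::func_c", short="A third thing.", detailed="Some details."),
]
documented, undocumented = split_hits(hits)
```

With these placeholder entries, func_b comes first, func_c second, and func_a lands in the Undocumented list. Relevance to the search term is left out of the sketch entirely.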

Of course we cannot force people to write comprehensive documentation,
but at least we can give the user the option of choosing the functions that
are better documented. That will be bad for developers who write undocumented
code, since their code will never be used by anybody else.
As a user, I will be skeptical about using something where the documentation
fields are empty!

Sorry, that was long. I hope to hear your feedback.