Re: [IMP-users] Trying to use MultiFit but I have problems with installing/running IMP

Note that MultiFit is currently available in binary form only, and only
supported in combination with the binary downloads of IMP. So it may not
work on non-RedHat/non-Mac systems.

I just installed and ran MultiFit on Mac OS X.

I then went through the MultiFit tutorial at http://www.integrativemodeling.org/1.0/tutorial/multifit.html, and here I am with (the usual) heap of comments and questions. I hope they can help improve the tool and its documentation, and help me understand things in the process.

A.] Concerning input and output files

a.) As a general comment, I think a summary table listing all input and output files, each with a short description, would make the tutorial easier to follow.

b.) The assembly.jt file is not documented.

I understand it is an input file that contains the junction tree as described in the paper "Lasker K, Topf M, Sali A, Wolfson HJ. Inferential optimization for simultaneous fitting of multiple components into a CryoEM map of their assembly. J. Mol. Biol. 2009;388(1):180-194."

I guess this file has to be written by hand by the user, based on the region definitions in the 1tyq_20.fine.gmm.pdb file produced at step 2.

The indices in the node descriptions refer to the ordering of the regions in the above-mentioned file, and the indices in the edge list refer to the position of a line in the node list.
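To make my reading of these index conventions concrete, here is a small Python sketch. It does not parse the real assembly.jt syntax (which I do not know): the `nodes` and `edges` lists, the `check_junction_tree` helper and the example numbers are all hypothetical and only illustrate which index refers to what.

```python
def check_junction_tree(num_regions, nodes, edges):
    """Check the index conventions as I understand them.

    nodes: list of lists of region indices (regions are numbered in
           the order they appear in the coarse-model file).
    edges: list of (i, j) pairs, each a position in the nodes list.
    Returns True if the edges form a tree over the nodes.
    """
    for n, regions in enumerate(nodes):
        for r in regions:
            if not 0 <= r < num_regions:
                raise ValueError(f"node {n}: region index {r} out of range")
    for i, j in edges:
        for k in (i, j):
            if not 0 <= k < len(nodes):
                raise ValueError(f"edge ({i}, {j}): node index {k} out of range")
    # A tree on N nodes has exactly N - 1 edges and is connected.
    if len(edges) != len(nodes) - 1:
        return False
    seen, stack = {0}, [0]
    while stack:
        cur = stack.pop()
        for i, j in edges:
            if cur in (i, j):
                nxt = j if cur == i else i
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return len(seen) == len(nodes)


# Made-up example: 5 regions grouped into 3 nodes, chained by 2 edges.
example_nodes = [[0, 1, 2], [1, 2, 3], [3, 4]]
example_edges = [(0, 1), (1, 2)]
is_tree = check_junction_tree(5, example_nodes, example_edges)
```

If the real file follows a different convention (for instance 1-based indices), the range checks above would be the first thing to adapt.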

c.) Concerning the final output (from run_multifit)

ARP3,0|ARP2,14|ARC1,3|ARC2,24|ARC3,19|ARC4,11|ARC5,13|(17.5593729019)(rmsd:29.2637996674)(conf.0.pdb)

If I am correct, subunits are given in a pipe-separated list, sorted by the index of the region they have been assigned to in that particular configuration (here, ARP3 is in region 0, ARP2 in region 1, etc.).

The integer appearing after each subunit name refers to the index of the solution for this subunit as it was fitted alone in the (finer simplified representation of the) EM map, in step 3.
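For what it is worth, this is how I would read such a line programmatically. It is only a sketch based on the single example line above: the `parse_result_line` name is mine, and I am assuming that the first parenthesised number is the score of the configuration, which the output itself does not say.

```python
import re

LINE = ("ARP3,0|ARP2,14|ARC1,3|ARC2,24|ARC3,19|ARC4,11|ARC5,13|"
        "(17.5593729019)(rmsd:29.2637996674)(conf.0.pdb)")


def parse_result_line(line):
    """Split one result line into (fits, value, rmsd, filename).

    fits is a list of (subunit name, solution index) pairs, in the
    order they appear on the line. value is the first parenthesised
    number (assumed, not confirmed, to be the score).
    """
    # Everything before the last '|' is the subunit list.
    head, _, tail = line.rpartition("|")
    fits = []
    for item in head.split("|"):
        name, index = item.split(",")
        fits.append((name, int(index)))
    match = re.fullmatch(
        r"\(([^()]+)\)\(rmsd:([^()]+)\)\(([^()]+)\)", tail.strip())
    if match is None:
        raise ValueError("unexpected trailer: " + tail)
    value, rmsd, filename = match.groups()
    return fits, float(value), float(rmsd), filename


fits, value, rmsd, filename = parse_result_line(LINE)
```

With the line above, `fits` starts with `("ARP3", 0)` and `("ARP2", 14)`, and `filename` is `conf.0.pdb`.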

B.] Small "issues" in the process

a) I don't know whether it has something to do with my own installation, but the first scoring script complained that it could not write its files, and I had to create the "scoring" subdirectory manually with mkdir.

b) All conf-XXX.pdb files are dumped in the root directory, which is quite messy. Maybe it would be better to write them to the results subdirectory.
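In the meantime, a small helper along these lines works around both points. It is a sketch and not part of MultiFit: the `prepare_and_tidy` name is mine, the directory names "scoring" and "results" are the ones I mentioned above, and the `conf*.pdb` pattern is my guess at how the configuration files are named.

```python
import glob
import os
import shutil


def prepare_and_tidy(workdir="."):
    """Create scoring/ and move conf*.pdb files into results/.

    Returns the base names of the files that were moved.
    """
    # Point a): make sure the scoring directory exists before scoring.
    os.makedirs(os.path.join(workdir, "scoring"), exist_ok=True)

    # Point b): collect the configuration files in one place.
    results = os.path.join(workdir, "results")
    os.makedirs(results, exist_ok=True)
    moved = []
    for path in sorted(glob.glob(os.path.join(workdir, "conf*.pdb"))):
        name = os.path.basename(path)
        shutil.move(path, os.path.join(results, name))
        moved.append(name)
    return moved
```

Calling `prepare_and_tidy()` once before the scoring step and once after run_multifit should be enough in the tutorial directory.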

C.] Questions

a) Concerning the definition of regions in the map:

If I understand correctly, regions are defined at step 2 and written out as fake Ca atoms in the coarse model, i.e. the 1tyq_20.fine.gmm.pdb file. It is the user's responsibility to infer the connectivity of the regions from this coarse representation and the input map, and then to create the junction tree. Am I correct?

b) Interactomics data: I have the feeling that no interaction data are considered in the process (I mean information such as "ARC2 is known to interact with ARP2"). Is that true?

c) I don't understand exactly what run_multifit() does. More precisely:

1. It appears that a preliminary filter is applied to configurations. By configuration I mean the assignment of each subunit to one and only one region (I think it corresponds to a mapping, in the code). If this is the case, I don't think the filtering is based on interactomics data, so what is it based upon?

2. I have the feeling that one and only one solution (the one with the best score) is output per retained configuration. Am I right?
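To be explicit about what I call a configuration, here is a small Python sketch. It assumes there are exactly as many regions as subunits and that each region receives exactly one subunit; that is my reading, not something I found stated in the tutorial, and the function names are mine.

```python
from itertools import permutations

SUBUNITS = ["ARP3", "ARP2", "ARC1", "ARC2", "ARC3", "ARC4", "ARC5"]


def configurations(subunits):
    """Yield every one-to-one assignment of subunits to regions 0..N-1."""
    for order in permutations(subunits):
        # order[i] is the subunit placed in region i.
        yield dict(enumerate(order))


def count_configurations(subunits):
    """Count the mappings by enumerating them."""
    return sum(1 for _ in configurations(subunits))
```

Under these assumptions the seven subunits of the tutorial give 7! = 5040 possible mappings, before even choosing among the fitting solutions of each subunit, which is why I am curious about how the filter decides which ones to keep.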

d) Am I correct in saying that the cross-correlation computations are only used in scoring (step 3), and not in the pre-fitting of subunits (step 2)? If so, has the FFT-based cross-correlation approach been replaced by the neural/GMM approach for that particular step?

Thanks for any answers you can provide, as well as for the incredible job you are doing.

--Ben