IMP Reference Guide
develop.031dafb4d2,2024/05/21
The Integrative Modeling Platform
Python module to perform Nested Sampling-based optimization of representation for integrative structural modeling
Shreyas Arvindekar, Aditi S Pathak, Kartik Majila, Shruthi Viswanath, Optimizing representations for integrative structural modeling using Bayesian model selection, Bioinformatics, Volume 40, Issue 3, March 2024, btae106, DOI: 10.1093/bioinformatics/btae106. Data is deposited in Zenodo.
IMP (compiled from the source code). See IMP installation. See requirements.txt for Python dependencies.
imp/modules/nestor/ with it (see also examples/)
python pyext/src/xl_datasplitter.py {path}
where path refers to the path of the target crosslinking file.
examples/modeling.py. One will also need to make separate topology files for different candidate representations.
* Make sure that the restraints that are to be used to inform the likelihood have weight=0, and that these are added to a separate list that is passed to the replica exchange macro as the nestor_restraints argument.
* Ensure the modeling script looks similar to the one in examples/. Specifically, ensure that the modeling instructions are enclosed in a function, and that this function is called such that the stdout of the modeling run is not printed to the terminal. One can use contextlib as shown in the example.
nestor_params.yaml file.
Run the NestOR wrapper as follows:
python pyext/src/wrapper_v6.py -p {nestor_param_path}
where nestor_param_path refers to the absolute path to the nestor_params.yaml file. If using a topology file to represent the system, use the -t flag. This flag can be omitted if the representation is defined in the modeling script. If only the plotting functionalities of NestOR are to be used, run the above command with the -s flag.
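The requirement above, that the modeling function should not print its stdout to the terminal, can be met with contextlib.redirect_stdout from the Python standard library. The following is a minimal sketch, not the script shipped in examples/: run_modeling and run_quietly are hypothetical names, and run_modeling stands in for the actual modeling instructions.

```python
import contextlib
import io


def run_modeling():
    # Hypothetical stand-in for the modeling instructions
    # (system setup, restraints, replica exchange macro).
    print("verbose sampling log that should not reach the terminal")
    return "done"


def run_quietly(func):
    # Send everything func writes to sys.stdout into an in-memory buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func()
    return result, buffer.getvalue()


result, captured = run_quietly(run_modeling)
```

Note that redirect_stdout only affects output written through Python's sys.stdout; output written directly by compiled code is not captured this way.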
Note: One NestOR run corresponds to the set of all nested sampling runs for all candidate representations.
One can also compare results from NestOR runs with different parameter settings by running
python pyext/src/compare_runs_v2_w_pyplot.py {comparison_title} run_set1 run_set2 ...
where comparison_title is the title for the runs to be compared, and run_set1 and run_set2 are the NestOR runs to be compared.
Step 1 in the Run command above, i.e. one NestOR run, generates these plots:
*_params_evidence_errorbarplot.png shows the mean values of evidence for all the candidate representations, with error bars showing the standard error on the mean.
*_params_persteptime.png shows the time required to sample one MCMC step per run. This is computed as (time taken for iteration 0)/((number of initial frames)*(number of MCMC steps per frame)).
*sterr_evi_and_proctime.png compares evidences and their sampling efficiency across representations.
This file is generated upon completion of step 1 in the Run command above.
Evidence related:
float
The estimated evidence value, represented as the natural logarithm of the estimated evidence
float
Information obtained from the nested sampling run
analytical_uncertainty: float
The analytical uncertainty associated with evidence estimation for a run by nested sampling
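The per-run evidence values can be summarized across runs in the way the evidence errorbar plot is described: a mean with the standard error on the mean. The sketch below shows that arithmetic only. The function name and the input numbers are illustrative assumptions, not NestOR results, and this section does not state whether NestOR averages the evidence or its logarithm, so the function simply averages whatever values it is given.

```python
import math
import statistics


def mean_and_sem(values):
    # Standard error on the mean = sample standard deviation / sqrt(n).
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Illustrative values for three hypothetical runs of one representation.
log_evidences = [-10.0, -12.0, -11.0]
mean, sem = mean_and_sem(log_evidences)
```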
Efficiency related:
float
Time taken per MCMC step. This is computed as (time taken for iteration 0)/((number of initial frames)*(number of MCMC steps per frame))
nestor_process_time: float
Wall clock time taken by a nested sampling run to finish, in seconds
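The per-step time formula above can be written out directly. This is a small sketch of the stated arithmetic; the function and parameter names are hypothetical and are not taken from the NestOR source.

```python
def mcmc_step_time(iteration0_seconds, n_initial_frames, n_steps_per_frame):
    # (time taken for iteration 0) /
    # ((number of initial frames) * (number of MCMC steps per frame))
    total_steps = n_initial_frames * n_steps_per_frame
    if total_steps <= 0:
        raise ValueError("frame and step counts must be positive")
    return iteration0_seconds / total_steps


# Illustrative numbers: 120 s for iteration 0, 50 initial frames, 20 steps per frame.
step_time = mcmc_step_time(120.0, 50, 20)
```

With these illustrative inputs, iteration 0 covers 50 * 20 = 1000 MCMC steps, so the per-step time is 120 / 1000 seconds.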
Termination related:
int
(0, 11, 12, 13) Exit code for a nested sampling run
str
Cause for run termination
int
Number of times Replica Exchange failed to obtain a sample from the constrained prior in the current iteration of nested sampling
int
Iteration count (number of iterations) when nested sampling terminated
plateau_hits: int
Number of consecutive times the nested sampling protocol detected a plateau in the estimated evidence
Exit codes:
Exit code 13: Run terminated due to a math domain error in the analytical uncertainty calculation. This probably happened because the run terminated too early, resulting in a negative value for H.
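To illustrate why a negative H leads to a math domain error, the sketch below assumes the commonly used nested sampling estimate of the uncertainty, sqrt(H / n_live). This form is an assumption, since NestOR's exact expression is not shown in this section, and the function names and the n_live parameter are hypothetical.

```python
import math


def analytical_uncertainty(information_h, n_live):
    # Assumed form: sqrt(H / n_live). math.sqrt raises ValueError
    # when its argument is negative, i.e. when H < 0.
    return math.sqrt(information_h / n_live)


def safe_analytical_uncertainty(information_h, n_live):
    # Return None instead of raising when the square root is undefined.
    try:
        return analytical_uncertainty(information_h, n_live)
    except ValueError:
        return None


ok = safe_analytical_uncertainty(4.0, 100)
failed = safe_analytical_uncertainty(-0.5, 100)
```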
If a run terminates with exit code = 12, the run is considered incomplete (and is not rerun) and its results are not considered valid, i.e. they are not plotted and are not used to infer the optimal representation. Results from runs with exit codes 0 and 13 are used to infer the optimal representation.
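The rule above can be applied when post-processing results by keeping only the runs whose exit code is 0 or 13. The following is a minimal sketch on in-memory records. The "run" and "exit_code" keys and the record layout are assumptions for illustration, not the layout of the NestOR output file. Exit code 11 is not described in this section, so the sketch leaves those runs out as well.

```python
# Exit codes whose results are used to infer the optimal representation.
VALID_EXIT_CODES = {0, 13}


def select_valid_runs(runs):
    # Keep only the records whose exit code marks the results as usable.
    return [run for run in runs if run["exit_code"] in VALID_EXIT_CODES]


# Hypothetical per-run records.
runs = [
    {"run": "run_0", "exit_code": 0},
    {"run": "run_1", "exit_code": 12},
    {"run": "run_2", "exit_code": 13},
    {"run": "run_3", "exit_code": 11},
]
valid = select_valid_runs(runs)
```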
Author(s): Shreyas Arvindekar, Shruthi Viswanath
Date: April 7th, 2023
License: CC BY-SA 4.0. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Last known good IMP version: not tested
Testable: Yes
Parallelizable: Yes
Publications: Arvindekar, S., Viswanath, S. Optimizing representations for integrative structural modeling using Bayesian model selection. DOI: 10.1093/bioinformatics/btae106.