PMI RNAPII Modeling Tutorial
Stage 3 - Sampling

With the system representation built and the data restraints defined, configurations of the system can now be sampled. First, the sampling parameters are set:

#--------------------------
# Set MC Sampling Parameters
#--------------------------
num_frames = 20000
num_mc_steps = 10

num_frames defines the number of frames (model structures) that will be output during sampling, and num_mc_steps defines the number of Monte Carlo (MC) steps between output frames. This setup therefore encompasses 200000 MC steps in total.
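The total quoted above is simply the product of the two parameters. A minimal sketch of that arithmetic, where total_mc_steps is a name introduced here for illustration and is not part of the tutorial script:

```python
# Values taken from the parameter block above.
num_frames = 20000
num_mc_steps = 10

# total_mc_steps is a hypothetical helper name, used only for this illustration.
total_mc_steps = num_frames * num_mc_steps
print(total_mc_steps)
```

Changing either parameter scales the total linearly, so halving num_mc_steps while doubling num_frames keeps the number of MC steps the same but writes twice as many frames.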

Next, a replica exchange run can be set up using the ReplicaExchange macro:

# m is assumed to be the IMP.Model object created in the earlier stages
mc1 = IMP.pmi.macros.ReplicaExchange(m,
                                     root_hier=root_hier,
                                     monte_carlo_sample_objects=dof.get_movers(),
                                     output_objects=outputobjects,
                                     monte_carlo_temperature=1.0,
                                     simulated_annealing=True,
                                     simulated_annealing_minimum_temperature=1.0,
                                     simulated_annealing_maximum_temperature=2.5,
                                     simulated_annealing_minimum_temperature_nframes=200,
                                     simulated_annealing_maximum_temperature_nframes=20,
                                     replica_exchange_minimum_temperature=1.0,
                                     replica_exchange_maximum_temperature=2.5,
                                     number_of_best_scoring_models=100,
                                     monte_carlo_steps=num_mc_steps,
                                     number_of_frames=num_frames,
                                     global_output_directory="output")

See the ReplicaExchange documentation for a full description of all of the input parameters.
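As their names suggest, replica_exchange_minimum_temperature and replica_exchange_maximum_temperature bound the temperatures at which the replicas run. To give a feel for what an exchange between two replicas involves, the sketch below evaluates the standard textbook Metropolis criterion for swapping configurations between two temperatures. It is a generic illustration and is not taken from the IMP source; the function name swap_probability and the example scores are hypothetical, and temperatures are in the same reduced units as the scores.

```python
import math


def swap_probability(score_i, temp_i, score_j, temp_j):
    """Textbook acceptance probability for exchanging two replicas.

    Replica i holds a configuration with score_i at temperature temp_i,
    replica j holds one with score_j at temp_j (Boltzmann constant = 1).
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (score_i - score_j)
    if delta >= 0.0:
        # The swap is always accepted; this also avoids overflow in exp().
        return 1.0
    return math.exp(delta)


if __name__ == "__main__":
    # Hypothetical scores at the minimum (1.0) and maximum (2.5) temperatures.
    print(swap_probability(10.0, 1.0, 5.0, 2.5))
    print(swap_probability(5.0, 1.0, 10.0, 2.5))
```

In this formulation a swap that moves the lower-scoring configuration to the colder replica is always accepted, while the reverse swap is accepted only with a probability that decays exponentially with the score difference.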

The sampling is performed by executing the macro built above:

mc1.execute_macro()

Sampling Output

The script generates an output directory containing the following:

  • pdbs: a directory containing the 100 best-scoring models (see the number_of_best_scoring_models variable above) from the run, in PDB format.
  • rmfs: a single RMF file containing all the frames. RMF is a file format specially designed to store coarse-grained, multi-resolution and multi-state models such as those generated by IMP. It is a compact binary format and (as in this case) can also be used to store multiple models or trajectories.
  • stat.*.out: statistics from the sampling, contained in a "statfile". This file records information on each restraint, MC acceptance criteria and other quantities at each step.
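After a run, the layout described above can be checked quickly from Python. The following is a minimal sketch, assuming the output directory is named "output" as in the macro call; summarize_output is a hypothetical helper written for this illustration and is not part of IMP or PMI.

```python
from pathlib import Path


def summarize_output(output_dir="output"):
    """Return the file names found in each part of the sampling output."""
    root = Path(output_dir)

    def names(directory):
        # A missing directory is reported as an empty list, not an error.
        if not directory.is_dir():
            return []
        return sorted(p.name for p in directory.iterdir())

    statfiles = []
    if root.is_dir():
        # Matches the stat.*.out pattern named in the list above.
        statfiles = sorted(p.name for p in root.glob("stat.*.out"))

    return {
        "pdbs": names(root / "pdbs"),
        "rmfs": names(root / "rmfs"),
        "statfiles": statfiles,
    }


if __name__ == "__main__":
    for part, found in summarize_output("output").items():
        print(part, len(found), "entries")
```

The helper only lists names; it makes no assumption about file extensions or about how many files each directory holds.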

Gathering Data from statfile

Data from the stat file can be parsed and analyzed using two utilities:

  • process_output.py - parses the statfile and returns columns of interest
  • plot_stat.sh - plots one or two columns of data (requires gnuplot)

process_output.py usage:

IMP_HOME/modules/pmi/pyext/process_output.py [-h] [-f FILENAME] [-s FIELDS [FIELDS ...]]
                                             [-t SINGLE_COLUMN_FIELD] [-p] [--head]
                                             [-n PRINT_RAW_NUMBER] [--soft]
                                             [--search_field SEARCH_FIELD]
                                             [--search_value SEARCH_VALUE] [--nframe]

plot_stat.sh usage:

IMP_HOME/modules/pmi/pyext/plot_stat.sh -i STATFILE -y YCOLUMN [-x XCOLUMN] [-m POINTS] [-plot] [-o OUTPUTFILE] [-b BEGIN]
#
# -i | input stat file name
# -y | column number with Y data values OR column header string
# -x | column number with X data values OR column header string
# -m | method of plotting. POINTS, LINES or LINESPOINTS
# -s | suppress showing plot
# -o | saves plot to png file with column header names
# -b | begin at this frame number
# -g | saves gnuplot file
# -h | prints this help text to screen
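The two usage summaries above can be turned into concrete command lines. The sketch below only assembles the argument lists and prints them, using flags that appear in the usage text (-f and -s for process_output.py; -i, -y, -x and -m for plot_stat.sh). The statfile name output/stat.0.out and the field name Total_Score are hypothetical placeholders, and the helper functions are not part of IMP.

```python
import shlex

# Script locations as written in the usage text; IMP_HOME stands for the
# IMP installation directory and must be substituted before running.
PROCESS_OUTPUT = "IMP_HOME/modules/pmi/pyext/process_output.py"
PLOT_STAT = "IMP_HOME/modules/pmi/pyext/plot_stat.sh"

PLOT_METHODS = ("POINTS", "LINES", "LINESPOINTS")


def process_output_cmd(statfile, fields):
    """Argument list that extracts the given fields from a statfile."""
    return [PROCESS_OUTPUT, "-f", statfile, "-s", *fields]


def plot_stat_cmd(statfile, ycolumn, xcolumn=None, method="LINES"):
    """Argument list that plots one column, or one column against another."""
    if method not in PLOT_METHODS:
        raise ValueError(f"method must be one of {PLOT_METHODS}")
    cmd = [PLOT_STAT, "-i", statfile, "-y", str(ycolumn)]
    if xcolumn is not None:
        cmd += ["-x", str(xcolumn)]
    cmd += ["-m", method]
    return cmd


if __name__ == "__main__":
    # Placeholder statfile and field names, for illustration only.
    print(shlex.join(process_output_cmd("output/stat.0.out", ["Total_Score"])))
    print(shlex.join(plot_stat_cmd("output/stat.0.out", "Total_Score")))
```

Each list can be passed to subprocess.run once IMP_HOME has been replaced with a real path and the field names have been checked against the statfile.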

Analysis of the sampled models is described in Stage 4 - Analysis Part 1.
