IMP Tutorial
This tutorial demonstrates EMSequenceFinder, a method for assigning amino acid residue sequence to backbone fragments traced in an input cryo-electron microscopy (cryo-EM) map.
EMSequenceFinder relies on a Bayesian scoring function that ranks the 20 standard amino acid residue types at a given backbone position, based on the fit to the density map, the map resolution, and secondary structure propensity. The fit to the density is quantified by a convolutional neural network trained on 5.56 million amino acid residue densities extracted from cryo-EM maps at 3–10 Å resolution and the corresponding atomic structure models deposited in the Electron Microscopy Data Bank (EMDB). For more information, see Mondal et al., 2025.
This tutorial can be followed in several ways, including through the Jupyter notebook doc/emseqfinder.ipynb.
EMSequenceFinder is implemented as part of the Integrative Modeling Platform (IMP). It is usually used by running the emseqfinder command-line tool.
First, download the files for this tutorial by using the "Clone or download" link at the tutorial's GitHub page. Then, install all dependencies, namely:
mrcfile, scipy, scikit-learn, statsmodels, pandas, and tensorflow Python packages
One way to get these dependencies is via conda-forge. In order for TensorFlow prediction to work correctly on GPUs with libdevice, you may have to run:
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX
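When working from a notebook rather than a shell, the same XLA_FLAGS setting can be applied from Python. The following is a minimal sketch, not part of the tutorial files: the helper name xla_flags_for is hypothetical, and it assumes the variable has to be set before TensorFlow is first imported in the session.

```python
import os


def xla_flags_for(cuda_data_dir):
    """Build the XLA_FLAGS value that points XLA at a CUDA data directory.

    Hypothetical helper for illustration; it only formats the flag string
    used in the shell command from the tutorial text.
    """
    return f"--xla_gpu_cuda_data_dir={cuda_data_dir}"


# CONDA_PREFIX is set by conda when an environment is active.
conda_prefix = os.environ.get("CONDA_PREFIX")

# Only set the variable if a conda environment is active and the user
# has not already chosen a value. Assumption: this runs before any
# `import tensorflow` in the same Python session.
if conda_prefix and "XLA_FLAGS" not in os.environ:
    os.environ["XLA_FLAGS"] = xla_flags_for(conda_prefix)
```

If XLA_FLAGS is already defined, the sketch leaves it untouched so that an existing configuration is not overwritten.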
Input files for the protocol should be placed in the subdirectories pdb_files, cryoem_maps, and fasta_files, which hold input files in .pdb, .map, and .fasta format, respectively. Give all three files for a given run the same stem. For this tutorial we have provided pdb_files/EMD-8637.pdb, cryoem_maps/EMD-8637.map, and fasta_files/EMD-8637.fasta.
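Because the three inputs for a run are matched by file stem, it can help to check the layout before starting a long run. The sketch below is an illustration only and is not part of EMSequenceFinder: the function name find_complete_stems and the LAYOUT table are hypothetical, and the directory names and extensions are taken from the description above.

```python
from pathlib import Path

# Subdirectory -> expected file extension, as described in the tutorial.
LAYOUT = {
    "pdb_files": ".pdb",
    "cryoem_maps": ".map",
    "fasta_files": ".fasta",
}


def find_complete_stems(root="."):
    """Return (complete, incomplete) lists of file stems under `root`.

    A stem is complete when a file with that stem exists in all three
    input subdirectories; otherwise it is reported as incomplete.
    """
    root = Path(root)
    stems_per_dir = []
    for subdir, ext in LAYOUT.items():
        directory = root / subdir
        if directory.is_dir():
            stems = {p.stem for p in directory.glob(f"*{ext}")}
        else:
            stems = set()  # a missing directory contributes no stems
        stems_per_dir.append(stems)
    complete = set.intersection(*stems_per_dir)
    incomplete = set.union(*stems_per_dir) - complete
    return sorted(complete), sorted(incomplete)


if __name__ == "__main__":
    complete, incomplete = find_complete_stems(".")
    print("Ready to run:", complete)
    print("Missing at least one input:", incomplete)
```

Run from the tutorial directory, this should list EMD-8637 as ready if the three provided files are in place.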
Run the protocol on all of the files using emseqfinder batch. This will take a few minutes to run.
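For readers following along from Python rather than a terminal, the batch step can be launched with the standard subprocess module. This is a hedged sketch: the wrapper name run_batch is hypothetical, only the emseqfinder batch invocation named above is used, and no additional command-line options are assumed.

```python
import shutil
import subprocess


def run_batch(executable="emseqfinder"):
    """Run `<executable> batch` in the current directory.

    Returns the process exit code, or None when the executable cannot
    be found on PATH (for example, if IMP is not installed or the
    environment is not activated).
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # The tool is expected to read its inputs from the subdirectories
    # of the current working directory, as described in the tutorial.
    completed = subprocess.run([path, "batch"], check=False)
    return completed.returncode
```

Calling run_batch() from the tutorial directory is then equivalent to typing emseqfinder batch in a shell; a return value of None signals that the command was not found rather than that the run failed.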
The following output files will be generated:
*_ML_side_ML_prob.dat contains fragment-wise sequence scores.
batch_matching_results.txt contains overall sequence matching accuracy per structure.
More tutorials on using IMP are available at the IMP web site.