IMP Tutorial
Demonstration of the emseqfinder tool

Introduction

This tutorial demonstrates the EMSequenceFinder method for assigning amino acid residue sequence to backbone fragments traced in an input cryo-electron microscopy (cryo-EM) map.

EMSequenceFinder relies on a Bayesian scoring function that ranks the 20 standard amino acid residue types at a given backbone position, based on the fit to the density map, the map resolution, and secondary structure propensity. The fit to the density is quantified by a convolutional neural network that was trained on 5.56 million amino acid residue densities extracted from cryo-EM maps at 3–10 Å resolution and corresponding atomic structure models deposited in the Electron Microscopy Data Bank (EMDB). For more information, see Mondal et al., 2025.
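To make the idea of Bayesian ranking concrete, the sketch below applies plain Bayes' rule to a single backbone position. It is only an illustration, not the EMSequenceFinder scoring function: the function name rank_residue_types, the two input dictionaries, and all numbers are hypothetical, and the way the published method combines density fit, map resolution, and secondary structure propensity is described in the paper, not here.

```python
# The 20 standard amino acid residue types (one-letter codes).
RESIDUE_TYPES = list("ACDEFGHIKLMNPQRSTVWY")


def rank_residue_types(likelihood, prior):
    """Rank residue types at one backbone position by posterior probability.

    likelihood: hypothetical p(density | residue type), e.g. from a classifier
    prior:      hypothetical weight for each residue type, e.g. reflecting
                secondary structure propensity (need not be normalized)
    """
    # Bayes' rule: posterior is proportional to likelihood times prior.
    unnormalized = {aa: likelihood[aa] * prior[aa] for aa in RESIDUE_TYPES}
    total = sum(unnormalized.values())
    posterior = {aa: value / total for aa, value in unnormalized.items()}
    # Highest posterior first.
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers: the density fit alone favors Leu over Ala ...
likelihood = {aa: 0.02 for aa in RESIDUE_TYPES}
likelihood["L"] = 0.40
likelihood["A"] = 0.24

# ... but a prior that favors Ala is enough to change the ranking.
prior = {aa: 0.05 for aa in RESIDUE_TYPES}
prior["A"] = 0.10

ranking = rank_residue_types(likelihood, prior)
for residue, probability in ranking[:3]:
    print(f"{residue}: {probability:.3f}")
```

The point of the example is that the top-ranked residue type depends on both terms: a residue with a weaker density fit can still rank first when the prior supports it.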

This tutorial can be followed in several ways:

Basic usage of emseqfinder

EMSequenceFinder is implemented as part of the Integrative Modeling Platform (IMP). It is usually used by running the emseqfinder command-line tool.

First, download the files for this tutorial by using the "Clone or download" link at the tutorial's GitHub page. Then, install all dependencies, namely:

  • IMP
  • The STRIDE command line tool for secondary structure assignment
  • The mrcfile, scipy, scikit-learn, statsmodels, pandas, and tensorflow Python packages

One way to get these dependencies is via conda-forge. In order for TensorFlow prediction to work correctly on GPUs with libdevice, you may have to run:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX
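Before running the protocol, it can help to confirm that the dependencies are visible from the active environment. The following is a minimal sketch using only the Python standard library. The import names (for example sklearn for scikit-learn and IMP for the IMP Python module) and the executable names stride and emseqfinder are assumptions; adjust them if your installation differs.

```python
import importlib.util
import shutil

# Assumed import names for the required Python packages.
PYTHON_MODULES = ["IMP", "mrcfile", "scipy", "sklearn",
                  "statsmodels", "pandas", "tensorflow"]
# Assumed names of the command-line tools on PATH.
EXECUTABLES = ["stride", "emseqfinder"]


def find_missing(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return (missing_modules, missing_executables) for this environment."""
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [name for name in modules
                       if importlib.util.find_spec(name) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing_tools = [name for name in executables
                     if shutil.which(name) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    missing_modules, missing_tools = find_missing()
    if missing_modules:
        print("Missing Python modules:", ", ".join(missing_modules))
    if missing_tools:
        print("Missing executables:", ", ".join(missing_tools))
    if not missing_modules and not missing_tools:
        print("All dependencies found.")
```

This only checks that the modules and tools can be located; it does not verify versions or GPU support.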

Input files for the protocol should be placed in the subdirectories pdb_files, cryoem_maps, and fasta_files, which hold input files in .pdb, .map, and .fasta format, respectively. Name all three files for a given run with the same stem. For this tutorial we have provided pdb_files/EMD-8637.pdb, cryoem_maps/EMD-8637.map, and fasta_files/EMD-8637.fasta.
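The naming convention above can be checked before starting a run. The sketch below is a hypothetical helper, not part of emseqfinder: it lists which stems have all three input files and which are missing one or more of them, using the directory names and extensions given above.

```python
from pathlib import Path

# Subdirectory name -> expected file extension, as described in the text.
INPUT_DIRS = {
    "pdb_files": ".pdb",
    "cryoem_maps": ".map",
    "fasta_files": ".fasta",
}


def check_inputs(root="."):
    """Return (complete_stems, incomplete) for the input layout under root.

    complete_stems: sorted stems present in all three subdirectories
    incomplete:     stem -> list of subdirectories where that stem is missing
    """
    root = Path(root)
    # Collect the file stems found in each subdirectory.
    stems = {
        subdir: {path.stem for path in (root / subdir).glob(f"*{ext}")}
        for subdir, ext in INPUT_DIRS.items()
    }
    complete = set.intersection(*stems.values())
    incomplete = {}
    for stem in sorted(set.union(*stems.values()) - complete):
        incomplete[stem] = [subdir for subdir, found in stems.items()
                            if stem not in found]
    return sorted(complete), incomplete


if __name__ == "__main__":
    complete, incomplete = check_inputs(".")
    print("Ready to run:", ", ".join(complete) or "none")
    for stem, missing_in in incomplete.items():
        print(f"{stem}: missing in {', '.join(missing_in)}")
```

Run from the tutorial directory, this should report EMD-8637 as ready if the three provided files are in place.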

Run the protocol on all of the files using emseqfinder batch. This will take a few minutes to run.

emseqfinder batch

The following output files will be generated:

  • *_ML_side_ML_prob.dat contains fragment-wise sequence scores.
  • batch_matching_results.txt contains overall sequence matching accuracy per structure.

To inspect the results for this tutorial, print the files:

cat EMD-8637_ML_side_ML_prob.dat
cat batch_matching_results.txt

Further reading

More tutorials on using IMP are available at the IMP web site.

CC BY-SA logo