IMP Reference Guide
develop.50fdd7fa33,2025/09/03
The Integrative Modeling Platform
Assignment of sequence to backbone fragments traced in a cryo-EM map.
EMSequenceFinder is a method for assigning amino acid residue sequence to backbone fragments traced in an input cryo-electron microscopy (cryo-EM) map. EMSequenceFinder relies on a Bayesian scoring function for ranking the 20 standard amino acid residue types at a given backbone position, based on the fit to the density map, the map resolution, and secondary structure propensity. The fit to the density is quantified by a convolutional neural network that was trained on 5.56 million amino acid residue densities extracted from cryo-EM maps at 3–10 Å resolution and corresponding atomic structure models deposited in the Electron Microscopy Data Bank (EMDB). For more information, see Mondal et al., 2025.
In addition to IMP's own dependencies, EMSequenceFinder also requires the mrcfile, scipy, scikit-learn, statsmodels, pandas, and tensorflow Python packages. One way to get these dependencies is via conda-forge. In order for TensorFlow prediction to work correctly on GPUs with libdevice, you may have to run:
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX
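The same setting can also be applied from inside a Python session. The following is a minimal sketch, not part of EMSequenceFinder: the helper name xla_flags_for_conda is hypothetical, and it assumes the variable needs to be in place before TensorFlow is first imported in the process.

```python
import os


def xla_flags_for_conda(env=None):
    """Build the XLA_FLAGS value from CONDA_PREFIX (hypothetical helper).

    Returns None when no conda environment appears to be active.
    """
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return None
    return f"--xla_gpu_cuda_data_dir={prefix}"


# Only set the variable if the user has not already chosen a value.
flag = xla_flags_for_conda()
if flag is not None and "XLA_FLAGS" not in os.environ:
    os.environ["XLA_FLAGS"] = flag

# Assumption: import tensorflow only after this point so the flag is seen.
```

Leaving an existing XLA_FLAGS value untouched avoids silently overriding a setting made in the shell.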
The protocol is typically run using the emseqfinder command line tool in the following fashion:
Create three directories, pdb_files, cryoem_maps and fasta_files, containing input files in .pdb, .map and .fasta format respectively. Name all three files for a given run with the same stem (e.g. EMD-8637.pdb, EMD-8637.map, EMD-8637.fasta).
Run emseqfinder batch.
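Because the three inputs for a run are matched by file stem, it can help to check the directory layout before launching the tool. The sketch below is an illustration only and is not part of the IMP API; the helper find_complete_stems and the INPUT_LAYOUT table are hypothetical names, and the directory names and extensions are taken from the description above.

```python
from pathlib import Path

# Directory name -> expected file extension, as described in the text.
INPUT_LAYOUT = {
    "pdb_files": ".pdb",
    "cryoem_maps": ".map",
    "fasta_files": ".fasta",
}


def find_complete_stems(root="."):
    """Return the sorted stems that have a file in all three directories."""
    root = Path(root)
    stem_sets = []
    for dirname, ext in INPUT_LAYOUT.items():
        directory = root / dirname
        if directory.is_dir():
            stems = {p.stem for p in directory.glob(f"*{ext}")}
        else:
            stems = set()  # a missing directory contributes no stems
        stem_sets.append(stems)
    return sorted(set.intersection(*stem_sets))


if __name__ == "__main__":
    for stem in find_complete_stems():
        print(stem)
```

A stem that appears in only one or two of the directories is left out of the result, which makes incomplete input sets easy to spot.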
Upon successful execution, the following output files will be generated:
*_ML_side_ML_prob.dat contains fragment-wise sequence scores.
batch_matching_results.txt contains overall sequence matching accuracy per structure.
Author(s): Dibyendu Mondal, Vipul Kumar
Maintainer: benmwebb
License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Publications:
Namespaces
calculate_seq_match_batch | Determine percentage sequence overlap for multiple results files.
compute_dynamic_threshold | Determine a threshold for an EM map.
Functions
def get_data_path | Return the full path to one of this module's data files.
def get_example_path | Return the full path to one of this module's example files.
def get_module_name | Return the fully-qualified name of this module.
def get_module_version | Return the version of this module, as a string.
def IMP.emseqfinder.get_data_path(fname)
Return the full path to one of this module's data files.
Definition at line 17 of file emseqfinder/__init__.py.
def IMP.emseqfinder.get_example_path(fname)
Return the full path to one of this module's example files.
Definition at line 22 of file emseqfinder/__init__.py.
def IMP.emseqfinder.get_module_name()
Return the fully-qualified name of this module.
Definition at line 12 of file emseqfinder/__init__.py.
def IMP.emseqfinder.get_module_version()
Return the version of this module, as a string.
Definition at line 7 of file emseqfinder/__init__.py.
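As a quick check that the module is importable, the two argument-free functions above can be called directly. This is a minimal sketch; the wrapper describe_module is a hypothetical name, and it returns None rather than failing when IMP or this module is not installed.

```python
def describe_module():
    """Return (name, version) for IMP.emseqfinder, or None if unavailable."""
    try:
        import IMP.emseqfinder
    except ImportError:
        # IMP, or this particular module, is not installed.
        return None
    return (IMP.emseqfinder.get_module_name(),
            IMP.emseqfinder.get_module_version())


if __name__ == "__main__":
    info = describe_module()
    if info is None:
        print("IMP.emseqfinder is not available in this environment")
    else:
        print("module:", info[0], "version:", info[1])
```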