To understand the function of a macromolecular assembly, we must know the structure of its components and the interactions between them. However, direct experimental determination of such a structure is generally rather difficult. While multiple methods do exist for structure determination, each has a drawback. For example, crystals suitable for X-ray crystallography cannot always be produced, especially for large assemblies of multiple components. Cryo-electron microscopy (cryo-EM), on the other hand, can be used to study large assemblies, but it is generally limited to worse than atomic resolution. Finally, proteomics techniques, such as yeast two-hybrid and mass spectrometry, yield information about the interactions between proteins, but not the positions of these proteins within the assembly or the structures of the proteins themselves.

One approach to solve the structures of proteins and their assemblies is by "integrative" or "hybrid" modeling, in which information from different methods is considered simultaneously during the modeling procedure. The approach is briefly outlined here for clarity; it has been covered in greater detail previously. These methods can include:

Experimental techniques, such as:
- X-ray crystallography
- nuclear magnetic resonance (NMR) spectroscopy (CSP, NOE, J-couplings)
- electron microscopy (EM) (2D class averages or 3D maps)
- footprinting
- Immunoprecipitation pull-down
- Cysteine cross-linking
- Chemical cross-linking
- FRET spectroscopy
- small angle X-ray scattering (SAXS)
- proteomics
Theoretical sources of information, such as:
- template structures used in comparative/homology modeling
- scoring functions used in molecular docking
- statistical preferences
- physics-based energy functions

The integrative approach has several advantages:

Synergy among the input data minimizes the drawbacks of sparse, noisy, ambiguous and incoherent datasets. Each individual piece of data contains little structural information, but by simultaneously fitting a model to all data derived from independent experiments, the degeneracy of the structures that fit the data can be markedly reduced.
This approach has the potential to produce all structures that are consistent with the data, not only one structure.
An analysis of the structures allows us to estimate the precision of both the data and the structures.
This approach can make the process of structure determination more efficient, by indicating which measurements would be most informative.

Example modeling efforts

Hybrid structures based on our integrative approach:

The E. coli ribosome, the first eukaryotic ribosome from S. cerevisiae
The first mammalian ribosome from C. lupus48 and a fungal ribosome
The E. coli Hsp90
The eukaryotic chaperonin TRiC/CCT
The actin/scruin complex
Ryr1 voltage gated channel
The baker’s yeast nuclear pore complex (NPC)
The Nup84 complex
Transport through the NPC
Microtubule nucleation
The 26S proteasome
PCS9K-Fab complex
The yeast spindle pole body
Chromatin globin domain
The lymphoblastoid cell genome

Integrative modeling procedure

The integrative modeling procedure used here is shown below.

The first step in the procedure is to collect all experimental, statistical, and physical information that describes the system of interest.

A suitable representation for the system is then chosen and the available information is translated to a set of spatial restraints on the components of the system. For example, in the case of characterizing the molecular architecture of the nuclear pore complex (NPC), atomic structures of the protein subunits were not available, but the approximate size and shape of each protein was known, so each protein was represented as a ‘string’ of connected spheres consistent with the protein size and shape. A simple distance between two proteins can be restrained by a harmonic function of the distance, while the fit of a model into a 3D cryo-EM density map can be restrained by the cross-correlation between the map and the computed density of the model.

Next, the spatial restraints are summed into a single scoring function that can be sampled using a variety of optimizers, such as conjugate gradients, molecular dynamics, Monte Carlo, and inference-based methods. This sampling generates an ensemble of models that are as consistent with the input information as possible.

In the final step, the ensemble is analyzed to determine, for example, whether all of the restraints have been satisfied or certain subsets of data conflict with others. The analysis may generate a consensus model, such as the probability density for the location of each subunit in the assembly, and yield a measure of the uncertainty in the solutions.