IMP Reference Guide  develop.b79bf0c255,2019/12/05
The Integrative Modeling Platform
IMP::statistics Namespace Reference

Code to compute statistical measures.

Detailed Description

Code to compute statistical measures.

Data to be clustered is represented in one of two ways: with an IMP::statistics::Embedding or with an IMP::statistics::Metric. The representation is then passed to an algorithm that returns a clustering object such as an IMP::statistics::PartitionalClustering.

Info

Author(s): Keren Lasker, Daniel Russel

Maintainer: benmwebb

License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Publications:

Classes

class  ChiSquareMetric
 Compute the distance between two configurations using chi2.

class  ConfigurationSetRMSDMetric

class  ConfigurationSetXYZEmbedding
 Embed a configuration using the XYZ coordinates of a set of particles.

class  Embedding
 Store data to be clustered for embedding based algorithms.

class  EuclideanMetric

class  HistogramD
 Dynamically build a histogram embedded in D-dimensional space.

class  Metric
 Store data to be clustered for distance metric based algorithms.

class  ParticleEmbedding

class  PartitionalClustering
 A base class for clustering results where each item is in one cluster.

class  PartitionalClusteringWithCenter

class  RecursivePartitionalClusteringEmbedding

class  RecursivePartitionalClusteringMetric
 Represent a metric for clustering data that has already been clustered once.

class  VectorDEmbedding
 Simply return the coordinates of a VectorD.
 

Typedefs

typedef IMP::Vector< IMP::Pointer< Embedding > > Embeddings

typedef IMP::Vector< IMP::WeakPointer< Embedding > > EmbeddingsTemp

typedef IMP::Vector< IMP::Pointer< Metric > > Metrics

typedef IMP::Vector< IMP::WeakPointer< Metric > > MetricsTemp
 

Functions

PartitionalClusteringWithCenter * create_bin_based_clustering (Embedding *embed, double side)

PartitionalClustering * create_centrality_clustering (Metric *d, double far, int k)

PartitionalClustering * create_centrality_clustering (Embedding *d, double far, int k)

PartitionalClusteringWithCenter * create_connectivity_clustering (Embedding *embed, double dist)

PartitionalClustering * create_connectivity_clustering (Metric *metric, double dist)

PartitionalClustering * create_diameter_clustering (Metric *d, double maximum_diameter)

PartitionalClustering * create_gromos_clustering (Metric *d, double cutoff)

PartitionalClusteringWithCenter * create_lloyds_kmeans (Embedding *embedding, unsigned int k, unsigned int iterations)

algebra::VectorKDs get_centroids (Embedding *d, PartitionalClustering *pc)

double get_quantile (const Histogram1D &h, double fraction)

Ints get_representatives (Embedding *d, PartitionalClustering *pc)

void validate_partitional_clustering (PartitionalClustering *pc, unsigned int n)
 Check that the clustering is a valid clustering of n elements.
 

Python only

This functionality is only available in Python.

void show_histogram (HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())
 

Standard module functions

All IMP modules have a set of standard functions to help get information about the module and about files associated with the module.

std::string get_module_version ()
 
std::string get_module_name ()
 
std::string get_data_path (std::string file_name)
 Return the full path to one of this module's data files.
 
std::string get_example_path (std::string file_name)
 Return the full path to one of this module's example files.
 

Typedef Documentation

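typedef IMP::Vector< IMP::Pointer< Embedding > > IMP::statistics::Embeddings

A vector of reference-counting object pointers.

Definition at line 48 of file statistics/embedding.h.

typedef IMP::Vector< IMP::WeakPointer< Embedding > > IMP::statistics::EmbeddingsTemp

A vector of weak (non reference-counting) pointers to specified objects.

See Also
Embedding

Definition at line 48 of file statistics/embedding.h.

typedef IMP::Vector< IMP::Pointer< Metric > > IMP::statistics::Metrics

A vector of reference-counting object pointers.

Definition at line 42 of file Metric.h.

typedef IMP::Vector< IMP::WeakPointer< Metric > > IMP::statistics::MetricsTemp

A vector of weak (non reference-counting) pointers to specified objects.

See Also
Metric

Definition at line 42 of file Metric.h.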

Function Documentation

PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering ( Embedding *  embed,
double  side 
)

The space is divided into a grid of bins with side length side, and all points that fall in the same bin are placed in the same cluster.
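The binning rule can be illustrated with a small standalone sketch. It is not IMP's implementation: the Point alias and the bin_based_clusters function are hypothetical stand-ins for an Embedding and for create_bin_based_clustering, and cluster centers are not computed.

```cpp
#include <cmath>
#include <map>
#include <vector>

// Hypothetical stand-in for an embedding: one coordinate vector per item.
using Point = std::vector<double>;

// Assign each point to the grid bin that contains it; points that share a
// bin share a cluster. Returns one list of item indices per non-empty bin.
std::vector<std::vector<int>> bin_based_clusters(
    const std::vector<Point>& points, double side) {
  std::map<std::vector<long>, std::vector<int>> bins;
  for (int i = 0; i < static_cast<int>(points.size()); ++i) {
    std::vector<long> key;
    for (double x : points[i]) {
      // floor() keeps negative coordinates in the correct bin.
      key.push_back(static_cast<long>(std::floor(x / side)));
    }
    bins[key].push_back(i);
  }
  std::vector<std::vector<int>> clusters;
  for (const auto& entry : bins) clusters.push_back(entry.second);
  return clusters;
}
```

Clusters come out in the lexicographic order of their bin indices, which is a side effect of using std::map here rather than a property of the IMP function.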

PartitionalClustering* IMP::statistics::create_centrality_clustering ( Metric *  d,
double  far,
int  k 
)

Cluster by repeatedly removing the edges that lie on the most shortest paths. The process terminates once the specified number of connected components is reached. Other termination criteria can be added if someone proposes them.

Only items closer than far are connected.

PartitionalClustering* IMP::statistics::create_centrality_clustering ( Embedding *  d,
double  far,
int  k 
)

Cluster by repeatedly removing the edges that lie on the most shortest paths. The process terminates once the specified number of connected components is reached. Other termination criteria can be added if someone proposes them.

PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering ( Embedding *  embed,
double  dist 
)

Two points, \(p_i\) and \(p_j\), are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\) with \(p^{ij}_0 = p_i\) and \(p^{ij}_k = p_j\) such that \(\forall l\ ||p^{ij}_l-p^{ij}_{l+1}|| < d\), where \(d\) is the dist argument.

PartitionalClustering* IMP::statistics::create_connectivity_clustering ( Metric *  metric,
double  dist 
)

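Two points, \(p_i\) and \(p_j\), are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\) with \(p^{ij}_0 = p_i\) and \(p^{ij}_k = p_j\) such that \(\forall l\ ||p^{ij}_l-p^{ij}_{l+1}|| < d\), where \(d\) is the dist argument and the norm is the distance reported by the metric.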
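The chain condition amounts to finding connected components of the graph that links every pair of items closer than dist. A minimal sketch using a union-find structure follows; the distance callback and the function names are hypothetical stand-ins for a Metric, and the code makes no claim about how IMP implements the search.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Find the root of x, shortening the path as we go.
int find_root(std::vector<int>& parent, int x) {
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];
    x = parent[x];
  }
  return x;
}

// Label n items so that two items share a label exactly when a chain of
// items links them with every consecutive distance below dist.
// `distance` is a hypothetical callback standing in for a Metric.
std::vector<int> connectivity_labels(
    int n, const std::function<double(int, int)>& distance, double dist) {
  std::vector<int> parent(n);
  std::iota(parent.begin(), parent.end(), 0);
  for (int i = 0; i < n; ++i) {
    for (int j = i + 1; j < n; ++j) {
      if (distance(i, j) < dist) {
        int root_i = find_root(parent, i);
        int root_j = find_root(parent, j);
        parent[root_i] = root_j;  // merge the two components
      }
    }
  }
  std::vector<int> labels(n);
  for (int i = 0; i < n; ++i) labels[i] = find_root(parent, i);
  return labels;
}
```

This sketch compares all pairs, so it costs O(n^2) distance evaluations; a spatial index would avoid that for embeddings.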

PartitionalClustering* IMP::statistics::create_diameter_clustering ( Metric *  d,
double  maximum_diameter 
)

Cluster the elements into clusters with at most the specified diameter.

PartitionalClustering* IMP::statistics::create_gromos_clustering ( Metric *  d,
double  cutoff 
)

Cutoff-based clustering as defined in Daura et al. Angew. Chem. Int. Ed. 1999. 38(1‐2): p. 236-240.
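As commonly described, the procedure from that paper counts, for each item, the neighbors within the cutoff, takes the item with the most neighbors together with those neighbors as one cluster, removes them from the pool, and repeats until the pool is empty. The sketch below follows that description under some assumptions: the distance callback and function name are hypothetical, neighbors are tested with <=, and ties go to the lowest index. It is not taken from IMP's source.

```cpp
#include <functional>
#include <vector>

// Greedy cutoff-based clustering. Each returned cluster lists its center
// first, followed by the remaining members.
// `distance` is a hypothetical callback standing in for a Metric.
std::vector<std::vector<int>> gromos_clusters(
    int n, const std::function<double(int, int)>& distance, double cutoff) {
  std::vector<bool> active(n, true);
  std::vector<std::vector<int>> clusters;
  int remaining = n;
  while (remaining > 0) {
    std::vector<int> best_members;
    for (int i = 0; i < n; ++i) {
      if (!active[i]) continue;
      std::vector<int> members = {i};  // candidate center comes first
      for (int j = 0; j < n; ++j) {
        if (j != i && active[j] && distance(i, j) <= cutoff) {
          members.push_back(j);
        }
      }
      // Strict comparison: on a tie the lowest index stays the center.
      if (members.size() > best_members.size()) best_members = members;
    }
    for (int m : best_members) active[m] = false;
    remaining -= static_cast<int>(best_members.size());
    clusters.push_back(best_members);
  }
  return clusters;
}
```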

PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans ( Embedding *  embedding,
unsigned int  k,
unsigned int  iterations 
)

Return a k-means clustering of all points contained in the embedding (i.e. [0... embedding->get_number_of_embeddings())). These points are clustered into k clusters. More iterations take longer but generally produce a better clustering.

The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
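For orientation, the two alternating steps of Lloyd's algorithm (assign each point to the nearest center, then move each center to the mean of its points) can be sketched as below. The Point alias and function names are hypothetical, the centers are seeded with the first k points purely to keep the example deterministic, and points is assumed to hold at least k items; IMP's own seeding and return type differ.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for an embedding: one coordinate vector per item.
using Point = std::vector<double>;

double squared_distance(const Point& a, const Point& b) {
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d) {
    sum += (a[d] - b[d]) * (a[d] - b[d]);
  }
  return sum;
}

// Returns the cluster index assigned to each point.
std::vector<int> lloyds_kmeans(const std::vector<Point>& points,
                               unsigned int k, unsigned int iterations) {
  // Assumption: seed with the first k points (requires k <= points.size()).
  std::vector<Point> centers(points.begin(), points.begin() + k);
  std::vector<int> assignment(points.size(), 0);
  for (unsigned int it = 0; it < iterations; ++it) {
    // Assignment step: each point joins its nearest center.
    for (std::size_t i = 0; i < points.size(); ++i) {
      double best = std::numeric_limits<double>::max();
      for (unsigned int c = 0; c < k; ++c) {
        double d2 = squared_distance(points[i], centers[c]);
        if (d2 < best) {
          best = d2;
          assignment[i] = static_cast<int>(c);
        }
      }
    }
    // Update step: each center moves to the mean of its points.
    std::vector<Point> sums(k, Point(points[0].size(), 0.0));
    std::vector<int> counts(k, 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
      for (std::size_t d = 0; d < points[i].size(); ++d) {
        sums[assignment[i]][d] += points[i][d];
      }
      ++counts[assignment[i]];
    }
    for (unsigned int c = 0; c < k; ++c) {
      if (counts[c] == 0) continue;  // leave an empty cluster's center as is
      for (std::size_t d = 0; d < sums[c].size(); ++d) {
        centers[c][d] = sums[c][d] / counts[c];
      }
    }
  }
  return assignment;
}
```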

algebra::VectorKDs IMP::statistics::get_centroids ( Embedding *  d,
PartitionalClustering *  pc 
)

Given a clustering and an embedding, compute the centroid for each cluster.
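A centroid here is the coordinate-wise mean of a cluster's points. The following sketch shows that computation on plain vectors; Point and cluster_centroids are hypothetical names, and every cluster is assumed to be non-empty.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an embedding: one coordinate vector per item.
using Point = std::vector<double>;

// Return the coordinate-wise mean of the points in each cluster.
// Each cluster is a list of indices into `points` and must be non-empty.
std::vector<Point> cluster_centroids(
    const std::vector<Point>& points,
    const std::vector<std::vector<int>>& clusters) {
  std::vector<Point> centroids;
  for (const auto& cluster : clusters) {
    Point mean(points[cluster[0]].size(), 0.0);
    for (int i : cluster) {
      for (std::size_t d = 0; d < mean.size(); ++d) mean[d] += points[i][d];
    }
    for (double& x : mean) x /= static_cast<double>(cluster.size());
    centroids.push_back(mean);
  }
  return centroids;
}
```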

std::string IMP::statistics::get_data_path ( std::string  file_name)

Return the full path to one of this module's data files.

To read the data file "data_library" that was placed in the data directory of this module, do something like

std::ifstream in(IMP::statistics::get_data_path("data_library"));

This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.

Note
Each module has its own data directory, so be sure to use this function from the correct module.

std::string IMP::statistics::get_example_path ( std::string  file_name)

Return the full path to one of this module's example files.

To read the example file "example_protein.pdb" that was placed in the examples directory of this module, do something like

std::ifstream in(IMP::statistics::get_example_path("example_protein.pdb"));

This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.

Note
Each module has its own example directory, so be sure to use this function from the correct module.

double IMP::statistics::get_quantile ( const Histogram1D &  h,
double  fraction 
)

Return the midpoint of the bin that best approximates the specified quantile (passed as a fraction). For example, passing 0.5 returns the median, and passing 0.9 returns the 90th percentile.
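One way to read this is to walk the bins in order, accumulate their counts, and return the midpoint of the first bin at which the running total reaches the requested fraction of all counts. The sketch below does that for a 1D histogram stored as a plain vector of counts; the function name, its parameters, and the "first bin to reach the target" rule are assumptions, and IMP's choice of the best-approximating bin may differ.

```cpp
#include <cstddef>
#include <vector>

// Midpoint of the first bin whose cumulative count reaches
// fraction * total. Bin i covers
// [lower_bound + i * bin_width, lower_bound + (i + 1) * bin_width).
double histogram_quantile(const std::vector<double>& counts,
                          double lower_bound, double bin_width,
                          double fraction) {
  double total = 0.0;
  for (double c : counts) total += c;
  const double target = fraction * total;
  double cumulative = 0.0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    cumulative += counts[i];
    if (cumulative >= target) return lower_bound + (i + 0.5) * bin_width;
  }
  // Reached only through rounding (or an empty histogram): use the last bin.
  return lower_bound + (counts.size() - 0.5) * bin_width;
}
```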

Ints IMP::statistics::get_representatives ( Embedding *  d,
PartitionalClustering *  pc 
)

Given a clustering and an embedding, compute a representative element for each cluster.

void IMP::statistics::show_histogram ( HistogramD  h,
std::string  xscale = "linear",
std::string  yscale = "linear",
Functions  curves = Functions() 
)

In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.

Parameters
[in] h: The histogram to show; the plot is sized to the histogram's bounding box.
[in] xscale: Whether the x scale is "linear" or "log".
[in] yscale: Whether the y scale is "linear" or "log".
[in] curves: A list of Python functions to plot on the histogram as curves. Each function should take one float and return a float.
See Also
HistogramD

void IMP::statistics::validate_partitional_clustering ( PartitionalClustering *  pc,
unsigned int  n 
)

Check that the clustering is a valid clustering of n elements.

If it is not, an exception is thrown, provided the build is not a fast build.
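As an illustration of what such a check can involve, the sketch below treats a clustering as valid when every index in [0, n) appears in exactly one cluster. That criterion, the function name, and the exception types are assumptions made for this example; the checks IMP actually performs are not spelled out here.

```cpp
#include <stdexcept>
#include <vector>

// Throw unless every item index in [0, n) appears in exactly one cluster.
void validate_partition(const std::vector<std::vector<int>>& clusters,
                        unsigned int n) {
  std::vector<bool> seen(n, false);
  for (const auto& cluster : clusters) {
    for (int i : cluster) {
      if (i < 0 || static_cast<unsigned int>(i) >= n) {
        throw std::out_of_range("item index outside [0, n)");
      }
      if (seen[i]) {
        throw std::logic_error("item appears in more than one cluster");
      }
      seen[i] = true;
    }
  }
  for (unsigned int i = 0; i < n; ++i) {
    if (!seen[i]) throw std::logic_error("item is not in any cluster");
  }
}
```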