IMP  2.3.0
The Integrative Modeling Platform
IMP::statistics Namespace Reference

Detailed Description

Code to compute statistical measures.

Data to be clustered is represented in one of two ways: with an IMP::statistics::Embedding or with an IMP::statistics::Metric. The representation is then passed to an algorithm that returns a clustering object such as an IMP::statistics::PartitionalClustering.
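The difference between the two representations can be illustrated with a minimal, IMP-free sketch. The class names ListEmbedding and EuclideanEmbeddingMetric and their methods are hypothetical stand-ins chosen for this example, not the IMP classes: an embedding maps each item to a coordinate vector, while a metric only answers "how far apart are items i and j?".

```python
import math


class ListEmbedding:
    """Hypothetical embedding: item i is represented by a coordinate vector."""

    def __init__(self, vectors):
        self.vectors = [tuple(v) for v in vectors]

    def get_number_of_items(self):
        return len(self.vectors)

    def get_point(self, i):
        return self.vectors[i]


class EuclideanEmbeddingMetric:
    """Hypothetical metric: only pairwise distances are exposed."""

    def __init__(self, embedding):
        self.embedding = embedding

    def get_number_of_items(self):
        return self.embedding.get_number_of_items()

    def get_distance(self, i, j):
        # Euclidean distance between the embedded coordinates of i and j.
        return math.dist(self.embedding.get_point(i), self.embedding.get_point(j))


embedding = ListEmbedding([(0.0, 0.0), (3.0, 4.0)])
metric = EuclideanEmbeddingMetric(embedding)
```

Any embedding induces a metric in this way, but not every metric comes with coordinates, which is presumably why some algorithms on this page accept only one of the two.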

Info

Author(s): Keren Lasker, Daniel Russel

Maintainer: benmwebb

License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Publications:

Classes

class  ChiSquareMetric
 Compute the distance between two configurations using chi2.

class  ConfigurationSetRMSDMetric

class  ConfigurationSetXYZEmbedding
 Embed a configuration using the XYZ coordinates of a set of particles.

class  Embedding
 Store data to be clustered for embedding based algorithms.

class  EuclideanMetric

class  HistogramD
 Dynamically build a histogram embedded in D-dimensional space.

class  Metric
 Store data to be clustered for distance metric based algorithms.

class  ParticleEmbedding

class  PartitionalClustering
 A base class for clustering results where each item is in one cluster.

class  PartitionalClusteringWithCenter

class  RecursivePartitionalClusteringEmbedding

class  RecursivePartitionalClusteringMetric

class  VectorDEmbedding
 Simply return the coordinates of a VectorD.


Typedefs

typedef IMP::base::Vector< IMP::base::Pointer< Embedding > > Embeddings

typedef IMP::base::Vector< IMP::base::WeakPointer< Embedding > > EmbeddingsTemp

typedef IMP::base::Vector< IMP::base::Pointer< Metric > > Metrics

typedef IMP::base::Vector< IMP::base::WeakPointer< Metric > > MetricsTemp
 

Functions

PartitionalClusteringWithCenter * create_bin_based_clustering (Embedding *embed, double side)

PartitionalClustering * create_centrality_clustering (Metric *d, double far, int k)

PartitionalClustering * create_centrality_clustering (Embedding *d, double far, int k)

PartitionalClusteringWithCenter * create_connectivity_clustering (Embedding *embed, double dist)

PartitionalClustering * create_connectivity_clustering (Metric *metric, double dist)

PartitionalClustering * create_diameter_clustering (Metric *d, double maximum_diameter)

PartitionalClustering * create_gromos_clustering (Metric *d, double cutoff)

PartitionalClusteringWithCenter * create_lloyds_kmeans (Embedding *embedding, unsigned int k, unsigned int iterations)

algebra::VectorKDs get_centroids (Embedding *d, PartitionalClustering *pc)

double get_quantile (const Histogram1D &h, double fraction)

Ints get_representatives (Embedding *d, PartitionalClustering *pc)

void validate_partitional_clustering (PartitionalClustering *pc, unsigned int n)
 Check that the clustering is a valid clustering of n elements.
 

Python only

This functionality is only available in Python.

void show_histogram (HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())
 

Standard module functions

All IMP modules have a set of standard functions to help get information about the module and about files associated with the module.

std::string get_module_version ()
 
std::string get_module_name ()
 
std::string get_data_path (std::string file_name)
 Return the full path to installed data.

std::string get_example_path (std::string file_name)
 Return the path to installed example data for this module.
 

Typedef Documentation

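typedef IMP::base::Vector< IMP::base::Pointer< Embedding > > IMP::statistics::Embeddings

Store a set of objects.

Definition at line 48 of file statistics/embedding.h.

typedef IMP::base::Vector< IMP::base::WeakPointer< Embedding > > IMP::statistics::EmbeddingsTemp

Pass a set of objects.

See Also
Embedding

Definition at line 48 of file statistics/embedding.h.

typedef IMP::base::Vector< IMP::base::Pointer< Metric > > IMP::statistics::Metrics

Store a set of objects.

Definition at line 42 of file Metric.h.

typedef IMP::base::Vector< IMP::base::WeakPointer< Metric > > IMP::statistics::MetricsTemp

Pass a set of objects.

See Also
Metric

Definition at line 42 of file Metric.h.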

Function Documentation

PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering ( Embedding *  embed,
double  side 
)

The space is gridded with bins of side length side, and all points that fall in the same grid bin are placed in the same cluster.
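As a rough illustration of the binning idea, here is a minimal sketch in plain Python. The function name bin_based_clustering and the choice of a grid anchored at the origin are assumptions made for this example; it is not the IMP implementation.

```python
import math
from collections import defaultdict


def bin_based_clustering(points, side):
    """Group point indices by the grid cell (of width `side`) they fall in.

    Assumption: the grid is anchored at the origin.
    """
    bins = defaultdict(list)
    for index, point in enumerate(points):
        # A cell is identified by the integer grid coordinates of the point.
        cell = tuple(math.floor(x / side) for x in point)
        bins[cell].append(index)
    # Each non-empty cell becomes one cluster of point indices.
    return list(bins.values())


points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.9, 0.8), (5.0, 5.0)]
clusters = bin_based_clustering(points, side=1.0)
```

With side=1.0 the first two points share a cell, the next two share the neighbouring cell, and the last point is alone. Note that two points very close to each other can still land in different clusters if a cell boundary runs between them.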

PartitionalClustering* IMP::statistics::create_centrality_clustering ( Metric *  d,
double  far,
int  k 
)

Cluster by repeatedly removing edges that have many shortest paths passing through them. The process terminates when a set number of connected components is reached. Other termination criteria can be added if someone proposes them.

Only items closer than far are connected.
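The description matches the general idea of edge-betweenness clustering. The following is a minimal sketch of that technique in plain Python, not the IMP implementation. It assumes that k is the target number of connected components, that the graph is unweighted, and that one edge is removed per round; all function names are hypothetical.

```python
from collections import deque


def _adjacency(n, edges):
    adjacency = {i: set() for i in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(n, edges):
    adjacency, seen, components = _adjacency(n, edges), set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


def edge_betweenness(n, edges):
    """Brandes-style score: how many shortest paths use each edge."""
    adjacency = _adjacency(n, edges)
    score = {frozenset(e): 0.0 for e in edges}
    for source in range(n):
        dist, paths = {source: 0}, {source: 1.0}
        preds = {i: [] for i in range(n)}
        order, queue = [], deque([source])
        while queue:  # breadth-first search from `source`
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in order}
        for w in reversed(order):  # accumulate from the farthest nodes back
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + dependency[w])
                score[frozenset((v, w))] += share
                dependency[v] += share
    return score


def centrality_clustering(n, distance, far, k):
    # Only items closer than `far` are connected.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if distance(i, j) < far]
    while True:
        components = connected_components(n, edges)
        if len(components) >= k or not edges:
            return components
        score = edge_betweenness(n, edges)
        edges.remove(max(edges, key=lambda e: score[frozenset(e)]))


values = [0.0, 0.5, 1.0, 1.9, 2.4, 2.9]
clusters = centrality_clustering(
    len(values), lambda i, j: abs(values[i] - values[j]), far=1.05, k=2)
```

In this example the items form two triangles joined by a single bridge edge between items 2 and 3. Every shortest path between the two triangles uses the bridge, so it is removed first and the graph splits into the two expected clusters.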

PartitionalClustering* IMP::statistics::create_centrality_clustering ( Embedding *  d,
double  far,
int  k 
)

Cluster by repeatedly removing edges that have many shortest paths passing through them. The process terminates when a set number of connected components is reached. Other termination criteria can be added if someone proposes them.

PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering ( Embedding *  embed,
double  dist 
)

Two points, \(p_i\) and \(p_j\), are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\) from \(p_i\) to \(p_j\) such that \(\forall l\; ||p^{ij}_l-p^{ij}_{l+1}|| < d\), where \(d\) is the dist parameter.

PartitionalClustering* IMP::statistics::create_connectivity_clustering ( Metric *  metric,
double  dist 
)

Two points, \(p_i\) and \(p_j\), are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\) from \(p_i\) to \(p_j\) such that \(\forall l\; ||p^{ij}_l-p^{ij}_{l+1}|| < d\), where \(d\) is the dist parameter.
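The chain condition is equivalent to taking the connected components of the graph that links every pair of items closer than the threshold. A minimal union-find sketch of that idea follows; the function name connectivity_clustering and the callable distance argument are assumptions for this example, and it is not the IMP implementation.

```python
import math


def connectivity_clustering(items, distance, dist):
    """Connected components of the graph linking items closer than `dist`."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) < dist:
                # Merge the two components.
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


points = [(0.0, 0.0), (0.9, 0.0), (1.8, 0.0), (10.0, 0.0)]
clusters = connectivity_clustering(points, math.dist, dist=1.0)
```

Items 0 and 2 are 1.8 apart, which exceeds the threshold, but they still end up in the same cluster because item 1 links them. This chaining behaviour is the defining property of the method.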

PartitionalClustering* IMP::statistics::create_diameter_clustering ( Metric *  d,
double  maximum_diameter 
)

Cluster the elements into clusters with at most the specified diameter.

PartitionalClustering* IMP::statistics::create_gromos_clustering ( Metric *  d,
double  cutoff 
)

Cutoff-based clustering as defined in Daura et al. Angew. Chem. Int. Ed. 1999. 38(1‐2): p. 236-240.
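The procedure described by Daura et al. repeatedly picks the item with the most neighbours within the cutoff, makes it and its neighbours a cluster, and removes them from the pool. The sketch below follows that outline in plain Python. The function name, the use of <= for the cutoff comparison, and the tie-breaking rule are assumptions for this example; it is not the IMP implementation.

```python
def gromos_clustering(n, distance, cutoff):
    """Greedy cutoff clustering; returns a list of (center, members) pairs.

    `distance(i, j)` returns the distance between items i and j.
    """
    remaining = set(range(n))
    clusters = []
    while remaining:
        # Neighbours of each remaining item within the cutoff (itself included).
        neighbours = {
            i: [j for j in remaining if distance(i, j) <= cutoff]
            for i in remaining
        }
        # The item with the most neighbours becomes the next cluster centre;
        # ties go to the smallest index so that the result is deterministic.
        center = max(sorted(remaining), key=lambda i: len(neighbours[i]))
        members = sorted(neighbours[center])
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


values = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0]
clusters = gromos_clustering(
    len(values), lambda i, j: abs(values[i] - values[j]), cutoff=0.15)
```

Item 1 has three neighbours within the cutoff (items 0, 1 and 2), more than any other item, so it is chosen as the first centre.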

PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans ( Embedding *  embedding,
unsigned int  k,
unsigned int  iterations 
)

Return a k-means clustering of all points contained in the embedding (i.e., the indices [0, embedding->get_number_of_embeddings())). These points are clustered into k clusters. More iterations take longer but generally produce a better clustering.

The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
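For readers unfamiliar with Lloyd's algorithm, the following is a minimal sketch in plain Python using Euclidean distance. The function name, the seeded random choice of initial centers, and the return format are assumptions for this example; it is not the IMP implementation.

```python
import math
import random


def lloyds_kmeans(points, k, iterations, seed=0):
    """Alternate between assigning points to centers and moving the centers."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct input points chosen at random.
    centers = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return assignment, centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
assignment, centers = lloyds_kmeans(points, k=2, iterations=10)
```

Each iteration costs one distance evaluation per point per center, which is why more iterations take longer. Once an iteration leaves the assignment unchanged, further iterations have no effect.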

algebra::VectorKDs IMP::statistics::get_centroids ( Embedding *  d,
PartitionalClustering *  pc 
)

Given a clustering and an embedding, compute the centroid for each cluster.
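The computation amounts to a per-cluster coordinate mean. A minimal sketch in plain Python, with clusters given as lists of point indices (the function name and data layout are assumptions for this example, not the IMP API):

```python
def cluster_centroids(points, clusters):
    """Return the coordinate-wise mean of the points in each cluster."""
    centroids = []
    for members in clusters:
        coords = [points[i] for i in members]
        # zip(*coords) iterates over the axes, so each mean is per coordinate.
        centroids.append(tuple(sum(axis) / len(coords) for axis in zip(*coords)))
    return centroids


points = [(0.0, 0.0), (2.0, 0.0), (4.0, 4.0)]
centroids = cluster_centroids(points, [[0, 1], [2]])
```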

std::string IMP::statistics::get_data_path ( std::string  file_name)

Return the full path to installed data.

Each module has its own data directory, so be sure to use the version of this function in the correct module. To read the data file "data_library" that was placed in the data directory of module "mymodule", do something like

std::ifstream in(IMP::mymodule::get_data_path("data_library"));

This will ensure that the code works when IMP is installed or used via the setup_environment.sh script.

std::string IMP::statistics::get_example_path ( std::string  file_name)

Return the path to installed example data for this module.

Each module has its own example directory, so be sure to use the version of this function in the correct module. For example, to read the file example_protein.pdb located in the examples directory of the IMP::atom module, do

model));

This will ensure that the code works when IMP is installed or used via the setup_environment.sh script.

double IMP::statistics::get_quantile ( const Histogram1D &  h,
double  fraction 
)

Return the midpoint of the bin that best approximates the specified quantile (passed as a fraction). For example, passing 0.5 returns the median, and passing 0.9 returns the 90th percentile.
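One way to read "the bin that best approximates the quantile" is the first bin at which the cumulative count reaches the requested fraction of the total. The sketch below uses that rule, which is an assumption, on a histogram given as explicit bin edges and counts. The function name and data layout are hypothetical and do not reflect the Histogram1D interface.

```python
def quantile_bin_midpoint(bin_edges, counts, fraction):
    """Midpoint of the first bin where the cumulative count reaches `fraction`."""
    total = sum(counts)
    running = 0
    for i, count in enumerate(counts):
        running += count
        if running >= fraction * total:
            return 0.5 * (bin_edges[i] + bin_edges[i + 1])
    # Only reached through rounding when fraction is 1.0: use the last bin.
    return 0.5 * (bin_edges[-2] + bin_edges[-1])


edges = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = [1, 3, 4, 2]
median = quantile_bin_midpoint(edges, counts, 0.5)
p90 = quantile_bin_midpoint(edges, counts, 0.9)
```

Because the result is always a bin midpoint, its resolution is limited by the bin width of the histogram.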

Ints IMP::statistics::get_representatives ( Embedding *  d,
PartitionalClustering *  pc 
)

Given a clustering and an embedding, compute a representative element for each cluster.
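The text does not say how the representative is chosen. One common choice, assumed here, is the cluster member closest to the cluster centroid. A minimal sketch in plain Python (the function name and data layout are hypothetical, not the IMP API):

```python
import math


def cluster_representatives(points, clusters):
    """Pick, for each cluster, the member index nearest to the centroid."""
    representatives = []
    for members in clusters:
        coords = [points[i] for i in members]
        centroid = tuple(sum(axis) / len(coords) for axis in zip(*coords))
        representatives.append(
            min(members, key=lambda i: math.dist(points[i], centroid)))
    return representatives


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 9.0)]
representatives = cluster_representatives(points, [[0, 1, 2], [3]])
```

Unlike a centroid, a representative is always one of the input items, which is useful when the items are configurations that cannot be meaningfully averaged.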

void IMP::statistics::show_histogram ( HistogramD  h,
std::string  xscale = "linear",
std::string  yscale = "linear",
Functions  curves = Functions() 
)

In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.

Parameters
[in] h: The histogram to show; the plot is sized to the histogram's bounding box.
[in] xscale: Whether the x scale is "linear" or "log".
[in] yscale: Whether the y scale is "linear" or "log".
[in] curves: A list of Python functions to plot on the histogram as curves. The functions should take one float and return a float.

See Also
HistogramD

void IMP::statistics::validate_partitional_clustering ( PartitionalClustering *  pc,
unsigned int  n 
)

Check that the clustering is a valid clustering of n elements.

If it is not, an exception is thrown, unless the build is a fast build.
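For a partitional clustering, "valid" plausibly means that each of the n items appears in exactly one cluster. The sketch below checks that condition in plain Python. The function names and the use of ValueError are assumptions for this example and do not reflect the IMP API.

```python
def check_partition(clusters, n):
    """Raise ValueError unless every index in [0, n) is in exactly one cluster."""
    seen = [False] * n
    for members in clusters:
        for i in members:
            if not 0 <= i < n:
                raise ValueError(f"index {i} is outside [0, {n})")
            if seen[i]:
                raise ValueError(f"index {i} appears in more than one cluster")
            seen[i] = True
    missing = [i for i, found in enumerate(seen) if not found]
    if missing:
        raise ValueError(f"indices {missing} are not in any cluster")


def is_valid_partition(clusters, n):
    """Boolean wrapper around check_partition."""
    try:
        check_partition(clusters, n)
    except ValueError:
        return False
    return True


ok = is_valid_partition([[0, 2], [1]], 3)
duplicated = is_valid_partition([[0, 1], [1, 2]], 3)
incomplete = is_valid_partition([[0], [1]], 3)
```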