IMP Reference Guide  develop.18ca3ba1ae,2021/02/28 The Integrative Modeling Platform
IMP::statistics Namespace Reference

## Detailed Description

Code to compute statistical measures.

Data to be clustered is represented in one of two ways: with an IMP::statistics::Embedding or with an IMP::statistics::Metric. The representation is then passed to an algorithm that returns a clustering object such as an IMP::statistics::PartitionalClustering.

# Info

Maintainer: benmwebb

License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Publications:

## Classes

class ChiSquareMetric
Compute the distance between two configurations using chi2.

class ConfigurationSetRMSDMetric

class ConfigurationSetXYZEmbedding
Embed a configuration using the XYZ coordinates of a set of particles.

class Embedding
Store data to be clustered for embedding-based algorithms.

class EuclideanMetric

class HistogramD
Dynamically build a histogram embedded in D-dimensional space.

class Metric
Store data to be clustered for distance-metric-based algorithms.

class ParticleEmbedding

class PartitionalClustering
A base class for clustering results where each item is in one cluster.

class PartitionalClusteringWithCenter

class RecursivePartitionalClusteringEmbedding

class RecursivePartitionalClusteringMetric
Represent a metric for clustering data that has already been clustered once.

class VectorDEmbedding
Simply return the coordinates of a VectorD.

## Typedefs

typedef IMP::Vector< IMP::Pointer< Embedding > > Embeddings

typedef IMP::Vector< IMP::WeakPointer< Embedding > > EmbeddingsTemp

typedef IMP::Vector< IMP::Pointer< Metric > > Metrics

typedef IMP::Vector< IMP::WeakPointer< Metric > > MetricsTemp

## Functions

PartitionalClusteringWithCenter * create_bin_based_clustering (Embedding *embed, double side)

PartitionalClustering * create_centrality_clustering (Metric *d, double far, int k)

PartitionalClustering * create_centrality_clustering (Embedding *d, double far, int k)

PartitionalClusteringWithCenter * create_connectivity_clustering (Embedding *embed, double dist)

PartitionalClustering * create_connectivity_clustering (Metric *metric, double dist)

PartitionalClustering * create_diameter_clustering (Metric *d, double maximum_diameter)

PartitionalClustering * create_gromos_clustering (Metric *d, double cutoff)

PartitionalClusteringWithCenter * create_lloyds_kmeans (Embedding *embedding, unsigned int k, unsigned int iterations)

algebra::VectorKDs get_centroids (Embedding *d, PartitionalClustering *pc)

double get_quantile (const Histogram1D &h, double fraction)

Ints get_representatives (Embedding *d, PartitionalClustering *pc)

void validate_partitional_clustering (PartitionalClustering *pc, unsigned int n)
Check that the clustering is a valid clustering of n elements.

## Python only

This functionality is only available in Python.

void show_histogram (HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())

## Standard module functions

All IMP modules have a set of standard functions to help get information about the module and about files associated with the module.

std::string get_module_version ()
Return the version of this module, as a string.

std::string get_module_name ()

std::string get_data_path (std::string file_name)
Return the full path to one of this module's data files.

std::string get_example_path (std::string file_name)
Return the full path to one of this module's example files.

## Typedef Documentation

 typedef IMP::Vector< IMP::Pointer< Embedding > > IMP::statistics::Embeddings

A vector of reference-counting object pointers.

Definition at line 48 of file statistics/embedding.h.

 typedef IMP::Vector< IMP::WeakPointer< Embedding > > IMP::statistics::EmbeddingsTemp

A vector of weak (non reference-counting) pointers to specified objects.

See also: Embedding

Definition at line 48 of file statistics/embedding.h.

 typedef IMP::Vector< IMP::Pointer< Metric > > IMP::statistics::Metrics

A vector of reference-counting object pointers.

Definition at line 42 of file Metric.h.

 typedef IMP::Vector< IMP::WeakPointer< Metric > > IMP::statistics::MetricsTemp

A vector of weak (non reference-counting) pointers to specified objects.

See also: Metric

Definition at line 42 of file Metric.h.

## Function Documentation

 PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering ( Embedding * embed, double side )

The space is divided into a grid of bins whose edge length is given by side, and all points that fall in the same grid bin are placed in the same cluster.
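
The binning idea can be illustrated with a small standalone sketch. It does not use IMP: the function name bin_based_clusters is hypothetical, points are plain coordinate vectors rather than an Embedding, and the grid is assumed to be anchored at the origin, which the description above does not state.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper, not part of IMP. Groups point indices by the grid bin
// that contains each point. Assumption: the grid is anchored at the origin.
std::vector<std::vector<std::size_t>> bin_based_clusters(
    const std::vector<std::vector<double>>& points, double side) {
  assert(side > 0.0);
  // Key: integer bin coordinates; value: indices of the points in that bin.
  std::map<std::vector<long>, std::vector<std::size_t>> bins;
  for (std::size_t i = 0; i < points.size(); ++i) {
    std::vector<long> key;
    key.reserve(points[i].size());
    for (double x : points[i]) {
      // floor() keeps negative coordinates in the correct bin.
      key.push_back(static_cast<long>(std::floor(x / side)));
    }
    bins[key].push_back(i);
  }
  // One cluster per occupied bin, in lexicographic order of bin coordinates.
  std::vector<std::vector<std::size_t>> clusters;
  clusters.reserve(bins.size());
  for (const auto& bin : bins) clusters.push_back(bin.second);
  return clusters;
}
```

Because cluster membership depends only on which bin a point falls in, two points that are very close but lie on opposite sides of a bin boundary end up in different clusters.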

 PartitionalClustering* IMP::statistics::create_centrality_clustering ( Metric * d, double far, int k )

Cluster by repeatedly removing the edges that have the most shortest paths passing through them. The process terminates when a set number of connected components remains. Other termination criteria can be added if someone proposes them.

Only items closer than far are connected.

 PartitionalClustering* IMP::statistics::create_centrality_clustering ( Embedding * d, double far, int k )

Cluster by repeatedly removing the edges that have the most shortest paths passing through them. The process terminates when a set number of connected components remains. Other termination criteria can be added if someone proposes them.

 PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering ( Embedding * embed, double dist )

Two points, $$p_i$$ and $$p_j$$, are in the same cluster if there is a sequence of points $$\left(p^{ij}_{0}\dots p^{ij}_k\right)$$, starting at $$p_i$$ and ending at $$p_j$$, such that $$\forall l\; \|p^{ij}_l-p^{ij}_{l+1}\| < d$$, where $$d$$ is the dist argument.

 PartitionalClustering* IMP::statistics::create_connectivity_clustering ( Metric * metric, double dist )

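Two points, $$p_i$$ and $$p_j$$, are in the same cluster if there is a sequence of points $$\left(p^{ij}_{0}\dots p^{ij}_k\right)$$, starting at $$p_i$$ and ending at $$p_j$$, such that $$\forall l\; \|p^{ij}_l-p^{ij}_{l+1}\| < d$$, where $$d$$ is the dist argument and distances are computed with the metric.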
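
The chain condition above is equivalent to taking connected components of the graph that links every pair of points closer than the threshold. A minimal standalone sketch using a union-find structure follows. It is not the IMP implementation: connectivity_labels, chain_distance and find_root are hypothetical names, and Euclidean distance between plain coordinate vectors is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of IMP: Euclidean distance between two
// points of equal dimension.
double chain_distance(const std::vector<double>& a,
                      const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return std::sqrt(sum);
}

// Find the representative of x's set, shortening the path as we go.
std::size_t find_root(std::vector<std::size_t>& parent, std::size_t x) {
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];
    x = parent[x];
  }
  return x;
}

// Return one label per point. Two points share a label exactly when a chain
// of points links them with every consecutive step shorter than dist.
std::vector<std::size_t> connectivity_labels(
    const std::vector<std::vector<double>>& points, double dist) {
  assert(dist > 0.0);
  std::vector<std::size_t> parent(points.size());
  std::iota(parent.begin(), parent.end(), std::size_t{0});
  for (std::size_t i = 0; i < points.size(); ++i) {
    for (std::size_t j = i + 1; j < points.size(); ++j) {
      if (chain_distance(points[i], points[j]) < dist) {
        const std::size_t ri = find_root(parent, i);
        const std::size_t rj = find_root(parent, j);
        parent[ri] = rj;  // merge the two sets
      }
    }
  }
  std::vector<std::size_t> labels(points.size());
  for (std::size_t i = 0; i < points.size(); ++i) {
    labels[i] = find_root(parent, i);
  }
  return labels;
}
```

This sketch compares all pairs, so it is quadratic in the number of points. Note that the two ends of a long chain can be much farther apart than dist and still share a cluster.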

 PartitionalClustering* IMP::statistics::create_diameter_clustering ( Metric * d, double maximum_diameter )

Cluster the elements into clusters with at most the specified diameter.

 PartitionalClustering* IMP::statistics::create_gromos_clustering ( Metric * d, double cutoff )

Cutoff-based clustering as defined in Daura et al. Angew. Chem. Int. Ed. 1999. 38(1‐2): p. 236-240.
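
As a rough illustration, the procedure of Daura et al. is commonly described as follows: count each item's neighbors within the cutoff, make the item with the most neighbors a cluster center, remove it and its neighbors from the pool, and repeat until the pool is empty. The standalone sketch below follows that description. It is not the IMP implementation: gromos_clusters is a hypothetical name, the input is a precomputed symmetric distance matrix rather than a Metric, and treating a distance equal to the cutoff as a neighbor and breaking ties by lowest index are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of IMP. dist is a symmetric n x n matrix of
// pairwise distances. Each returned cluster lists its center first.
std::vector<std::vector<std::size_t>> gromos_clusters(
    const std::vector<std::vector<double>>& dist, double cutoff) {
  const std::size_t n = dist.size();
  for (const auto& row : dist) assert(row.size() == n);
  std::vector<bool> assigned(n, false);
  std::vector<std::vector<std::size_t>> clusters;
  std::size_t remaining = n;
  while (remaining > 0) {
    // Pick the unassigned item with the most unassigned neighbors.
    std::size_t best = n;
    std::size_t best_count = 0;
    for (std::size_t i = 0; i < n; ++i) {
      if (assigned[i]) continue;
      std::size_t count = 0;
      for (std::size_t j = 0; j < n; ++j) {
        if (j != i && !assigned[j] && dist[i][j] <= cutoff) ++count;
      }
      if (best == n || count > best_count) {
        best = i;
        best_count = count;
      }
    }
    // The center and its remaining neighbors form the next cluster.
    std::vector<std::size_t> cluster;
    cluster.push_back(best);
    for (std::size_t j = 0; j < n; ++j) {
      if (j != best && !assigned[j] && dist[best][j] <= cutoff) {
        cluster.push_back(j);
      }
    }
    for (std::size_t member : cluster) assigned[member] = true;
    remaining -= cluster.size();
    clusters.push_back(cluster);
  }
  return clusters;
}
```

Clusters come out in order of decreasing neighbor count at the time they are formed, so the first cluster is the most populated neighborhood.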

 PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans ( Embedding * embedding, unsigned int k, unsigned int iterations )

Return a k-means clustering of all points contained in the embedding (i.e., those with indices in [0, embedding->get_number_of_embeddings())). The points are clustered into k clusters. More iterations take longer but typically produce a better clustering.

The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
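
For orientation, Lloyd's algorithm alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The standalone sketch below shows that loop with squared Euclidean distance. It is not the IMP implementation: lloyds_kmeans_sketch and squared_distance_kd are hypothetical names, and seeding the centers with the first k points is an assumption made only to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using KDPoint = std::vector<double>;

// Hypothetical helper, not part of IMP: squared Euclidean distance.
double squared_distance_kd(const KDPoint& a, const KDPoint& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return sum;
}

// Return the cluster index assigned to each point after the given number of
// Lloyd iterations. Assumption: centers start at the first k points.
std::vector<std::size_t> lloyds_kmeans_sketch(
    const std::vector<KDPoint>& points, std::size_t k,
    unsigned int iterations) {
  assert(k > 0 && k <= points.size());
  std::vector<KDPoint> centers(
      points.begin(), points.begin() + static_cast<std::ptrdiff_t>(k));
  std::vector<std::size_t> assignment(points.size(), 0);
  const std::size_t dim = points[0].size();
  for (unsigned int it = 0; it < iterations; ++it) {
    // Assignment step: nearest center for every point.
    for (std::size_t i = 0; i < points.size(); ++i) {
      double best = std::numeric_limits<double>::infinity();
      for (std::size_t c = 0; c < k; ++c) {
        const double d = squared_distance_kd(points[i], centers[c]);
        if (d < best) {
          best = d;
          assignment[i] = c;
        }
      }
    }
    // Update step: move each center to the mean of its points.
    std::vector<KDPoint> sums(k, KDPoint(dim, 0.0));
    std::vector<std::size_t> counts(k, 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
      ++counts[assignment[i]];
      for (std::size_t d = 0; d < dim; ++d) {
        sums[assignment[i]][d] += points[i][d];
      }
    }
    for (std::size_t c = 0; c < k; ++c) {
      if (counts[c] == 0) continue;  // keep the old center for empty clusters
      for (std::size_t d = 0; d < dim; ++d) {
        centers[c][d] = sums[c][d] / static_cast<double>(counts[c]);
      }
    }
  }
  return assignment;
}
```

Each iteration cannot increase the total squared distance from points to their assigned centers, which is why running more iterations tends to help until the assignments stop changing.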

 algebra::VectorKDs IMP::statistics::get_centroids ( Embedding * d, PartitionalClustering * pc )

Given a clustering and an embedding, compute the centroid for each cluster.
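
The centroid of a cluster is the coordinate-wise mean of its members. A minimal standalone sketch, assuming points are plain coordinate vectors and a clustering is a list of index lists (cluster_centroids is a hypothetical name, not the IMP function):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of IMP. clusters[c] holds the indices of the
// points that belong to cluster c; every cluster must be non-empty.
std::vector<std::vector<double>> cluster_centroids(
    const std::vector<std::vector<double>>& points,
    const std::vector<std::vector<std::size_t>>& clusters) {
  std::vector<std::vector<double>> centroids;
  centroids.reserve(clusters.size());
  for (const auto& members : clusters) {
    assert(!members.empty());
    // Sum the member coordinates, then divide by the member count.
    std::vector<double> mean(points[members[0]].size(), 0.0);
    for (std::size_t index : members) {
      for (std::size_t d = 0; d < mean.size(); ++d) {
        mean[d] += points[index][d];
      }
    }
    for (double& value : mean) {
      value /= static_cast<double>(members.size());
    }
    centroids.push_back(mean);
  }
  return centroids;
}
```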

 std::string IMP::statistics::get_data_path ( std::string file_name )

Return the full path to one of this module's data files.

To read the data file "data_library" that was placed in the data directory of this module, do something like

std::ifstream in(IMP::statistics::get_data_path("data_library"));

This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.

Note: Each module has its own data directory, so be sure to use this function from the correct module.

 std::string IMP::statistics::get_example_path ( std::string file_name )

Return the full path to one of this module's example files.

To read the example file "example_protein.pdb" that was placed in the examples directory of this module, do something like

std::ifstream in(IMP::statistics::get_example_path("example_protein.pdb"));

This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.

Note: Each module has its own example directory, so be sure to use this function from the correct module.

 std::string IMP::statistics::get_module_version ( )

Return the version of this module, as a string.

Note: This function is only available in Python.

Definition at line 5 of file EMageFit/__init__.py.

 double IMP::statistics::get_quantile ( const Histogram1D & h, double fraction )

Return the midpoint of the bin that best approximates the specified quantile (passed as a fraction). For example, passing 0.5 returns the median, and passing 0.9 returns the 90th percentile.
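
One way to read this is as a walk along the cumulative bin counts. The standalone sketch below returns the midpoint of the first bin at which the cumulative count reaches the requested fraction of the total. This selection rule is an assumption, since the description above does not define "best approximates", and SimpleHistogram1D and histogram_quantile are hypothetical names rather than IMP types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D histogram; not the IMP Histogram1D class.
struct SimpleHistogram1D {
  double lower;                // lower edge of the first bin
  double bin_width;            // width of every bin
  std::vector<double> counts;  // count (or weight) in each bin
};

// Return the midpoint of the first bin whose cumulative count reaches
// fraction * total. Assumption: this is the intended selection rule.
double histogram_quantile(const SimpleHistogram1D& h, double fraction) {
  assert(!h.counts.empty());
  assert(fraction >= 0.0 && fraction <= 1.0);
  double total = 0.0;
  for (double c : h.counts) total += c;
  const double target = fraction * total;
  double cumulative = 0.0;
  for (std::size_t i = 0; i < h.counts.size(); ++i) {
    cumulative += h.counts[i];
    if (cumulative >= target) {
      return h.lower + (static_cast<double>(i) + 0.5) * h.bin_width;
    }
  }
  // Only reachable through floating-point round-off: use the last bin.
  return h.lower +
         (static_cast<double>(h.counts.size()) - 0.5) * h.bin_width;
}
```

Because the result is always a bin midpoint, its resolution is limited by the bin width.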

 Ints IMP::statistics::get_representatives ( Embedding * d, PartitionalClustering * pc )

Given a clustering and an embedding, compute a representative element for each cluster.

 void IMP::statistics::show_histogram ( HistogramD h, std::string xscale = "linear", std::string yscale = "linear", Functions curves = Functions() )

In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.

Parameters
 [in] h The histogram to show; the plot is sized to the histogram's bounding box.
 [in] xscale Whether the x-axis scale is "linear" or "log".
 [in] yscale Whether the y-axis scale is "linear" or "log".
 [in] curves A list of Python functions to plot on the histogram as curves. Each function should take one float and return a float.