IMP Reference Guide
develop.549d75e6f4,2024/11/20
The Integrative Modeling Platform
Code to compute statistical measures.
Data to be clustered is represented in one of two ways: with an IMP::statistics::Embedding or with an IMP::statistics::Metric. The representation is then passed to an algorithm that returns a clustering object such as an IMP::statistics::PartitionalClustering.
Author(s): Keren Lasker, Daniel Russel
Maintainer: benmwebb
License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Publications:
Classes
class | ChiSquareMetric | Compute the distance between two configurations using chi2.
class | ConfigurationSetRMSDMetric |
class | ConfigurationSetXYZEmbedding | Embed a configuration using the XYZ coordinates of a set of particles.
class | Embedding | Store data to be clustered for embedding based algorithms.
class | EuclideanMetric |
class | HistogramD | Dynamically build a histogram embedded in D-dimensional space.
class | Metric | Store data to be clustered for distance metric based algorithms.
class | ParticleEmbedding |
class | PartitionalClustering | A base class for clustering results where each item is in one cluster.
class | PartitionalClusteringWithCenter |
class | RecursivePartitionalClusteringEmbedding |
class | RecursivePartitionalClusteringMetric | Represent a metric for clustering data that has already been clustered once.
class | VectorDEmbedding | Simply return the coordinates of a VectorD.
Typedefs
typedef IMP::Vector< IMP::Pointer< Embedding > > | Embeddings
typedef IMP::Vector< IMP::WeakPointer< Embedding > > | EmbeddingsTemp
typedef IMP::Vector< IMP::Pointer< Metric > > | Metrics
typedef IMP::Vector< IMP::WeakPointer< Metric > > | MetricsTemp
Python only
void | show_histogram (HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())
Standard module functions
std::string | get_module_version () | Return the version of this module, as a string.
std::string | get_module_name ()
std::string | get_data_path (std::string file_name) | Return the full path to one of this module's data files.
std::string | get_example_path (std::string file_name) | Return the full path to one of this module's example files.
typedef IMP::Vector<IMP::Pointer< Embedding > > IMP::statistics::Embeddings
A vector of reference-counting object pointers.
Definition at line 48 of file statistics/embedding.h.
typedef IMP::Vector<IMP::WeakPointer< Embedding > > IMP::statistics::EmbeddingsTemp
A vector of weak (non reference-counting) pointers to specified objects.
Definition at line 48 of file statistics/embedding.h.
typedef IMP::Vector<IMP::Pointer< Metric > > IMP::statistics::Metrics
typedef IMP::Vector<IMP::WeakPointer< Metric > > IMP::statistics::MetricsTemp
PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering(Embedding *embed, double side)
The space is gridded into bins of side length side, and all points that fall in the same grid bin are placed in the same cluster.
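The binning idea can be illustrated with a minimal pure-Python sketch. It is not IMP's implementation: the function name sketch_bin_clustering is hypothetical, points are plain coordinate tuples rather than an Embedding, and the grid is assumed to be anchored at the origin.

```python
import math
from collections import defaultdict


def sketch_bin_clustering(points, side):
    """Group point indices by the grid cell (edge length `side`) containing them.

    Assumption: the grid is anchored at the origin; IMP may place it differently.
    """
    bins = defaultdict(list)
    for index, point in enumerate(points):
        # The cell is identified by the integer grid coordinates of the point.
        cell = tuple(math.floor(coordinate / side) for coordinate in point)
        bins[cell].append(index)
    # Each occupied cell becomes one cluster, in order of first appearance.
    return list(bins.values())


points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.7, 0.8), (-0.2, 0.1)]
clusters = sketch_bin_clustering(points, side=1.0)
```

Two points that are very close together can still land in different clusters if a bin boundary falls between them, which is the usual trade-off of grid-based grouping.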
PartitionalClustering* IMP::statistics::create_centrality_clustering(Metric *d, double far, int k)
Cluster by repeatedly removing edges which have lots of shortest paths passing through them. The process is terminated when there are a set number of connected components. Other termination criteria can be added if someone proposes them.
Only items closer than far are connected.
PartitionalClustering* IMP::statistics::create_centrality_clustering(Embedding *d, double far, int k)
Cluster by repeatedly removing edges which have lots of shortest paths passing through them. The process is terminated when there are a set number of connected components. Other termination criteria can be added if someone proposes them.
PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering(Embedding *embed, double dist)
Two points \(p_i\) and \(p_j\) are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\), starting at \(p_i\) and ending at \(p_j\), such that \(\forall l\; \lVert p^{ij}_l-p^{ij}_{l+1}\rVert < d\), where \(d\) is the dist parameter.
PartitionalClustering* IMP::statistics::create_connectivity_clustering(Metric *metric, double dist)
Two points \(p_i\) and \(p_j\) are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\), starting at \(p_i\) and ending at \(p_j\), such that \(\forall l\; \lVert p^{ij}_l-p^{ij}_{l+1}\rVert < d\), where \(d\) is the dist parameter.
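The chain condition is equivalent to taking connected components of the graph that links every pair of items closer than the threshold. The following is a sketch of that idea using a union-find structure, not IMP's implementation. The name sketch_connectivity_clustering and the distance callable are hypothetical stand-ins for the Metric or Embedding input.

```python
import math


def sketch_connectivity_clustering(n, distance, dist):
    """Return connected components of the graph linking items closer than `dist`.

    `distance(i, j)` is a hypothetical callable standing in for a metric.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distance(i, j) < dist:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


points = [(0.0, 0.0), (0.9, 0.0), (1.8, 0.0), (5.0, 5.0)]
clusters = sketch_connectivity_clustering(
    len(points), lambda i, j: math.dist(points[i], points[j]), dist=1.0
)
```

In this example items 0 and 2 are 1.8 apart, farther than the threshold, yet they share a cluster because item 1 bridges them.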
PartitionalClustering* IMP::statistics::create_diameter_clustering(Metric *d, double maximum_diameter)
Cluster the elements into clusters with at most the specified diameter.
PartitionalClustering* IMP::statistics::create_gromos_clustering(Metric *d, double cutoff)
Cutoff-based clustering as defined in Daura et al. Angew. Chem. Int. Ed. 1999. 38(1‐2): p. 236-240.
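For readers unfamiliar with the cited method, the scheme described by Daura et al. repeatedly picks the item with the most neighbors within the cutoff, makes it and its neighbors a cluster, removes them, and continues until nothing is left. The sketch below illustrates that procedure in plain Python. It is not IMP's code: sketch_gromos_clustering and the distance callable are hypothetical, and both the use of <= for the cutoff test and the lowest-index tie-break are assumptions.

```python
def sketch_gromos_clustering(n, distance, cutoff):
    """Greedy cutoff-based clustering; returns a list of (center, members) pairs.

    Assumptions: neighbors satisfy distance <= cutoff, and ties between
    candidate centers are broken in favor of the lowest index.
    """
    remaining = set(range(n))
    clusters = []
    while remaining:
        # Count neighbors only among items not yet assigned to a cluster.
        neighbors = {
            i: [j for j in remaining if distance(i, j) <= cutoff]
            for i in remaining
        }
        # max() keeps the first maximum, so sorting gives a stable tie-break.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = sorted(neighbors[center])
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


values = [0.0, 0.1, 0.2, 1.0, 1.05, 3.0]
clusters = sketch_gromos_clustering(
    len(values), lambda i, j: abs(values[i] - values[j]), cutoff=0.15
)
```

Because neighbor counts are recomputed after each removal, later clusters depend on which items earlier clusters consumed.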
PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans(Embedding *embedding, unsigned int k, unsigned int iterations)
Return a k-means clustering of all points contained in the embedding (i.e., those with indices in [0, embedding->get_number_of_embeddings())). The points are partitioned into k clusters. More iterations take longer but generally produce a better clustering.
The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
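Lloyd's algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The sketch below shows that loop in plain Python with Euclidean distance. It is an illustration rather than IMP's implementation: sketch_lloyds_kmeans is a hypothetical name, and seeding the centers with the first k points is an assumption made to keep the example deterministic.

```python
import math


def sketch_lloyds_kmeans(points, k, iterations):
    """Run a fixed number of Lloyd iterations; return (assignment, centers).

    Assumption: centers are seeded with the first k points.
    """
    centers = [list(point) for point in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(point, centers[c]))
            for point in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return assignment, centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignment, centers = sketch_lloyds_kmeans(points, k=2, iterations=5)
```

With this seeding the first iteration produces a poor split, and the second iteration corrects it, which shows why a single iteration may not be enough.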
algebra::VectorKDs IMP::statistics::get_centroids(Embedding *d, PartitionalClustering *pc)
Given a clustering and an embedding, compute the centroid for each cluster.
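As an illustration, a centroid here can be read as the coordinate-wise mean of the embedded points in a cluster. The sketch below assumes that reading. The name sketch_centroids is hypothetical, and plain tuples and index lists stand in for the Embedding and PartitionalClustering objects.

```python
def sketch_centroids(points, clusters):
    """Return the coordinate-wise mean of the points in each cluster.

    `clusters` is a list of lists of point indices (a stand-in for a
    partitional clustering); every cluster is assumed to be non-empty.
    """
    centroids = []
    for members in clusters:
        coordinates = [points[i] for i in members]
        centroids.append(
            tuple(sum(axis) / len(coordinates) for axis in zip(*coordinates))
        )
    return centroids


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 4.0), (10.0, 6.0)]
centroids = sketch_centroids(points, [[0, 1], [2, 3]])
```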
std::string IMP::statistics::get_data_path(std::string file_name)
Return the full path to one of this module's data files.
To read the data file "data_library" that was placed in the data directory of this module, pass its name to this function and open the returned path.
This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.
std::string IMP::statistics::get_example_path(std::string file_name)
Return the full path to one of this module's example files.
To read the example file "example_protein.pdb" that was placed in the examples directory of this module, pass its name to this function and open the returned path.
This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.
std::string IMP::statistics::get_module_version()
Return the version of this module, as a string.
Definition at line 5 of file EMageFit/__init__.py.
double IMP::statistics::get_quantile(const Histogram1D &h, double fraction)
Return the midpoint of the bin that best approximates the specified quantile (passed as a fraction). That is, passing 0.5 returns the median, and passing 0.9 returns the 90th percentile.
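One way to picture this is to walk the bins in order, accumulate their counts, and stop at the first bin where the running total reaches the requested fraction of the overall count. The sketch below does that on a plain list of counts. It is an assumption about the general approach, not IMP's code, and sketch_quantile, lower, and bin_width are hypothetical names.

```python
def sketch_quantile(counts, lower, bin_width, fraction):
    """Return the midpoint of the first bin whose cumulative count reaches
    `fraction` of the total.

    `counts[i]` is the count of the bin covering
    [lower + i * bin_width, lower + (i + 1) * bin_width).
    """
    if not counts:
        raise ValueError("the histogram has no bins")
    target = fraction * sum(counts)
    running = 0.0
    for i, count in enumerate(counts):
        running += count
        if running >= target:
            return lower + (i + 0.5) * bin_width
    # Only reached through rounding error; fall back to the last bin.
    return lower + (len(counts) - 0.5) * bin_width


counts = [1, 2, 4, 2, 1]
median = sketch_quantile(counts, lower=0.0, bin_width=1.0, fraction=0.5)
```

Because the result is always a bin midpoint, its resolution is limited by the bin width.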
Ints IMP::statistics::get_representatives(Embedding *d, PartitionalClustering *pc)
Given a clustering and an embedding, compute a representative element for each cluster.
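The text does not say how the representative is chosen. One common convention, assumed here purely for illustration, is to pick the cluster member closest to the cluster centroid. The name sketch_representatives is hypothetical, and IMP's actual selection rule may differ.

```python
import math


def sketch_representatives(points, clusters):
    """Return, for each cluster, the index of the member nearest its centroid.

    Assumption: "representative" means closest to the coordinate-wise mean;
    this rule is not taken from the IMP documentation.
    """
    representatives = []
    for members in clusters:
        coordinates = [points[i] for i in members]
        centroid = [sum(axis) / len(coordinates) for axis in zip(*coordinates)]
        representatives.append(
            min(members, key=lambda i: math.dist(points[i], centroid))
        )
    return representatives


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 10.0)]
representatives = sketch_representatives(points, [[0, 1, 2], [3]])
```

Unlike a centroid, a representative chosen this way is always one of the original items, so it can be used to look the item up again.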
void IMP::statistics::show_histogram(HistogramD h, std::string xscale = "linear", std::string yscale = "linear", Functions curves = Functions())
In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.
[in] | h | The histogram to show; the plot is sized to the histogram's bounding box.
[in] | xscale | Whether the x-axis scale is "linear" or "log".
[in] | yscale | Whether the y-axis scale is "linear" or "log".
[in] | curves | A list of Python functions to plot on the histogram as curves. Each function should take one float and return a float.
void IMP::statistics::validate_partitional_clustering(PartitionalClustering *pc, unsigned int n)
Check that the clustering is a valid clustering of n elements.
If it is not, an exception is thrown, unless IMP was built as a fast build.
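For a partitional clustering, validity plausibly means that each of the n items appears in exactly one cluster. The sketch below checks that condition on plain index lists. The name sketch_validate_clustering is hypothetical, and the exact checks IMP performs are not stated here, so treat this as an assumption.

```python
def sketch_validate_clustering(clusters, n):
    """Raise ValueError unless every index in range(n) is in exactly one cluster."""
    seen = [False] * n
    for members in clusters:
        for i in members:
            if not 0 <= i < n:
                raise ValueError(f"item {i} is outside the range [0, {n})")
            if seen[i]:
                raise ValueError(f"item {i} appears in more than one cluster")
            seen[i] = True
    missing = [i for i, found in enumerate(seen) if not found]
    if missing:
        raise ValueError(f"items {missing} are not in any cluster")


# A valid clustering of three items passes silently.
sketch_validate_clustering([[0, 2], [1]], 3)
```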