IMP
2.0.0
The Integrative Modeling Platform
See IMP.statistics Overview for more information.
Classes
class ConfigurationSetRMSDMetric
class ConfigurationSetXYZEmbedding
    Embed a configuration using the XYZ coordinates of a set of particles.
class Embedding
    Store data to be clustered for embedding based algorithms.
class Histogram
    Histogram.
class HistogramD
class Metric
    Store data to be clustered for distance metric based algorithms.
class ParticleEmbedding
class PartitionalClustering
    A base class for clustering results where each item is in one cluster.
class PartitionalClusteringWithCenter
class RecursivePartitionalClusteringEmbedding
class RecursivePartitionalClusteringMetric
class VectorDEmbedding
    Simply return the coordinates of a VectorD.
Typedefs
typedef IMP::base::Vector< IMP::base::Pointer< Embedding > > Embeddings
typedef IMP::base::Vector< Histogram > Histograms
typedef IMP::base::Vector< IMP::base::Pointer< Metric > > Metrics
Python only
void show_histogram(HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())
Store a set of objects.
Definition at line 34 of file statistics/embedding.h.
Pass or store a set of Histogram objects.
Definition at line 49 of file Histogram.h.
PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering(Embedding *embed, double side)
The space is gridded into bins of side length side, and all points that fall in the same grid bin are placed in the same cluster.
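The binning idea described above can be sketched in plain Python. This is only an illustration of the technique, not IMP's implementation: the function name `bin_based_clustering` and the use of coordinate tuples in place of an Embedding are assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_based_clustering(points, side):
    """Group point indices by the grid cell of edge length `side` they fall in.

    Hypothetical helper for illustration; it does not call IMP.
    """
    bins = defaultdict(list)
    for index, point in enumerate(points):
        # Integer cell coordinates: floor(coordinate / side) along each axis.
        cell = tuple(math.floor(x / side) for x in point)
        bins[cell].append(index)
    # Each occupied cell becomes one cluster (a list of point indices).
    return list(bins.values())


points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.9, 0.8), (-0.5, 0.5)]
clusters = bin_based_clustering(points, side=1.0)
print(clusters)
```

Points 0 and 1 share a cell, as do points 2 and 3, while point 4 sits alone in the cell to the left of the origin.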
PartitionalClustering* IMP::statistics::create_centrality_clustering(Metric *d, double far, int k)
Cluster by repeatedly removing edges which have lots of shortest paths passing through them. The process is terminated when there are a set number of connected components. Other termination criteria can be added if someone proposes them.
Only items closer than far are connected.
PartitionalClustering* IMP::statistics::create_centrality_clustering(Embedding *d, double far, int k)
Cluster by repeatedly removing edges which have lots of shortest paths passing through them. The process is terminated when there are a set number of connected components. Other termination criteria can be added if someone proposes them.
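The edge-removal procedure described above resembles betweenness-based (Girvan-Newman style) clustering. The following is a minimal pure-Python sketch under several assumptions that the text does not confirm: items are joined when their Euclidean distance is below `far`, shortest paths are counted in hops rather than weighted by distance, and `k` is the target number of connected components. All function names are hypothetical.

```python
import math
from collections import deque


def _adjacency(n, edges):
    adjacency = {i: set() for i in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def components(n, edges):
    """Connected components of the graph, each as a sorted list of indices."""
    adjacency, seen, result = _adjacency(n, edges), set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        result.append(sorted(group))
    return result


def edge_betweenness(n, edges):
    """Brandes-style count of shortest (hop) paths crossing each edge."""
    adjacency = _adjacency(n, edges)
    score = {frozenset(e): 0.0 for e in edges}
    for source in range(n):
        distance, paths, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:  # breadth-first search from the source
            node = queue.popleft()
            order.append(node)
            for nb in adjacency[node]:
                if nb not in distance:
                    distance[nb] = distance[node] + 1
                    paths[nb], preds[nb] = 0.0, []
                    queue.append(nb)
                if distance[nb] == distance[node] + 1:
                    paths[nb] += paths[node]
                    preds[nb].append(node)
        dependency = {node: 0.0 for node in order}
        for node in reversed(order):  # accumulate from the farthest nodes back
            for pred in preds[node]:
                share = paths[pred] / paths[node] * (1.0 + dependency[node])
                score[frozenset((pred, node))] += share
                dependency[pred] += share
    return score


def centrality_clustering(points, far, k):
    n = len(points)
    # Assumption: only items closer than `far` are connected.
    edges = {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) < far}
    # Remove the most central edge until k components exist (or no edges remain).
    while edges and len(components(n, edges)) < k:
        score = edge_betweenness(n, edges)
        edges.remove(max(score, key=score.get))
    return components(n, edges)


points = [(0, 0), (1, 0), (0, 1), (3, 0), (4, 0), (4, 1)]
clusters = centrality_clustering(points, far=2.1, k=2)
print(clusters)
```

In this example the only link between the two triangles is the edge joining points 1 and 3, so it carries the most shortest paths and is removed first.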
PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering(Embedding *embed, double dist)
Two points, \(p_i\) and \(p_j\), are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\), with \(p^{ij}_0 = p_i\) and \(p^{ij}_k = p_j\), such that \(\forall l\ \|p^{ij}_l-p^{ij}_{l+1}\| < d\), where \(d\) is the dist parameter.
PartitionalClustering* IMP::statistics::create_connectivity_clustering(Metric *metric, double dist)
Two points, \(p_i\) and \(p_j\), are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\), with \(p^{ij}_0 = p_i\) and \(p^{ij}_k = p_j\), such that \(\forall l\ \|p^{ij}_l-p^{ij}_{l+1}\| < d\), where \(d\) is the dist parameter.
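The chain condition above amounts to taking connected components of the graph that links every pair of points closer than the threshold. A minimal sketch using a union-find structure follows; it assumes Euclidean distance between coordinate tuples, and the name `connectivity_clustering` is hypothetical rather than part of IMP.

```python
import math


def connectivity_clustering(points, dist):
    """Cluster point indices so that chains of steps shorter than `dist` stay together."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < dist:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
clusters = connectivity_clustering(points, dist=1.5)
print(clusters)
```

Points 0 and 2 are farther apart than the threshold, yet they end up together because point 1 links them.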
PartitionalClustering* IMP::statistics::create_diameter_clustering(Metric *d, double maximum_diameter)
Cluster the elements into clusters with at most the specified diameter.
PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans(Embedding *embedding, unsigned int k, unsigned int iterations)
Return a k-means clustering of all points contained in the embedding (i.e. the indices in [0, embedding->get_number_of_embeddings())). These points are clustered into k clusters. More iterations take longer but produce a better clustering.
The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
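For readers unfamiliar with Lloyd's algorithm, the sketch below alternates the two standard steps: assign each point to its nearest center, then move each center to the mean of its points. It is an illustration only. The initialization (the first k points) and the name `lloyds_kmeans` are assumptions for the example, and IMP's own initialization may differ.

```python
import math


def lloyds_kmeans(points, k, iterations):
    """Return (assignment, centers) after a fixed number of Lloyd iterations."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption for this sketch: seed the centers with the first k points.
    centers = [tuple(float(x) for x in p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest center by Euclidean distance.
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return assignment, centers


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
assignment, centers = lloyds_kmeans(points, k=2, iterations=5)
print(assignment, centers)
```

With this deliberately poor seeding the first pass misassigns the two distant points, and the later iterations correct it, which is why more iterations tend to improve the result.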
algebra::VectorKDs IMP::statistics::get_centroids(Embedding *d, PartitionalClustering *pc)
Given a clustering and an embedding, compute the centroid of each cluster.
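As a rough illustration of the computation, the centroid of a cluster is the coordinate-wise mean of its members. The sketch below assumes clusters are given as lists of point indices; `cluster_centroids` is a hypothetical name, not the IMP function.

```python
def cluster_centroids(points, clusters):
    """Return the coordinate-wise mean of the points in each cluster."""
    centroids = []
    for members in clusters:
        coords = [points[i] for i in members]
        # zip(*coords) iterates over axes; average each axis separately.
        centroids.append(tuple(sum(axis) / len(coords) for axis in zip(*coords)))
    return centroids


points = [(0, 0), (2, 0), (4, 4), (6, 4), (8, 4)]
centroids = cluster_centroids(points, [[0, 1], [2, 3, 4]])
print(centroids)
```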
std::string IMP::statistics::get_data_path(std::string file_name)
Each module has its own data directory, so be sure to use the version of this function in the correct module. To read the data file "data_library" that was placed in the data directory of module "mymodule", pass "data_library" to that module's get_data_path and open the path it returns. This ensures that the code works both when IMP is installed and when it is used via the tools/imppy.sh script.
std::string IMP::statistics::get_example_path(std::string file_name)
Each module has its own example directory, so be sure to use the version of this function in the correct module. For example, to read the file example_protein.pdb located in the examples directory of the IMP::atom module, pass "example_protein.pdb" to the IMP::atom version of get_example_path and read the path it returns. This ensures that the code works both when IMP is installed and when it is used via the tools/imppy.sh script.
double IMP::statistics::get_quantile(const Histogram1D &h, double fraction)
Return the midpoint of the bin that best approximates the specified quantile (passed as a fraction). That is, passing .5 returns the median, and passing .9 returns the midpoint of the bin that best approximates the 90th percentile.
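One way to read this is as a walk over cumulative bin counts. The sketch below assumes a 1D histogram with equal-width bins described by a list of counts, a lower bound, and a bin width. Those inputs and the name `histogram_quantile` are assumptions for illustration, and IMP may choose the "best" bin differently.

```python
def histogram_quantile(counts, lower, bin_width, fraction):
    """Midpoint of the first bin whose cumulative count reaches the fraction."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    target = fraction * total
    cumulative = 0.0
    for index, count in enumerate(counts):
        cumulative += count
        if cumulative >= target:
            return lower + (index + 0.5) * bin_width
    # Only reached through rounding when fraction is 1: use the last bin.
    return lower + (len(counts) - 0.5) * bin_width


counts = [2, 3, 5, 6, 4]  # 20 samples in bins [0,1), [1,2), ... [4,5)
median = histogram_quantile(counts, lower=0.0, bin_width=1.0, fraction=0.5)
ninetieth = histogram_quantile(counts, lower=0.0, bin_width=1.0, fraction=0.9)
print(median, ninetieth)
```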
Ints IMP::statistics::get_representatives(Embedding *d, PartitionalClustering *pc)
Given a clustering and an embedding, compute a representative element for each cluster.
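The text does not say how the representative is chosen. A common choice, used here purely as an assumption, is the cluster member closest to the cluster centroid. The function name `cluster_representatives` is hypothetical.

```python
import math


def cluster_representatives(points, clusters):
    """For each cluster, return the index of the member nearest its centroid."""
    representatives = []
    for members in clusters:
        coords = [points[i] for i in members]
        centroid = tuple(sum(axis) / len(coords) for axis in zip(*coords))
        # Assumed selection rule: smallest Euclidean distance to the centroid.
        representatives.append(
            min(members, key=lambda i: math.dist(points[i], centroid)))
    return representatives


points = [(0, 0), (1, 0), (5, 0), (10, 10), (10, 12), (10, 11.5)]
representatives = cluster_representatives(points, [[0, 1, 2], [3, 4, 5]])
print(representatives)
```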
void show_histogram(HistogramD h, std::string xscale = "linear", std::string yscale = "linear", Functions curves = Functions())
In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.
[in] | h | The histogram to show; the plot is sized to the histogram's bounding box. |
[in] | xscale | Whether the x scale is "linear" or "log". |
[in] | yscale | Whether the y scale is "linear" or "log". |
[in] | curves | A list of Python functions to plot on the histogram as curves. Each function should take one float and return a float. |
void IMP::statistics::validate_partitional_clustering(PartitionalClustering *pc, unsigned int n)
Check that the clustering is a valid clustering of n elements. If it is not, an exception is thrown, unless IMP was built as a fast build.
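The check can be pictured as follows, assuming that "valid" means each index in [0, n) appears in exactly one cluster. That reading follows from the PartitionalClustering description, but the exact conditions IMP tests are not stated here, and `check_partition` is a hypothetical name.

```python
def check_partition(clusters, n):
    """Raise ValueError unless every index in range(n) is in exactly one cluster."""
    seen = [False] * n
    for members in clusters:
        for i in members:
            if not 0 <= i < n:
                raise ValueError(f"index {i} is outside [0, {n})")
            if seen[i]:
                raise ValueError(f"index {i} appears in more than one cluster")
            seen[i] = True
    missing = [i for i, found in enumerate(seen) if not found]
    if missing:
        raise ValueError(f"indices {missing} are not in any cluster")


check_partition([[0, 2], [1, 3]], 4)  # valid: returns silently

try:
    check_partition([[0, 1], [1, 2]], 4)  # index 1 repeated, index 3 missing
    outcome = "accepted"
except ValueError as error:
    outcome = str(error)
print(outcome)
```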