Code to compute statistical measures.

Detailed Description

Code to compute statistical measures.

Data to be clustered is represented in one of two ways: with an IMP::statistics::Embedding or with an IMP::statistics::Metric. The representation is then passed to an algorithm that returns a clustering object such as an IMP::statistics::PartitionalClustering.

Info

Author(s): Keren Lasker, Daniel Russel

Maintainer: benmwebb

License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Publications:

See main IMP papers list.

Classes
class	ChiSquareMetric
	Compute the distance between two configurations using chi2.

class	ConfigurationSetRMSDMetric

class	ConfigurationSetXYZEmbedding
	Embed a configuration using the XYZ coordinates of a set of particles.

class	Embedding
	Store data to be clustered for embedding-based algorithms.

class	EuclideanMetric

class	HistogramD
	Dynamically build a histogram embedded in D-dimensional space.

class	Metric
	Store data to be clustered for distance-metric-based algorithms.

class	ParticleEmbedding

class	PartitionalClustering
	A base class for clustering results where each item is in one cluster.

class	PartitionalClusteringWithCenter

class	RecursivePartitionalClusteringEmbedding

class	RecursivePartitionalClusteringMetric
	Represent a metric for clustering data that has already been clustered once.

class	VectorDEmbedding
	Simply return the coordinates of a VectorD.

Typedefs
typedef IMP::Vector < IMP::Pointer< Embedding > >	Embeddings

typedef IMP::Vector < IMP::WeakPointer< Embedding > >	EmbeddingsTemp

typedef IMP::Vector < IMP::Pointer< Metric > >	Metrics

typedef IMP::Vector < IMP::WeakPointer< Metric > >	MetricsTemp

Functions
PartitionalClusteringWithCenter *	create_bin_based_clustering (Embedding *embed, double side)

PartitionalClustering *	create_centrality_clustering (Metric *d, double far, int k)

PartitionalClustering *	create_centrality_clustering (Embedding *d, double far, int k)

PartitionalClusteringWithCenter *	create_connectivity_clustering (Embedding *embed, double dist)

PartitionalClustering *	create_connectivity_clustering (Metric *metric, double dist)

PartitionalClustering *	create_diameter_clustering (Metric *d, double maximum_diameter)

PartitionalClustering *	create_gromos_clustering (Metric *d, double cutoff)

PartitionalClusteringWithCenter *	create_lloyds_kmeans (Embedding *embedding, unsigned int k, unsigned int iterations)

algebra::VectorKDs	get_centroids (Embedding *d, PartitionalClustering *pc)

double	get_quantile (const Histogram1D &h, double fraction)

Ints	get_representatives (Embedding *d, PartitionalClustering *pc)

void	validate_partitional_clustering (PartitionalClustering *pc, unsigned int n)
	Check that the clustering is a valid clustering of n elements.

Python only
This functionality is only available in Python.
void	show_histogram (HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())

Standard module functions
All `IMP` modules have a set of standard functions to help get information about the module and about files associated with the module.
std::string	get_module_version ()

std::string	get_module_name ()

std::string	get_data_path (std::string file_name)
	Return the full path to one of this module's data files.

std::string	get_example_path (std::string file_name)
	Return the full path to one of this module's example files.

Typedef Documentation

typedef IMP::Vector<IMP::Pointer< Embedding > > IMP::statistics::Embeddings

A vector of reference-counting object pointers.

Definition at line 48 of file statistics/embedding.h.

typedef IMP::Vector<IMP::WeakPointer< Embedding > > IMP::statistics::EmbeddingsTemp

A vector of weak (non reference-counting) pointers to specified objects.

See Also: Embedding

Definition at line 48 of file statistics/embedding.h.

typedef IMP::Vector<IMP::Pointer< Metric > > IMP::statistics::Metrics

A vector of reference-counting object pointers.

Definition at line 42 of file Metric.h.

typedef IMP::Vector<IMP::WeakPointer< Metric > > IMP::statistics::MetricsTemp

A vector of weak (non reference-counting) pointers to specified objects.

See Also: Metric

Definition at line 42 of file Metric.h.

Function Documentation

PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering	(	Embedding *	embed,
		double	side
	)

The space is divided into a grid of bins with side length side, and all points that fall in the same grid bin are placed in the same cluster.
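The binning idea can be illustrated with a short standalone Python sketch. It does not call IMP; the function name bin_based_clustering_sketch and the plain tuples used as points are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def bin_based_clustering_sketch(points, side):
    """Group point indices by the grid bin (edge length `side`) they fall in.

    Hypothetical helper for illustration; not part of IMP.
    """
    bins = defaultdict(list)
    for index, point in enumerate(points):
        # Integer bin coordinates: floor(coordinate / side) along each axis.
        key = tuple(math.floor(c / side) for c in point)
        bins[key].append(index)
    # Each occupied bin becomes one cluster of point indices.
    return list(bins.values())


points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (-0.2, 0.1)]
clusters = bin_based_clustering_sketch(points, side=1.0)
```

Here the first two points share the bin (0, 0), while the other two land in bins of their own.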

PartitionalClustering* IMP::statistics::create_centrality_clustering	(	Metric *	d,
		double	far,
		int	k
	)

Cluster by repeatedly removing edges that many shortest paths pass through. The process terminates once a set number of connected components is reached. Other termination criteria can be added if someone proposes them.

Only items closer than far are connected.

PartitionalClustering* IMP::statistics::create_centrality_clustering	(	Embedding *	d,
		double	far,
		int	k
	)

Cluster by repeatedly removing edges that many shortest paths pass through. The process terminates once a set number of connected components is reached. Other termination criteria can be added if someone proposes them.

PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering	(	Embedding *	embed,
		double	dist
	)

Two points $p_i$ and $p_j$ are in the same cluster if there is a sequence of points $\left(p^{ij}_{0}, \dots, p^{ij}_k\right)$, with $p^{ij}_0 = p_i$ and $p^{ij}_k = p_j$, such that $\forall l\; \|p^{ij}_l-p^{ij}_{l+1}\| < d$, where $d$ is the dist argument.

PartitionalClustering* IMP::statistics::create_connectivity_clustering	(	Metric *	metric,
		double	dist
	)

Two points $p_i$ and $p_j$ are in the same cluster if there is a sequence of points $\left(p^{ij}_{0}, \dots, p^{ij}_k\right)$, with $p^{ij}_0 = p_i$ and $p^{ij}_k = p_j$, such that $\forall l\; \|p^{ij}_l-p^{ij}_{l+1}\| < d$, where $d$ is the dist argument.
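The chain condition amounts to finding connected components of the graph whose edges join items closer than the threshold. A minimal standalone Python sketch follows, using a union-find structure; it does not call IMP, and the function name and the use of math.dist as the distance are assumptions for illustration.

```python
import math


def connectivity_clustering_sketch(items, distance, dist):
    """Return clusters (lists of indices) linked by chains of steps < dist.

    Hypothetical helper for illustration; not part of IMP.
    """
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of items that lies closer than the threshold.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) < dist:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


points = [(0.0, 0.0), (0.9, 0.0), (1.8, 0.0), (5.0, 5.0)]
clusters = connectivity_clustering_sketch(points, math.dist, dist=1.0)
```

Points 0 and 2 are 1.8 apart, yet they end up together because point 1 links them; point 3 stays alone.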

PartitionalClustering* IMP::statistics::create_diameter_clustering	(	Metric *	d,
		double	maximum_diameter
	)

Cluster the elements into clusters with at most the specified diameter.

PartitionalClustering* IMP::statistics::create_gromos_clustering	(	Metric *	d,
		double	cutoff
	)

Cutoff-based clustering as defined in Daura et al. Angew. Chem. Int. Ed. 1999. 38(1‐2): p. 236-240.
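As commonly described, the cutoff-based scheme picks the item with the most neighbors within the cutoff, makes it and its neighbors a cluster, removes them from the pool, and repeats. The standalone Python sketch below illustrates that reading; it does not call IMP, and the function name, the use of <= for the cutoff test, and the tie-breaking by smallest index are assumptions.

```python
def gromos_like_clustering_sketch(n, distance, cutoff):
    """Greedy cutoff-based clustering of items 0..n-1.

    `distance(i, j)` returns the distance between items i and j.
    Hypothetical helper for illustration; not part of IMP.
    """
    remaining = set(range(n))
    clusters = []
    while remaining:
        # Neighbors of each remaining item (an item is its own neighbor).
        neighbors = {
            i: [j for j in remaining if distance(i, j) <= cutoff]
            for i in remaining
        }
        # Pick the item with the most neighbors; ties go to the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = sorted(neighbors[center])
        clusters.append((center, members))
        # Remove the whole cluster from the pool and repeat.
        remaining -= set(members)
    return clusters


values = [0.0, 0.1, 0.2, 1.0, 1.05, 3.0]
clusters = gromos_like_clustering_sketch(
    len(values), lambda i, j: abs(values[i] - values[j]), cutoff=0.15)
```

Each entry pairs a cluster center with the sorted indices of its members.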

PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans	(	Embedding *	embedding,
		unsigned int	k,
		unsigned int	iterations
	)

Return a k-means clustering of all points contained in the embedding (i.e., those with indices in [0, embedding->get_number_of_embeddings())). These points are clustered into k clusters. More iterations take longer but generally produce a better clustering.

The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
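For orientation, Lloyd's algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The standalone Python sketch below shows those two steps with Euclidean distance; it does not call IMP, and the function name and the choice of the first k points as initial centers are assumptions for illustration.

```python
import math


def lloyds_kmeans_sketch(points, k, iterations):
    """Run a fixed number of Lloyd iterations; return (assignment, centers).

    Hypothetical helper for illustration; not part of IMP.
    """
    # Assumption: seed the centers with the first k points.
    centers = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centers


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
assignment, centers = lloyds_kmeans_sketch(points, k=2, iterations=5)
```

On this toy input the assignment stabilizes after the first iteration, so further iterations leave the result unchanged.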

algebra::VectorKDs IMP::statistics::get_centroids	(	Embedding *	d,
		PartitionalClustering *	pc
	)

Given a clustering and an embedding, compute the centroid for each cluster.
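A centroid here is the coordinate-wise mean of a cluster's embedded points. The following standalone Python sketch illustrates the computation; it does not call IMP, and the function name and the representation of clusters as lists of point indices are assumptions.

```python
def centroids_sketch(points, clusters):
    """Return the coordinate-wise mean of the points in each cluster.

    `clusters` is a list of lists of indices into `points`.
    Hypothetical helper for illustration; not part of IMP.
    """
    centroids = []
    for members in clusters:
        coords = [points[i] for i in members]
        # zip(*coords) iterates over axes; average each axis separately.
        centroids.append(tuple(sum(axis) / len(coords) for axis in zip(*coords)))
    return centroids


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 4.0)]
centroids = centroids_sketch(points, [[0, 1], [2]])
```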

std::string IMP::statistics::get_data_path ( std::string file_name )

Return the full path to one of this module's data files.

To read the data file "data_library" that was placed in the data directory of this module, do something like

std::ifstream in(IMP::statistics::get_data_path("data_library"));

This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.

Note: Each module has its own data directory, so be sure to use this function from the correct module.

std::string IMP::statistics::get_example_path ( std::string file_name )

Return the full path to one of this module's example files.

To read the example file "example_protein.pdb" that was placed in the examples directory of this module, do something like

std::ifstream in(IMP::statistics::get_example_path("example_protein.pdb"));

This ensures that the code works both when IMP is installed and when it is used via the setup_environment.sh script.

Note: Each module has its own example directory, so be sure to use this function from the correct module.

double IMP::statistics::get_quantile	(	const Histogram1D &	h,
		double	fraction
	)

Return the midpoint of the bin that best approximates the specified quantile (passed as a fraction). For example, passing 0.5 returns the median, and passing 0.9 returns the 90th percentile.
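One way to read this is to accumulate bin counts until the requested fraction of the total is reached and then report that bin's midpoint. The standalone Python sketch below follows that reading; it does not call IMP, and the function name, the flat list of counts, and the "first bin reaching the target" rule are assumptions rather than the library's definition of "best approximates".

```python
def quantile_bin_midpoint_sketch(counts, lower, bin_width, fraction):
    """Return the midpoint of the first bin where the cumulative count
    reaches `fraction` of the total.

    `counts[i]` is the count in the bin starting at lower + i * bin_width.
    Hypothetical helper for illustration; not part of IMP.
    """
    target = fraction * sum(counts)
    running = 0.0
    for index, count in enumerate(counts):
        running += count
        if running >= target:
            return lower + (index + 0.5) * bin_width
    # Only reached through rounding; fall back to the last bin.
    return lower + (len(counts) - 0.5) * bin_width


counts = [1, 3, 10, 5, 1]  # bins [0,1), [1,2), [2,3), [3,4), [4,5)
median = quantile_bin_midpoint_sketch(counts, 0.0, 1.0, 0.5)
upper = quantile_bin_midpoint_sketch(counts, 0.0, 1.0, 0.9)
```

With these counts the median falls in the third bin and the 0.9 quantile in the fourth.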

Ints IMP::statistics::get_representatives	(	Embedding *	d,
		PartitionalClustering *	pc
	)

Given a clustering and an embedding, compute a representative element for each cluster.
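The text does not say how the representative is chosen. One plausible choice, shown in the standalone Python sketch below, is the member closest to its cluster's centroid. This rule, the function name, and the list-of-indices cluster format are all assumptions for illustration; the sketch does not call IMP.

```python
import math


def representatives_sketch(points, clusters):
    """Return, for each cluster, the index of the member nearest its centroid.

    Assumed selection rule; hypothetical helper, not part of IMP.
    """
    reps = []
    for members in clusters:
        coords = [points[i] for i in members]
        centroid = tuple(sum(axis) / len(coords) for axis in zip(*coords))
        # Pick the member whose coordinates lie closest to the centroid.
        reps.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return reps


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 10.0)]
reps = representatives_sketch(points, [[0, 1, 2], [3]])
```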

void IMP::statistics::show_histogram	(	HistogramD	h,
		std::string	xscale = `"linear"`,
		std::string	yscale = `"linear"`,
		Functions	curves = `Functions()`
	)

In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.

Parameters

[in]	h	The histogram to show; the plot is sized to the histogram's bounding box.
[in]	xscale	Whether the x-axis scale is "linear" or "log"
[in]	yscale	Whether the y-axis scale is "linear" or "log"
[in]	curves	A list of Python functions to plot on the histogram as curves. The functions should take one float and return a float.

See Also: HistogramD

void IMP::statistics::validate_partitional_clustering	(	PartitionalClustering *	pc,
		unsigned int	n
	)

Check that the clustering is a valid clustering of n elements.

If it is not, an exception is thrown, unless the build is a fast build.
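The text does not spell out what "valid" means. A natural reading is that every item 0..n-1 belongs to exactly one cluster, and the standalone Python sketch below checks that. The function name, the list-of-indices cluster format, and the use of ValueError are assumptions for illustration; the sketch does not call IMP.

```python
def validate_partition_sketch(clusters, n):
    """Raise ValueError unless each of 0..n-1 appears in exactly one cluster.

    Assumed validity criterion; hypothetical helper, not part of IMP.
    """
    seen = [False] * n
    for members in clusters:
        for item in members:
            if not 0 <= item < n:
                raise ValueError(f"item {item} is outside the range 0..{n - 1}")
            if seen[item]:
                raise ValueError(f"item {item} appears in more than one cluster")
            seen[item] = True
    missing = [i for i, flag in enumerate(seen) if not flag]
    if missing:
        raise ValueError(f"items {missing} are not in any cluster")


validate_partition_sketch([[0, 2], [1]], 3)  # valid: no exception

try:
    validate_partition_sketch([[0, 1], [1]], 3)
    error = None
except ValueError as exc:
    error = str(exc)
```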
