See IMP.statistics Overview for more information.

Classes
class	ConfigurationSetRMSDMetric

class	ConfigurationSetXYZEmbedding
	Embed a configuration using the XYZ coordinates of a set of particles.

class	Embedding
	Store data to be clustered for embedding based algorithms.

class	Histogram
	Histogram.

class	HistogramD

class	Metric
	Store data to be clustered for distance metric based algorithms.

class	ParticleEmbedding

class	PartitionalClustering
	A base class for clustering results where each item is in one cluster.

class	PartitionalClusteringWithCenter

class	RecursivePartitionalClusteringEmbedding

class	RecursivePartitionalClusteringMetric

class	VectorDEmbedding
	Simply return the coordinates of a VectorD.

Typedefs
typedef IMP::base::Vector < IMP::base::Pointer < Embedding > >	Embeddings

typedef IMP::base::Vector < Histogram >	Histograms

typedef IMP::base::Vector < IMP::base::Pointer< Metric > >	Metrics

Functions
PartitionalClusteringWithCenter *	create_bin_based_clustering (Embedding *embed, double side)

PartitionalClustering *	create_centrality_clustering (Metric *d, double far, int k)

PartitionalClustering *	create_centrality_clustering (Embedding *d, double far, int k)

PartitionalClusteringWithCenter *	create_connectivity_clustering (Embedding *embed, double dist)

PartitionalClustering *	create_connectivity_clustering (Metric *metric, double dist)

PartitionalClustering *	create_diameter_clustering (Metric *d, double maximum_diameter)

PartitionalClusteringWithCenter *	create_lloyds_kmeans (Embedding *embedding, unsigned int k, unsigned int iterations)

algebra::VectorKDs	get_centroids (Embedding *d, PartitionalClustering *pc)

std::string	get_data_path (std::string file_name)
	Return the full path to installed data.

std::string	get_example_path (std::string file_name)
	Return the path to installed example data for this module.

double	get_quantile (const Histogram1D &h, double fraction)

Ints	get_representatives (Embedding *d, PartitionalClustering *pc)

void	validate_partitional_clustering (PartitionalClustering *pc, unsigned int n)

Python only
This functionality is only available in Python.
void	show_histogram (HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())

Standard module methods
All `IMP` modules have a set of standard methods to help get information about the module and about files associated with the module.
std::string	get_module_version ()

std::string	get_module_name ()

Typedef Documentation

typedef IMP::base::Vector<IMP::base::Pointer< Embedding > > IMP::statistics::Embeddings

Store a set of objects.

Definition at line 34 of file statistics/embedding.h.

typedef IMP::base::Vector< Histogram > IMP::statistics::Histograms

Pass or store a set of Histogram objects.

Definition at line 49 of file Histogram.h.

typedef IMP::base::Vector<IMP::base::Pointer< Metric > > IMP::statistics::Metrics

Store a set of objects.

Definition at line 35 of file Metric.h.

Function Documentation

PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering	(	Embedding *	embed,
		double	side
	)

The space is gridded into bins of side length side, and all points that fall in the same grid bin are placed in the same cluster.
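The binning idea can be illustrated with a minimal, self-contained sketch. This is not the IMP implementation: the function name bin_based_labels and the use of plain std::vector points in place of an Embedding are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper, not part of IMP: give every point the cluster index
// of the grid bin that contains it. Bins are numbered in order of first use.
std::vector<int> bin_based_labels(
    const std::vector<std::vector<double>>& points, double side) {
  assert(side > 0.0);
  std::map<std::vector<long>, int> bin_to_cluster;  // grid cell -> cluster id
  std::vector<int> labels;
  labels.reserve(points.size());
  for (const std::vector<double>& p : points) {
    std::vector<long> cell;
    cell.reserve(p.size());
    for (double x : p) {
      // floor (rather than truncation) keeps negative coordinates in the
      // correct bin
      cell.push_back(static_cast<long>(std::floor(x / side)));
    }
    const int next_id = static_cast<int>(bin_to_cluster.size());
    // emplace only inserts if the cell has not been seen before
    auto result = bin_to_cluster.emplace(cell, next_id);
    labels.push_back(result.first->second);
  }
  return labels;
}
```

Two points end up with the same label exactly when every coordinate falls in the same interval of length side, so the result depends on where the grid lines fall, not only on the distances between points.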

PartitionalClustering* IMP::statistics::create_centrality_clustering	(	Metric *	d,
		double	far,
		int	k
	)

Cluster by repeatedly removing edges that have many shortest paths passing through them. The process terminates when a set number of connected components is reached. Other termination criteria can be added if someone proposes them.

Only items closer than far are connected.

PartitionalClustering* IMP::statistics::create_centrality_clustering	(	Embedding *	d,
		double	far,
		int	k
	)

Cluster by repeatedly removing edges that have many shortest paths passing through them. The process terminates when a set number of connected components is reached. Other termination criteria can be added if someone proposes them.

PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering	(	Embedding *	embed,
		double	dist
	)

Two points \(p_i\) and \(p_j\) are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\) such that \(\forall l,\ \|p^{ij}_l-p^{ij}_{l+1}\| < d\), where \(d\) is the dist argument.
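The chain condition above amounts to finding connected components of the graph whose edges join points closer than the threshold. A minimal sketch using a union-find structure follows; it is not IMP's implementation, and the names find_root and connectivity_labels, the Euclidean distance, and the std::vector point type are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not IMP code: root of i's set, with path halving.
std::size_t find_root(std::vector<std::size_t>& parent, std::size_t i) {
  while (parent[i] != i) {
    parent[i] = parent[parent[i]];
    i = parent[i];
  }
  return i;
}

// Return one label per point; two points share a label exactly when a chain
// of points with consecutive distances below dist joins them.
std::vector<std::size_t> connectivity_labels(
    const std::vector<std::vector<double>>& points, double dist) {
  assert(dist > 0.0);
  const std::size_t n = points.size();
  std::vector<std::size_t> parent(n);
  std::iota(parent.begin(), parent.end(), std::size_t{0});
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      double d2 = 0.0;  // squared Euclidean distance between points i and j
      for (std::size_t k = 0; k < points[i].size(); ++k) {
        const double diff = points[i][k] - points[j][k];
        d2 += diff * diff;
      }
      if (d2 < dist * dist) {
        // merge the two sets
        const std::size_t root_i = find_root(parent, i);
        const std::size_t root_j = find_root(parent, j);
        parent[root_i] = root_j;
      }
    }
  }
  std::vector<std::size_t> labels(n);
  for (std::size_t i = 0; i < n; ++i) labels[i] = find_root(parent, i);
  return labels;
}
```

The labels are root indices rather than consecutive cluster numbers. The all-pairs loop is quadratic in the number of points; a spatial index would be the usual way to reduce that cost.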

PartitionalClustering* IMP::statistics::create_connectivity_clustering	(	Metric *	metric,
		double	dist
	)

Two points \(p_i\) and \(p_j\) are in the same cluster if there is a sequence of points \(\left(p^{ij}_{0}\dots p^{ij}_k\right)\) such that \(\forall l,\ \|p^{ij}_l-p^{ij}_{l+1}\| < d\), where \(d\) is the dist argument.

PartitionalClustering* IMP::statistics::create_diameter_clustering	(	Metric *	d,
		double	maximum_diameter
	)

Cluster the elements into clusters with at most the specified diameter.

PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans	(	Embedding *	embedding,
		unsigned int	k,
		unsigned int	iterations
	)

Return a k-means clustering of all points contained in the embedding, i.e., those with indices in [0, embedding->get_number_of_embeddings()), into k clusters. More iterations take longer but generally produce a better clustering.

The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
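To make the role of k and iterations concrete, here is a minimal sketch of Lloyd's algorithm with Euclidean distances. It is not the IMP implementation: the function names, the std::vector point type, and the choice of the first k points as initial centers are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
  double d2 = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double diff = a[i] - b[i];
    d2 += diff * diff;
  }
  return d2;
}

// Hypothetical helper, not IMP code: run a fixed number of Lloyd iterations
// and return the cluster index of each point.
std::vector<std::size_t> lloyds_kmeans_labels(const std::vector<Point>& points,
                                              std::size_t k,
                                              unsigned int iterations) {
  assert(k > 0 && k <= points.size());
  // Assumption: seed the centers with the first k points.
  std::vector<Point> centers(points.begin(),
                             points.begin() + static_cast<std::ptrdiff_t>(k));
  std::vector<std::size_t> labels(points.size(), 0);
  const std::size_t dim = points[0].size();
  for (unsigned int it = 0; it < iterations; ++it) {
    // Assignment step: attach each point to its nearest center.
    for (std::size_t i = 0; i < points.size(); ++i) {
      double best = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < k; ++c) {
        const double d2 = squared_distance(points[i], centers[c]);
        if (d2 < best) {
          best = d2;
          labels[i] = c;
        }
      }
    }
    // Update step: move each center to the mean of its assigned points.
    std::vector<Point> sums(k, Point(dim, 0.0));
    std::vector<std::size_t> counts(k, 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
      ++counts[labels[i]];
      for (std::size_t d = 0; d < dim; ++d) sums[labels[i]][d] += points[i][d];
    }
    for (std::size_t c = 0; c < k; ++c) {
      if (counts[c] == 0) continue;  // keep the old center of an empty cluster
      for (std::size_t d = 0; d < dim; ++d) {
        centers[c][d] = sums[c][d] / static_cast<double>(counts[c]);
      }
    }
  }
  return labels;
}
```

With a poor initial choice of centers, a single iteration can leave points in the wrong cluster; further iterations let the centers migrate, which is why a larger iteration count tends to help.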

algebra::VectorKDs IMP::statistics::get_centroids	(	Embedding *	d,
		PartitionalClustering *	pc
	)

Given a clustering and an embedding, compute the centroid for each cluster.
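As an illustration of what a per-cluster centroid means, the following sketch averages the coordinates of the points assigned to each cluster. It does not use IMP types: cluster_centroids, the label vector, and the explicit number_of_clusters parameter are hypothetical stand-ins for the Embedding and PartitionalClustering arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Hypothetical helper, not IMP code: average the points assigned to each
// cluster. labels[i] is the cluster index of points[i].
std::vector<Point> cluster_centroids(const std::vector<Point>& points,
                                     const std::vector<std::size_t>& labels,
                                     std::size_t number_of_clusters) {
  assert(!points.empty() && points.size() == labels.size());
  const std::size_t dim = points[0].size();
  std::vector<Point> centroids(number_of_clusters, Point(dim, 0.0));
  std::vector<std::size_t> counts(number_of_clusters, 0);
  // Accumulate coordinate sums and member counts per cluster.
  for (std::size_t i = 0; i < points.size(); ++i) {
    assert(labels[i] < number_of_clusters);
    ++counts[labels[i]];
    for (std::size_t d = 0; d < dim; ++d) {
      centroids[labels[i]][d] += points[i][d];
    }
  }
  // Divide sums by counts to obtain means.
  for (std::size_t c = 0; c < number_of_clusters; ++c) {
    if (counts[c] == 0) continue;  // empty cluster: centroid stays at zero
    for (std::size_t d = 0; d < dim; ++d) {
      centroids[c][d] /= static_cast<double>(counts[c]);
    }
  }
  return centroids;
}
```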

std::string IMP::statistics::get_data_path ( std::string file_name )

Each module has its own data directory, so be sure to use the version of this function in the correct module. To read the data file "data_library" that was placed in the data directory of module "mymodule", do something like

std::ifstream in(IMP::mymodule::get_data_path("data_library"));

This will ensure that the code works when IMP is installed or used via the tools/imppy.sh script.

std::string IMP::statistics::get_example_path ( std::string file_name )

Each module has its own example directory, so be sure to use the version of this function in the correct module. For example, to read the file example_protein.pdb located in the examples directory of the IMP::atom module, do

IMP::atom::read_pdb(IMP::atom::get_example_path("example_protein.pdb"), model);

This will ensure that the code works when IMP is installed or used via the tools/imppy.sh script.

double IMP::statistics::get_quantile	(	const Histogram1D &	h,
		double	fraction
	)

Return the midpoint of the bin that best approximates the specified quantile (passed as a fraction). That is, passing 0.5 returns the median, and passing 0.9 returns the 90th percentile.
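A bin-midpoint quantile can be sketched as a cumulative sum over bin counts. The code below is an illustration only: histogram_quantile and its representation of a 1D histogram (a count vector, a lower bound, and a uniform bin width) are assumptions, and it picks the first bin at which the cumulative count reaches the requested fraction, which may differ from how IMP chooses the best-approximating bin.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not IMP code: midpoint of the first bin at which the
// cumulative count reaches fraction * total.
double histogram_quantile(const std::vector<double>& counts, double lower,
                          double bin_width, double fraction) {
  assert(!counts.empty() && bin_width > 0.0);
  assert(fraction >= 0.0 && fraction <= 1.0);
  const double total = std::accumulate(counts.begin(), counts.end(), 0.0);
  const double target = fraction * total;
  double running = 0.0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    running += counts[i];
    if (running >= target) {
      return lower + (static_cast<double>(i) + 0.5) * bin_width;
    }
  }
  // Only reached through rounding error: fall back to the last bin.
  return lower + (static_cast<double>(counts.size()) - 0.5) * bin_width;
}
```

Because the result is always a bin midpoint, its resolution is limited by the bin width of the histogram.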

Ints IMP::statistics::get_representatives	(	Embedding *	d,
		PartitionalClustering *	pc
	)

Given a clustering and an embedding, compute a representative element for each cluster.

void show_histogram	(	HistogramD	h,
		std::string	xscale = `"linear"`,
		std::string	yscale = `"linear"`,
		Functions	curves = `Functions()`
	)

In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.

Parameters

[in]	h	The histogram to show; the plot is sized to the histogram's bounding box.
[in]	xscale	Whether the x-axis scale is "linear" or "log"
[in]	yscale	Whether the y-axis scale is "linear" or "log"
[in]	curves	A list of Python functions to plot on the histogram as curves. Each function should take one float and return a float.

void IMP::statistics::validate_partitional_clustering	(	PartitionalClustering *	pc,
		unsigned int	n
	)

Check that the clustering is a valid clustering of n elements. If it is not, an exception is thrown, unless the build is a fast build.
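One plausible reading of "valid clustering of n elements" is that every index in [0, n) belongs to exactly one cluster. The sketch below checks that property; it is an assumption about the intended check, not IMP's code, and it uses a hypothetical is_valid_partition function that returns a bool instead of throwing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not IMP code: true if the clusters contain each index
// in [0, n) exactly once and nothing else.
bool is_valid_partition(const std::vector<std::vector<int>>& clusters,
                        unsigned int n) {
  std::vector<bool> seen(n, false);
  std::size_t total = 0;
  for (const std::vector<int>& cluster : clusters) {
    for (int item : cluster) {
      if (item < 0 || static_cast<unsigned int>(item) >= n) {
        return false;  // index out of range
      }
      const std::size_t index = static_cast<std::size_t>(item);
      if (seen[index]) return false;  // item appears in two clusters
      seen[index] = true;
      ++total;
    }
  }
  return total == n;  // false if some item was never assigned
}
```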
