
Re: [IMP-dev] clustering - code consolidation



Good points. Here is the proposed interface.

class Clusterer{

 // use explicit vector embedding to initialize the data
 Clusterer(Embedding data, params);

 // use implicit conversion of data from e.g. FloatsList to data
 Clusterer(EmbeddingAdaptor data, params);

 // run the clustering on the stored data
 virtual PartitioningClusteringResults do_clustering() = 0;
};

class XXXXIncrementalClusterer : public Clusterer{
 // use explicit vector embedding to initialize the data
 XXXXIncrementalClusterer(Embedding data, params);

 // use implicit conversion of data from e.g. FloatsList to data
 XXXXIncrementalClusterer(EmbeddingAdaptor data, params);

 // add new data and return the updated clustering results
 PartitioningClusteringResults add_data(XXXX data);
};
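
A minimal compilable sketch of the base interface above, to make the shape concrete. Everything here is a stand-in rather than the actual IMP code: `Embedding`, `EmbeddingAdaptor`, `FloatsList` and `PartitioningClusteringResults` are simplified hypothetical definitions, `params` is left out, marking `do_clustering()` as `const` is an assumption, and `SignClusterer` is a toy algorithm invented only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the IMP types named in the proposal.
using Floats = std::vector<double>;
using FloatsList = std::vector<Floats>;

struct Embedding {
  FloatsList points;  // one row per data point
};

// Non-explicit constructors allow implicit conversion from raw data.
struct EmbeddingAdaptor {
  EmbeddingAdaptor(FloatsList values) : embedding{std::move(values)} {}
  EmbeddingAdaptor(Embedding e) : embedding(std::move(e)) {}
  Embedding embedding;
};

struct PartitioningClusteringResults {
  std::vector<std::size_t> assignment;  // cluster index of each data point
  std::size_t number_of_clusters = 0;
};

class Clusterer {
 public:
  explicit Clusterer(EmbeddingAdaptor data)
      : data_(std::move(data.embedding)) {}
  virtual ~Clusterer() = default;

  // Run the clustering on the stored data and return a new result object.
  // Assumption: const, so it can be called any number of times, in any order.
  virtual PartitioningClusteringResults do_clustering() const = 0;

 protected:
  const Embedding& get_data() const { return data_; }

 private:
  Embedding data_;
};

// Toy algorithm (not from the thread): two clusters, chosen by the sign
// of the first coordinate of each point.
class SignClusterer : public Clusterer {
 public:
  using Clusterer::Clusterer;

  PartitioningClusteringResults do_clustering() const override {
    PartitioningClusteringResults r;
    r.number_of_clusters = 2;
    for (const Floats& p : get_data().points) {
      r.assignment.push_back(!p.empty() && p[0] >= 0.0 ? 1u : 0u);
    }
    assert(r.assignment.size() == get_data().points.size());
    return r;
  }
};
```

With these stand-ins, a `FloatsList` can be passed directly, e.g. `SignClusterer c(data); auto results = c.do_clustering();`, and the conversion to `Embedding` happens through the adaptor.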

On Thu, Jun 14, 2012 at 1:34 PM, Daniel Russel wrote:
Sorry, I didn't see this before. I think my previous comments still stand. As a couple additional ones:
- having "execute" style methods on objects isn't a very nice practice. It destroys type safety, since for class A you really have two different types of A, the pre-execute A and the post-execute A, and some contexts require one and some the other (and the compiler can't check). It also doesn't really give you anything that you can't get from producing a new object as the result. Ideally, classes should have what I have seen called the "no protocols" property: you can call any function of the class in any order.

- I kind of prefer Clusterer to Clustering when referring to the algorithm as the latter very much means the result of running a clustering algorithm to me as opposed to something that does clustering. But that may be my eccentricity (but if you google "clusterer" the results seem consistent with that usage). In any case, we need to make sure that distinct terms are used for distinct things.
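
To illustrate the first point, here is a small sketch contrasting the two styles. The class names, the `has_results()` helper and the sign-based toy rule are all hypothetical and only serve the comparison; they are not IMP code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Floats = std::vector<double>;
using Assignment = std::vector<std::size_t>;

// Toy rule (hypothetical): cluster 1 if the value is >= 0, else cluster 0.
inline Assignment assign_by_sign(const Floats& data) {
  Assignment out;
  out.reserve(data.size());
  for (double x : data) out.push_back(x >= 0.0 ? 1u : 0u);
  return out;
}

// "Execute" style: one C++ type, but two logical states (pre- and
// post-execute). The required call order is invisible to the compiler.
class ExecuteStyleClusterer {
 public:
  explicit ExecuteStyleClusterer(Floats data) : data_(std::move(data)) {}

  void execute() {
    result_ = assign_by_sign(data_);
    executed_ = true;
  }

  bool has_results() const { return executed_; }

  // Only valid after execute(); this can be checked at run time at best.
  const Assignment& get_clustering_results() const {
    assert(executed_ && "execute() must be called first");
    return result_;
  }

 private:
  Floats data_;
  Assignment result_;
  bool executed_ = false;
};

// "No protocols" style: the result is a new object, so every member
// function can be called at any time and in any order.
class ResultStyleClusterer {
 public:
  explicit ResultStyleClusterer(Floats data) : data_(std::move(data)) {}

  Assignment do_clustering() const { return assign_by_sign(data_); }

 private:
  Floats data_;
};
```

In the second class there is no state in which a call is invalid, so the "which A do I have?" question never comes up.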


Now the full version...
Daniel and I briefly discussed consolidating the clustering code in the statistics and kmeans modules. Please tell me if it is agreed that things will work with the following interface:
* The Embedding family of classes (used to embed data in vector form) will remain as is
* There will be a "Clustering" class from which all clustering algorithms will derive, with a constructor that takes either an Embedding class, or an EmbeddingAdaptor for implicit conversions from e.g., FloatsList
* The Clustering classes will also have a void ::execute() method and a ::get_clustering_results() method that will return the clustering results (using the existing PartitioningClustering class, perhaps we can change its name to PartitioningClusteringResults).

So bottom line, if you want to cluster some data, you will do something like:
FloatsList data; // or create an Embedding object
KMeansClustering kmeans(data, params);
kmeans.execute();
PartitioningClustering clustering_results = kmeans.get_clustering_results();

Makes sense? For backward compatibility, I will add a DEPRECATED warning to the existing clustering methods, so that they can be removed completely within a few months.

Barak

_______________________________________________
IMP-dev mailing list
https://salilab.org/mailman/listinfo/imp-dev





--
Barak