MLPACK  1.0.4
mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy > Class Template Reference

This class implements K-Means clustering.

Public Member Functions

 KMeans (const size_t maxIterations=1000, const double overclusteringFactor=1.0, const DistanceMetric metric=DistanceMetric(), const InitialPartitionPolicy partitioner=InitialPartitionPolicy(), const EmptyClusterPolicy emptyClusterAction=EmptyClusterPolicy())
 Create a K-Means object and (optionally) set the parameters which K-Means will be run with.
template<typename MatType >
void Cluster (const MatType &data, const size_t clusters, arma::Col< size_t > &assignments) const
 Perform K-Means clustering on the data, returning a list of cluster assignments.
const EmptyClusterPolicy & EmptyClusterAction () const
 Get the empty cluster policy.
EmptyClusterPolicy & EmptyClusterAction ()
 Modify the empty cluster policy.
template<typename MatType >
void FastCluster (MatType &data, const size_t clusters, arma::Col< size_t > &assignments) const
size_t MaxIterations () const
 Get the maximum number of iterations.
void MaxIterations (const size_t maxIterations)
 Set the maximum number of iterations.
const DistanceMetric & Metric () const
 Get the distance metric.
DistanceMetric & Metric ()
 Modify the distance metric.
double OverclusteringFactor () const
 Return the overclustering factor.
void OverclusteringFactor (const double overclusteringFactor)
 Set the overclustering factor.
const InitialPartitionPolicy & Partitioner () const
 Get the initial partitioning policy.
InitialPartitionPolicy & Partitioner ()
 Modify the initial partitioning policy.

Private Attributes

EmptyClusterPolicy emptyClusterAction
 Instantiated empty cluster policy.
size_t maxIterations
 Maximum number of iterations before giving up.
DistanceMetric metric
 Instantiated distance metric.
double overclusteringFactor
 Factor controlling how many clusters are actually found.
InitialPartitionPolicy partitioner
 Instantiated initial partitioning policy.

Detailed Description

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
class mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >

This class implements K-Means clustering.

This implementation supports overclustering, which means that more clusters than are requested will be found; then, those clusters will be merged together to produce the desired number of clusters.

Three template parameters can (optionally) be supplied: the distance metric to be used, the policy for how to find the initial partition of the data, and the action to be taken when an empty cluster is encountered.

A simple example of how to run K-Means clustering is shown below.

 extern arma::mat data; // Dataset we want to run K-Means on.
 arma::Col<size_t> assignments; // Cluster assignments.

 // Default options. Note: no parentheses; "KMeans<> k();" would declare a
 // function named k instead of constructing an object.
 KMeans<> k;
 k.Cluster(data, 3, assignments); // 3 clusters.

 // Cluster using the Manhattan distance, 100 iterations maximum, and an
 // overclustering factor of 4.0.
 KMeans<metric::ManhattanDistance> k2(100, 4.0);
 k2.Cluster(data, 6, assignments); // 6 clusters.
Template Parameters:
DistanceMetric: The distance metric to use for this KMeans; see metric::LMetric for an example.
InitialPartitionPolicy: Initial partitioning policy; must implement a default constructor and 'void Cluster(const arma::mat&, const size_t, arma::Col<size_t>&)'.
See also:
RandomPartition for an example.
Template Parameters:
EmptyClusterPolicy: Policy for what to do on an empty cluster; must implement a default constructor and 'void EmptyCluster(const arma::mat&, arma::Col<size_t>&)'.
See also:
AllowEmptyClusters and MaxVarianceNewCluster.
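
The InitialPartitionPolicy requirement above (a default constructor plus a Cluster() method that fills in assignments) can be illustrated with a small, library-free sketch. Everything here is hypothetical: RoundRobinPartition is not an mlpack class, and the Matrix and Assignments aliases are std::vector stand-ins for arma::mat and arma::Col<size_t> so that the example compiles without Armadillo. A real policy would use the Armadillo types named in the requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins (assumptions) for arma::mat and arma::Col<size_t>.
// Each inner vector is one point, mirroring one column of the dataset.
using Matrix = std::vector<std::vector<double>>;
using Assignments = std::vector<std::size_t>;

// Hypothetical initial partitioning policy: point i goes to cluster i % clusters.
class RoundRobinPartition
{
 public:
  // The policy must be default-constructible.
  RoundRobinPartition() { }

  // Same shape as the required 'void Cluster(data, clusters, assignments)'.
  void Cluster(const Matrix& data,
               const std::size_t clusters,
               Assignments& assignments) const
  {
    assert(clusters > 0); // i % 0 would be undefined.

    // One assignment per point.
    assignments.resize(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
      assignments[i] = i % clusters;
  }
};
```

A deterministic policy like this one is mainly useful for reproducible tests; the default RandomPartition presumably assigns points at random instead.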

Definition at line 73 of file kmeans.hpp.


Constructor & Destructor Documentation

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::KMeans ( const size_t  maxIterations = 1000,
const double  overclusteringFactor = 1.0,
const DistanceMetric  metric = DistanceMetric(),
const InitialPartitionPolicy  partitioner = InitialPartitionPolicy(),
const EmptyClusterPolicy  emptyClusterAction = EmptyClusterPolicy() 
)

Create a K-Means object and (optionally) set the parameters which K-Means will be run with.

This implementation allows a few strategies to improve the performance of K-Means, including "overclustering" and disallowing empty clusters.

The overclustering factor controls how many clusters are actually found; for instance, with an overclustering factor of 4, if K-Means is run to find 3 clusters, it will actually find 12, then merge the nearest clusters until only 3 are left.
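
The arithmetic and the merge step described above can be sketched without the library. This is an illustration under stated assumptions, not the mlpack implementation: the names Cluster1D, OverclusteredCount, and MergeNearest are hypothetical, the data is one-dimensional for brevity, the overclustered count is assumed to be the truncated product of the two values, and the merged centroid is assumed to be the count-weighted mean of the two centroids being merged.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a cluster: a 1-D centroid and its point count.
struct Cluster1D
{
  double centroid;
  std::size_t count;
};

// Number of clusters to search for before merging (assumed: truncated product).
std::size_t OverclusteredCount(const std::size_t clusters, const double factor)
{
  return static_cast<std::size_t>(factor * clusters);
}

// Repeatedly merge the two nearest centroids until `target` clusters remain.
std::vector<Cluster1D> MergeNearest(std::vector<Cluster1D> c,
                                    const std::size_t target)
{
  while (c.size() > target && c.size() > 1)
  {
    // Find the closest pair of centroids.
    std::size_t bestI = 0, bestJ = 1;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < c.size(); ++i)
    {
      for (std::size_t j = i + 1; j < c.size(); ++j)
      {
        const double d = std::abs(c[i].centroid - c[j].centroid);
        if (d < bestDist)
        {
          bestDist = d;
          bestI = i;
          bestJ = j;
        }
      }
    }

    // Merge j into i using the count-weighted mean (plain mean if both empty).
    const std::size_t n = c[bestI].count + c[bestJ].count;
    if (n > 0)
    {
      c[bestI].centroid = (c[bestI].centroid * c[bestI].count +
                           c[bestJ].centroid * c[bestJ].count) / n;
    }
    else
    {
      c[bestI].centroid = 0.5 * (c[bestI].centroid + c[bestJ].centroid);
    }
    c[bestI].count = n;
    c.erase(c.begin() + static_cast<std::ptrdiff_t>(bestJ));
  }
  return c;
}
```

With the values from the text, OverclusteredCount(3, 4.0) yields 12, and MergeNearest would then be applied to those 12 clusters with a target of 3.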

Parameters:
maxIterations: Maximum number of iterations allowed before giving up (0 is valid, but the algorithm may never terminate).
overclusteringFactor: Factor controlling how many extra clusters are found and then merged to get the desired number of clusters.
metric: Optional DistanceMetric object; for when the metric has state it needs to store.
partitioner: Optional InitialPartitionPolicy object; for when a specially initialized partitioning policy is required.
emptyClusterAction: Optional EmptyClusterPolicy object; for when a specially initialized empty cluster policy is required.
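
To show why the constructor accepts policy objects rather than only policy types, here is a minimal sketch of a metric that carries state. It does not use mlpack: WeightedManhattan and MetricHolder are hypothetical names, the Evaluate() method name is an assumption about what a metric interface might look like, and MetricHolder only mimics the constructor-and-accessor pattern listed in this reference (store by value, read through a const accessor, modify through a non-const one).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stateful metric: the per-dimension weights are the state.
class WeightedManhattan
{
 public:
  WeightedManhattan() { }
  explicit WeightedManhattan(const std::vector<double>& w) : weights(w) { }

  // Weighted L1 distance; dimensions without a weight use 1.0.
  double Evaluate(const std::vector<double>& a,
                  const std::vector<double>& b) const
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
      const double w = (i < weights.size()) ? weights[i] : 1.0;
      sum += w * std::abs(a[i] - b[i]);
    }
    return sum;
  }

 private:
  std::vector<double> weights;
};

// Hypothetical host mirroring the documented constructor/accessor pattern.
template<typename DistanceMetric>
class MetricHolder
{
 public:
  MetricHolder(const std::size_t maxIterations = 1000,
               const DistanceMetric metric = DistanceMetric()) :
      maxIterations(maxIterations), metric(metric) { }

  // Get / set the maximum number of iterations.
  std::size_t MaxIterations() const { return maxIterations; }
  void MaxIterations(const std::size_t m) { maxIterations = m; }

  // Get / modify the stored metric instance.
  const DistanceMetric& Metric() const { return metric; }
  DistanceMetric& Metric() { return metric; }

 private:
  std::size_t maxIterations;
  DistanceMetric metric;
};
```

Because the metric is stored by value, a default-constructed instance is used unless one is passed in, and the non-const Metric() accessor allows it to be replaced or adjusted after construction.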

Member Function Documentation

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
template<typename MatType >
void mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::Cluster ( const MatType &  data,
const size_t  clusters,
arma::Col< size_t > &  assignments 
) const

Perform K-Means clustering on the data, returning a list of cluster assignments.

Optionally, the vector of assignments can be set to an initial guess of the cluster assignments; to do this, the number of elements in the list of assignments must be equal to the number of points (columns) in the dataset.

Template Parameters:
MatType: Type of matrix (arma::mat or arma::spmat).
Parameters:
data: Dataset to cluster.
clusters: Number of clusters to compute.
assignments: Vector to store cluster assignments in. Can contain an initial guess at cluster assignments.
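
The initial-guess condition above (one assignment per point, that is, per column of the dataset) can be expressed as a small check. This is a library-free sketch: IsUsableInitialGuess is a hypothetical helper, std::vector stands in for arma::Col<size_t>, and the range check against the number of clusters is an extra sanity check assumed here rather than something this reference states. With Armadillo types, the size comparison would be between assignments.n_elem and data.n_cols.

```cpp
#include <cstddef>
#include <vector>

// Returns true when `assignments` could serve as an initial guess:
// it must hold exactly one entry per point in the dataset.
bool IsUsableInitialGuess(const std::vector<std::size_t>& assignments,
                          const std::size_t numPoints,
                          const std::size_t clusters)
{
  // Documented condition: element count equals the number of points.
  if (assignments.size() != numPoints)
    return false;

  // Assumed extra check: every label refers to an existing cluster.
  for (const std::size_t a : assignments)
  {
    if (a >= clusters)
      return false;
  }
  return true;
}
```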
template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
const EmptyClusterPolicy& mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::EmptyClusterAction ( ) const [inline]
template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
EmptyClusterPolicy& mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::EmptyClusterAction ( ) [inline]
template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
template<typename MatType >
void mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::FastCluster ( MatType &  data,
const size_t  clusters,
arma::Col< size_t > &  assignments 
) const
template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
size_t mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::MaxIterations ( ) const [inline]

Get the maximum number of iterations.

Definition at line 150 of file kmeans.hpp.

References mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::maxIterations.

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
void mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::MaxIterations ( const size_t  maxIterations) [inline]

Set the maximum number of iterations.

Definition at line 155 of file kmeans.hpp.

References mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::maxIterations.

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
const DistanceMetric& mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::Metric ( ) const [inline]
template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
DistanceMetric& mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::Metric ( ) [inline]

Modify the distance metric.

Definition at line 163 of file kmeans.hpp.

References mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::metric.

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
double mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::OverclusteringFactor ( ) const [inline]
template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
void mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::OverclusteringFactor ( const double  overclusteringFactor) [inline]
template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
const InitialPartitionPolicy& mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::Partitioner ( ) const [inline]

Get the initial partitioning policy.

Definition at line 166 of file kmeans.hpp.

References mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::partitioner.

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
InitialPartitionPolicy& mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::Partitioner ( ) [inline]

Modify the initial partitioning policy.

Definition at line 168 of file kmeans.hpp.

References mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::partitioner.


Member Data Documentation

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
EmptyClusterPolicy mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::emptyClusterAction [private]

Instantiated empty cluster policy.

Definition at line 188 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::EmptyClusterAction().

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
size_t mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::maxIterations [private]

Maximum number of iterations before giving up.

Definition at line 182 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::MaxIterations().

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
DistanceMetric mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::metric [private]

Instantiated distance metric.

Definition at line 184 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::Metric().

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
double mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::overclusteringFactor [private]

Factor controlling how many clusters are actually found.

Definition at line 180 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::OverclusteringFactor().

template<typename DistanceMetric = metric::SquaredEuclideanDistance, typename InitialPartitionPolicy = RandomPartition, typename EmptyClusterPolicy = MaxVarianceNewCluster>
InitialPartitionPolicy mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::partitioner [private]

Instantiated initial partitioning policy.

Definition at line 186 of file kmeans.hpp.

Referenced by mlpack::kmeans::KMeans< DistanceMetric, InitialPartitionPolicy, EmptyClusterPolicy >::Partitioner().


The documentation for this class was generated from the following file: