pam(x, k, diss=F, metric="euclidean", stand=F)
x
    data matrix or dataframe, or dissimilarity matrix, depending on the
    value of the diss argument.
    In case of a matrix or dataframe, each row corresponds to an
    observation, and each column corresponds to a variable. All variables
    must be numeric. Missing values (NAs) are allowed.
    In case of a dissimilarity matrix, x is typically the output of
    daisy or dist.
k
    integer, the number of clusters.
diss
    logical flag: if TRUE, then x will be considered as a dissimilarity
    matrix. If FALSE, then x will be considered as a matrix of
    observations by variables.
metric
    character string specifying the metric to be used for calculating
    dissimilarities between objects. The currently available options are
    "euclidean" and "manhattan". Euclidean distances are root
    sum-of-squares of differences, and manhattan distances are the sum of
    absolute differences. If x is already a dissimilarity matrix, then
    this argument will be ignored.
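The two metrics can be checked against base R's dist function (a minimal sketch using dist, not pam itself):

```r
## Two points in the plane
a <- c(0, 0)
b <- c(3, 4)
m <- rbind(a, b)

## Euclidean: root sum-of-squares of differences
sqrt(sum((a - b)^2))                       # 5
as.numeric(dist(m, method = "euclidean"))  # 5

## Manhattan: sum of absolute differences
sum(abs(a - b))                            # 7
as.numeric(dist(m, method = "manhattan"))  # 7
```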
stand
    logical flag: if TRUE, then the measurements in x are standardized
    before calculating the dissimilarities. Measurements are standardized
    for each variable (column), by subtracting the variable's mean value
    and dividing by the variable's mean absolute deviation. If x is
    already a dissimilarity matrix, then this argument will be ignored.
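The effect of stand=TRUE can be written out in base R (a sketch of the standardization described above, not the package's internal code):

```r
## Per-column standardization: subtract the column mean, then divide by
## the column's mean absolute deviation
standardize <- function(x) {
  apply(x, 2, function(col) (col - mean(col)) / mean(abs(col - mean(col))))
}

m <- cbind(c(1, 2, 3, 4), c(10, 20, 30, 40))
standardize(m)  # each column now has mean 0 and mean absolute deviation 1
```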
pam is fully described in chapter 2 of Kaufman and Rousseeuw (1990).
Compared to the k-means approach in kmeans, the function pam has the
following features: (a) it also accepts a dissimilarity matrix; (b) it is
more robust because it minimizes a sum of dissimilarities instead of a sum
of squared euclidean distances; (c) it provides a novel graphical display,
the silhouette plot (see plot.partition), which also allows the user to
select the number of clusters.
The pam-algorithm is based on the search for k representative objects or
medoids among the objects of the dataset. These objects should represent
the structure of the data. After finding a set of k medoids, k clusters
are constructed by assigning each object to the nearest medoid. The goal
is to find k representative objects which minimize the sum of the
dissimilarities of the objects to their closest representative object.
The algorithm first looks for a good initial set of medoids (this is
called the BUILD phase). Then it finds a local minimum for the objective
function, that is, a solution such that there is no single switch of an
object with a medoid that will decrease the objective (this is called the
SWAP phase).
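The objective and the nearest-medoid assignment can be written out in a few lines of base R (a sketch of the idea only, not the package's implementation):

```r
## Objective that pam minimizes: sum over all objects of the dissimilarity
## to the nearest medoid. d is a dissimilarity object or matrix; medoids
## is a vector of object indices.
pam_objective <- function(d, medoids) {
  d <- as.matrix(d)
  sum(apply(d[, medoids, drop = FALSE], 1, min))
}

## Assign each object to the medoid it is closest to
nearest_medoid <- function(d, medoids) {
  d <- as.matrix(d)
  medoids[apply(d[, medoids, drop = FALSE], 1, which.min)]
}

x <- c(0, 1, 10, 11)        # four points on a line
d <- dist(x)
pam_objective(d, c(1, 3))   # 2: objects 2 and 4 each lie at distance 1
                            # from their nearest medoid
nearest_medoid(d, c(1, 3))  # 1 1 3 3
```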
An object of class "pam" representing the clustering.
See pam.object for details.
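For instance, the medoids and clustering components can be inspected directly (requires the cluster package; component names as documented in pam.object):

```r
library(cluster)
set.seed(1)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
pamx <- pam(x, 2)
pamx$medoids     # coordinates of the 2 representative objects
pamx$clustering  # cluster membership (1 or 2) for each of the 25 objects
```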
pam, clara, and fanny require that the number of clusters be given by the
user. Hierarchical methods like agnes, diana, and mona construct a
hierarchy of clusterings, with the number of clusters ranging from one to
the number of objects.
For large datasets, pam will take a lot of computation time; the function
clara is then preferable.
pam.object, partition.object, daisy, dist, clara, plot.partition.
# generate 25 objects, divided into 2 clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
pamx <- pam(x, 2)
pamx
summary(pamx)
plot(pamx)
pam(daisy(x, metric = "manhattan"), 2, diss = T)