- Part 1
The initial four-cluster partition {c1, c2, c3, c4} of the text collection
is provided at this link: text vs cluster. Assuming that
the ground-truth partition is given by
  - cacm texts belong to cluster1
  - cisi texts belong to cluster2
  - cran texts belong to cluster3
  - med texts belong to cluster4
build the confusion matrix (CM) for the partition provided.
That is, build a 4×4 matrix whose row i shows the distribution
of the elements of ci among cluster1, cluster2, cluster3, and cluster4.
For example, the first row is computed as follows:
CM11=|c1∩cluster1|,
CM12=|c1∩cluster2|,
CM13=|c1∩cluster3|, and
CM14=|c1∩cluster4|
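The row-by-row counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document names (`cacm_1`, `med_2`, ...), the rule that the ground-truth cluster is read from the name prefix, and the toy assignment list are all hypothetical, and clusters are indexed from 0, so index 0 stands for c1 / cluster1.

```python
# Hypothetical mapping from a file-name prefix to a 0-based ground-truth cluster.
PREFIX_TO_CLUSTER = {"cacm": 0, "cisi": 1, "cran": 2, "med": 3}


def confusion_matrix(assigned, truth, k=4):
    """Return a k x k matrix with cm[i][j] = |c_(i+1) ∩ cluster_(j+1)|.

    assigned[d] is the 0-based index of the cluster c_i that holds document d;
    truth[d] is the 0-based index of its ground-truth cluster.
    """
    if len(assigned) != len(truth):
        raise ValueError("assigned and truth must have the same length")
    cm = [[0] * k for _ in range(k)]
    for a, t in zip(assigned, truth):
        cm[a][t] += 1  # one more element of c_a that belongs to cluster_t
    return cm


# Toy data (assumed, not taken from the real collection).
names = ["cacm_1", "cacm_2", "cisi_1", "cran_1", "med_1", "med_2"]
assigned = [0, 1, 1, 2, 3, 0]  # cluster c_i given to each document
truth = [PREFIX_TO_CLUSTER[n.split("_")[0]] for n in names]

cm = confusion_matrix(assigned, truth)
for row in cm:
    print(row)
```

Each row sums to the size of the corresponding ci, and the whole matrix sums to the number of documents, which is a quick sanity check on the real data.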
- Part 2
Apply batch k-means to the partition
{c1, c2, c3, c4}. Call the resulting partition
{c1', c2', c3', c4'}.
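A minimal sketch of batch k-means started from a given partition might look as follows. The assignment does not say how the texts are vectorized or which distance to use, so this sketch assumes dense numeric vectors and squared Euclidean distance; the function name `batch_kmeans` and the toy data are hypothetical. Each iteration recomputes every centroid from the current partition and then reassigns all points at once, which is what distinguishes the batch version from an incremental one.

```python
def batch_kmeans(vectors, labels, k=4, max_iter=100):
    """Refine an initial partition with batch k-means.

    vectors: list of equal-length lists of numbers (assumed representation)
    labels:  initial 0-based cluster index for each vector
    Returns the final list of 0-based labels.
    """
    labels = list(labels)
    dim = len(vectors[0])
    for _ in range(max_iter):
        # Centroid step: mean of the vectors currently in each cluster.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for v, c in zip(vectors, labels):
            counts[c] += 1
            for j, x in enumerate(v):
                sums[c][j] += x
        centroids = [
            [s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)
        ]

        # Assignment step: move every vector to its nearest centroid.
        new_labels = []
        for v in vectors:
            best, best_dist = None, float("inf")
            for c, m in enumerate(centroids):
                if m is None:
                    continue  # an empty cluster has no centroid to compare with
                dist = sum((x - y) ** 2 for x, y in zip(v, m))
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)

        if new_labels == labels:
            break  # the partition is stable, so the algorithm has converged
        labels = new_labels
    return labels


# Toy example with two obvious groups and a deliberately poor initial partition.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
final = batch_kmeans(points, [0, 1, 0, 1], k=2)
print(final)
```

For text data, cosine similarity on normalized term vectors is a common alternative to Euclidean distance; only the distance computation in the assignment step would change.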
- Part 3
Build the confusion matrix for the partition
{c1', c2', c3', c4'}.
OUTPUT: the initial and final confusion matrices. [Please
upload a single file.]