Assessing Clustering Quality

To evaluate a solution when the true clustering is known, we need a measure of how close the solution is to it. A clustering of $n$ elements can be described by an $n \times n$ symmetric (0,1) matrix $C$, where $C_{ij}=1$ iff elements $i$ and $j$ belong to the same cluster. Given the matrix representations of the true clustering $T$ and of any clustering $C$ of the same data set, the Minkowski measure of the quality of $C$ is the $L_2$ distance between the two matrices, normalized by the $L_2$ norm $\Vert T\Vert=\sqrt{\sum_{i,j}T_{ij}^2}$ of the true clustering matrix:

\begin{displaymath}D_M(T,C)=\frac{\sqrt{\sum_{i,j}(T_{ij}-C_{ij})^2}}{\Vert T\Vert}
\end{displaymath}
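The following is a minimal Python sketch of the Minkowski measure, under assumptions not stated in the text: each clustering is given as a list of cluster labels, one per element (a hypothetical input format), and the sums run over all ordered pairs $(i,j)$, including $i=j$, exactly as the formula is written. The names `comembership_matrix` and `minkowski_measure` are illustrative only.

```python
import math
from typing import Hashable, List, Sequence


def comembership_matrix(labels: Sequence[Hashable]) -> List[List[int]]:
    """Build the n x n symmetric (0,1) matrix whose entry (i, j) is 1 iff
    elements i and j carry the same cluster label (so the diagonal is all 1s)."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]


def minkowski_measure(true_labels: Sequence[Hashable],
                      found_labels: Sequence[Hashable]) -> float:
    """D_M(T, C) = ||T - C|| / ||T||, summing over all ordered pairs (i, j)."""
    if len(true_labels) != len(found_labels):
        raise ValueError("both clusterings must cover the same elements")
    if len(true_labels) == 0:
        raise ValueError("the data set must be non-empty")
    t_mat = comembership_matrix(true_labels)
    c_mat = comembership_matrix(found_labels)
    # Squared L2 distance between T and C: the number of entries where they differ.
    diff_sq = sum((t - c) ** 2
                  for t_row, c_row in zip(t_mat, c_mat)
                  for t, c in zip(t_row, c_row))
    # Squared L2 norm of T: the number of ordered pairs in the same true cluster.
    norm_sq = sum(t * t for t_row in t_mat for t in t_row)
    return math.sqrt(diff_sq) / math.sqrt(norm_sq)


if __name__ == "__main__":
    print(minkowski_measure([0, 0, 1, 1], [0, 0, 0, 1]))  # about 0.866
```

For example, with true labels [0, 0, 1, 1] and found labels [0, 0, 0, 1], the two matrices differ in 6 entries and $T$ contains 8 ones, so the sketch returns $\sqrt{6}/\sqrt{8}\approx 0.866$. Building the full matrices costs $O(n^2)$ memory; that is acceptable for a sketch whose aim is to mirror the formula term by term.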

Smaller values of $D_M$ indicate a better solution, and $D_M(T,C)=0$ exactly when $C=T$. An alternative is the all-pairs measure:

\begin{displaymath}D_{ap}(T,C)=\frac{\mid\{(i,j)\mid T_{ij}=C_{ij}=1\}\mid-\mid\{(i,j)\mid T_{ij}\neq C_{ij}\}\mid}{\mid \{(i,j)\mid T_{ij}=1\}\mid}
\end{displaymath}
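A corresponding Python sketch of the all-pairs measure follows, again assuming label-list input. It reads the subtracted term as the number of ordered pairs $(i,j)$ with $T_{ij}\neq C_{ij}$, and counts all pairs including $i=j$; the function name `all_pairs_measure` is hypothetical. Under these assumptions a perfect clustering scores 1, higher is better, and the value can be negative.

```python
from typing import Hashable, Sequence


def all_pairs_measure(true_labels: Sequence[Hashable],
                      found_labels: Sequence[Hashable]) -> float:
    """D_ap(T, C): pairs together in both clusterings, minus pairs on which
    the clusterings disagree, divided by the pairs together in T.
    All counts range over ordered pairs (i, j), including i == j."""
    n = len(true_labels)
    if n != len(found_labels):
        raise ValueError("both clusterings must cover the same elements")
    if n == 0:
        raise ValueError("the data set must be non-empty")
    together_in_both = 0   # pairs with T_ij = C_ij = 1
    disagreements = 0      # pairs with T_ij != C_ij
    together_in_true = 0   # pairs with T_ij = 1
    for i in range(n):
        for j in range(n):
            t = true_labels[i] == true_labels[j]
            c = found_labels[i] == found_labels[j]
            if t:
                together_in_true += 1
                if c:
                    together_in_both += 1
            if t != c:
                disagreements += 1
    return (together_in_both - disagreements) / together_in_true
```

With true labels [0, 0, 1, 1] and found labels [0, 0, 0, 1], 6 of the 8 pairs that are together in $T$ are also together in $C$, while 6 pairs disagree, so the measure is $(6-6)/8=0$. Merging four true singletons into one cluster gives $(4-12)/4=-2$. Counting pairs directly avoids materializing the matrices.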



Itshack Pe'er
1999-03-16