Next: Refinements Up: The HCC Algorithm Previous: Clustering

   
Properties of HCC clustering

Theorem 12.5   The diameter of each cluster is at most 2. That is, the distance between any two vertices of a cluster is at most 2.

Proof: Let k(G) be the size of a minimum cut of G, let d(v) denote the degree of v, and let $\delta(G)=\min_v d(v)$. Observe that $k(G) \leq \delta(G)$: suppose to the contrary that d(v)<k(G) for some v; then the cut $(\{v\}, V \setminus \{v\})$ has only d(v) edges, contradicting the minimality of k(G). Furthermore, if G is a cluster reported by the HCC algorithm, then by the ``if'' statement in the algorithm $k(G) > \frac{\vert V\vert}{2}$ (otherwise HCC would have divided G in two). Therefore, $\delta(G) > \frac{\vert V\vert}{2}$. Consider two vertices v1 and v2 in G. If they are neighbors, the theorem clearly holds for them, so assume that they are not. By the previous inequality, each of them has more than $\frac{\vert V\vert}{2}$ neighbors in G, and all of these neighbors lie among the remaining |V|-2 vertices. If the two neighborhoods were disjoint, they would together contain more than |V| vertices, which is impossible. Hence v1 and v2 have a common neighbor and are at distance 2.

While we have proven that each highly connected cluster has a small diameter, the converse does not necessarily hold: G may have a subgraph with diameter 2 that is not a highly connected component.
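Both directions of this remark can be checked by brute force on small graphs. The following is a minimal sketch, not part of the HCC algorithm itself; the helper names `build_graph`, `diameter` and `min_cut_size` are hypothetical, and the minimum cut is found by trying every bipartition, which is only feasible for a handful of vertices.

```python
from itertools import combinations


def build_graph(n, edges):
    """Adjacency sets for an undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def diameter(adj):
    """Largest shortest-path distance, assuming the graph is connected."""
    best = 0
    for source in adj:
        dist = {source: 0}
        frontier = [source]
        while frontier:  # plain breadth-first search
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        best = max(best, max(dist.values()))
    return best


def min_cut_size(adj):
    """Edge connectivity k(G) by exhaustive search over bipartitions."""
    vertices = sorted(adj)
    first, rest = vertices[0], vertices[1:]
    best = None
    # The side containing `first` ranges over all proper subsets of V.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            crossing = sum(1 for u in side for w in adj[u] if w not in side)
            best = crossing if best is None else min(best, crossing)
    return best


# K5 with the edge (0, 1) removed: highly connected, so diameter <= 2.
near_clique = build_graph(5, [e for e in combinations(range(5), 2) if e != (0, 1)])
# A star with four leaves: diameter 2, but not highly connected.
star = build_graph(5, [(0, leaf) for leaf in range(1, 5)])

for name, g in (("near-clique", near_clique), ("star", star)):
    k = min_cut_size(g)
    print(name, "k(G) =", k, "highly connected:", k > len(g) / 2,
          "diameter:", diameter(g))
```

The near-clique has k(G) = 3 > 5/2 and diameter 2, as the theorem requires. The star also has diameter 2, but k(G) = 1, so it illustrates that small diameter alone does not imply high connectivity.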

Lemma 12.6   Let S be a set of edges forming a minimum cut in the graph G=(V,E). Let H and $\bar{H}$ be the induced subgraphs obtained by removing S from G, where $\vert V(\bar{H})\vert \leq \vert V(H)\vert$. If $\vert V(\bar{H})\vert > 1$ then $\vert S\vert \leq
\vert V(\bar{H})\vert$, with equality only if $\bar{H}$ is a clique.

Since $\vert V(\bar{H})\vert \leq \frac{\vert V\vert}{2}$, the lemma implies that if a minimum cut S in G=(V,E) satisfies $\vert S\vert>\frac{\vert V\vert}{2}$, then S splits the graph into a single vertex $\{v\}$ and $G \setminus \{v\}$. This shows that a stronger stopping criterion for the algorithm, i.e., $\vert S\vert>\alpha$ for some $\alpha>\frac{\vert V\vert}{2}$, would be detrimental for clustering: any minimum cut of value $x > \frac{\vert V\vert}{2}$ separates only a singleton from the current graph.
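The lemma and its consequence can be spot-checked by enumerating all minimum cuts of a small graph. This is an illustrative sketch under the assumption that the graph is connected and tiny; `minimum_cuts` is a hypothetical helper, not a routine from the notes.

```python
from itertools import combinations


def minimum_cuts(n, edges):
    """Return (k, sides): the minimum cut size and, for every minimum cut,
    a side with at most n // 2 vertices (exhaustive search).
    When n is even, a balanced cut is listed once per side."""
    best, sides = None, []
    for r in range(1, n // 2 + 1):
        for side in combinations(range(n), r):
            s = set(side)
            crossing = sum(1 for u, w in edges if (u in s) != (w in s))
            if best is None or crossing < best:
                best, sides = crossing, [s]
            elif crossing == best:
                sides.append(s)
    return best, sides


# A 5-cycle: k = 2 <= 5/2, so non-trivial minimum cuts may exist.
cycle_edges = [(i, (i + 1) % 5) for i in range(5)]
k_cycle, sides_cycle = minimum_cuts(5, cycle_edges)
# Lemma 12.6: whenever the smaller side has more than one vertex, |S| <= its size.
print(k_cycle, all(len(s) == 1 or k_cycle <= len(s) for s in sides_cycle))

# K5 minus the edge (0, 1): k = 3 > 5/2, so every minimum cut isolates one vertex.
dense_edges = [e for e in combinations(range(5), 2) if e != (0, 1)]
k_dense, sides_dense = minimum_cuts(5, dense_edges)
print(k_dense, sides_dense)
```

In the 5-cycle, the two-vertex sides are pairs of adjacent vertices, which are cliques, matching the equality case of the lemma. In the dense graph the only minimum cuts are the ones isolating vertex 0 or vertex 1.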

Theorem 12.7   Let S be a minimum cut in the graph G=(V,E) where $\vert S\vert \leq \frac{\vert V\vert}{2}$. Let H and $\bar{H}$ be the connected induced subgraphs obtained by removing S from G, where $\vert V(\bar{H})\vert \leq \vert V(H)\vert$. If $\mathrm{diam}(G) \leq 2$ then (1) every vertex in $\bar{H}$ is incident on S, (2) $\bar{H}$ is a clique.
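As a concrete instance of the theorem, consider the 5-cycle, which has diameter 2 and minimum cut size $2 \leq \frac{5}{2}$. The sketch below enumerates its minimum cuts exhaustively and tests conditions (1) and (2) for the smaller side of each; the function name `check_small_sides` is hypothetical and the approach only scales to very small graphs.

```python
from itertools import combinations


def check_small_sides(n, edges):
    """For every minimum cut, report whether the side with at most n // 2
    vertices (1) has all its vertices on cut edges and (2) is a clique."""
    edge_set = {frozenset(e) for e in edges}
    cuts = []
    for r in range(1, n // 2 + 1):
        for side in combinations(range(n), r):
            s = set(side)
            # An edge crosses the cut when exactly one endpoint lies in s.
            cut = [e for e in edge_set if len(e & s) == 1]
            cuts.append((s, cut))
    k = min(len(cut) for _, cut in cuts)
    results = []
    for s, cut in cuts:
        if len(cut) != k:
            continue
        touched = set().union(*cut) & s
        incident = touched == s
        clique = all(frozenset(p) in edge_set
                     for p in combinations(sorted(s), 2))
        results.append((sorted(s), incident, clique))
    return k, results


cycle_edges = [(i, (i + 1) % 5) for i in range(5)]
k, results = check_small_sides(5, cycle_edges)
for side, incident, clique in results:
    print(side, "all incident on S:", incident, "clique:", clique)
```

Every minimum cut of the 5-cycle separates either a single vertex or a pair of adjacent vertices, and both conditions hold in each case.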

It can be shown, using this theorem, that the union of two vertex sets split by any step of HCC is unlikely to induce a graph with diameter $\leq 2$ if noise is random, and the vertex sets are not too small. Another property of the solution is given by:

Theorem 12.8   1. The number of edges in a highly connected subgraph is quadratic in its number of vertices.
2. The number of edges removed by each iteration of the HCC algorithm is at most linear in the number of vertices of the graph being split.

Proof: Let n be the number of vertices in the graph. Then: 1. As we have seen before, $\frac{n}{2} < k(G) \leq \delta (G)$. Since the degree of each vertex is $> \frac{n}{2}$, the total number of edges is

\begin{displaymath}N = \frac{1}{2} \sum_{v} d(v) > \frac{1}{2} \sum_{i = 1}^{n} \frac{n}{2}
= \frac{n^2}{4}
\end{displaymath}

2. The algorithm removes the edges forming the minimum cut S only if $\vert S\vert \leq \frac{n}{2}$. Therefore, each iteration removes at most $\frac{n}{2}$ edges, which is linear in n.
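The bound in part 1 can be checked numerically, with n taken as the number of vertices. The sketch below uses a hypothetical helper `edge_count_vs_bound` and two hand-picked example graphs; it only counts degrees and does not compute minimum cuts.

```python
from itertools import combinations


def edge_count_vs_bound(n, edges):
    """Return (number of edges, minimum degree, n^2 / 4)."""
    degree = [0] * n
    for u, w in edges:
        degree[u] += 1
        degree[w] += 1
    return len(edges), min(degree), n * n / 4


# K5 minus one edge: minimum degree 3 > 5/2, so N must exceed 25/4.
near_clique_edges = [e for e in combinations(range(5), 2) if e != (0, 1)]
print(edge_count_vs_bound(5, near_clique_edges))

# K_{3,3}: minimum degree 3 = 6/2, so the strict inequality is not guaranteed.
k33_edges = [(u, w) for u in range(3) for w in range(3, 6)]
print(edge_count_vs_bound(6, k33_edges))
```

The first graph has 9 edges against a bound of 6.25. The complete bipartite graph sits exactly on the boundary, with 9 edges and $\frac{n^2}{4} = 9$, which shows why the strict inequality $\delta(G) > \frac{n}{2}$ is needed for the strict bound.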
Itshack Pe`er
1999-03-16