Linde-Buzo-Gray (LBG) Algorithm

Printer-friendly versionPrinter-friendly version

This is an algorithm developed in the community of vector quantization for the purpose of data compression. Here's the article where this algorithm is developed.

Y. Linde, A. Buzo and R. M. Gray, ”An algorithm for vector quantizer design,” IEEE Trans. on Communication, Vol. COM-28, pp. 84-95, Jan. 1980.

As we had mentioned before k-means is known as the Lloyd by electrical. Linde, Buzo and Gray  extended Lloyd's algorithm by a careful approach to initialization and often achieved better performance in terms of minimizing the total within class distance.

Here is how this algorithm works:

Step 1. First, you find the sample mean or what we call the centroid z1(1) for the entire data set. This is like if we were to have only one prototype. The sample mean is proven to minimize the total within class distance (total mean square distortion) for a single prototype.

Step 2. Set k = 1, l = 1. l is the index for the iteration. k counts the number of prototypes that have been generated. After the first step, we've only generated one prototype.

Step 3. If k < M, (M being the target number of centroids that we're trying to get), split the current centroids by adding small offsets. Let's see how this is done.

Since if we already have k prototypes, we need Mk additional prototypes.

  • If Mkk, split all the existing centroids that have been created so far; otherwise, we split only Mk of them.

We will denote the number of centroids which are split by k_tilde = min(k, Mk), where k_tilde is the smaller of the two values.

How do we make this split?

For example, to split z1(1) into two centroids, let z1(2) = z1(1), z2(2) = z1(1) + ε, where ε is a small offset and has a small norm and a random direction.

Step 4. k k + k_tilde ; l ← l + 1. The splitting has taken place.

Step 5. Use {z1(1), z2(1), ... , zk(1)} as initial prototypes, which includes the previously generated centroids and the newly split centroids. Apply k-means to refine these prototypes.

Step 6. Check whether the number of prototypes has reached the target number of prototypes. In other words, if k < M, go back to step 3; otherwise, stop.