KL Divergence
We want to quantify the difference between probability distributions for a given random variable. For example, we may be interested in the difference between an actual and an observed probability distribution.
Kullback-Leibler divergence calculates a score that measures the divergence of one probability distribution from another.
The KL divergence between two distributions Q and P is often stated using the following notation:

$$KL(P \| Q)$$

The $\|$ operator indicates P's divergence from Q, which is not the same as Q's divergence from P. Thus, the operation is not commutative: in general, $KL(P \| Q) \neq KL(Q \| P)$.

KL divergence can be calculated as the negative sum of the probability of each event in P multiplied by the log of the probability of that event in Q over the probability of the event in P:

$$KL(P \| Q) = -\sum_{x \in X} P(x) \log \frac{Q(x)}{P(x)}$$
If the two distributions are identical, the ratio Q(x)/P(x) is 1 for every event, so each log term is 0 and the sum is 0, i.e. there is no divergence at all. The log can be base-2 to give units in bits; the natural log gives units in nats.
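As a concrete illustration, here is a minimal Python sketch of the formula above, using base-2 logs so the result is in bits (the `kl_divergence` name is my own choice, not from the original text):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = -sum_x P(x) * log2(Q(x) / P(x)), in bits.

    p and q are sequences of probabilities over the same events;
    assumes every probability is strictly positive.
    """
    return -sum(px * math.log2(qx / px) for px, qx in zip(p, q))
```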
If we are attempting to approximate an unknown probability distribution, then the target probability distribution estimated from data is P, and Q is our approximation of it. In this case, the KL divergence summarizes the number of additional bits required, on average, to represent an event from the random variable when using Q instead of P. The better our approximation, the less additional information is required.
Consider a random variable like drawing a colored marble from a bag. We have 3 colors and two different probability distributions over them.
We can calculate the KL divergence using the formula above.
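For instance, with assumed probabilities for the three colors (the specific numbers below are illustrative; the original text does not give them), SciPy's entropy routine computes the same quantity directly:

```python
from scipy.stats import entropy  # entropy(pk, qk, base=...) returns KL(pk || qk)

# Assumed distributions over the three marble colors (e.g. red, green, blue).
p = [0.10, 0.40, 0.50]   # "true" distribution P
q = [0.80, 0.15, 0.05]   # approximating distribution Q

print(entropy(p, q, base=2))  # KL(P || Q) ~ 1.927 bits
print(entropy(q, p, base=2))  # KL(Q || P) ~ 2.022 bits -- not the same, as noted above
```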
Intuitively Understanding the KL Divergence
If I have two coins, one fair and one weighted, the KL divergence tells me how much information I need, on average, to distinguish the weighted coin from the fair coin.
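As a rough sketch of that intuition (the 70/30 weighting below is an assumed example, not from the original text), the divergence of the weighted coin from the fair one works out to about 0.12 bits per flip:

```python
import math

fair     = [0.5, 0.5]   # fair coin: P(heads), P(tails)
weighted = [0.7, 0.3]   # assumed weighted coin

# KL(weighted || fair): the average number of extra bits of evidence each flip
# provides for "weighted" over "fair" when the flips really come from the weighted coin.
kl_bits = sum(p * math.log2(p / q) for p, q in zip(weighted, fair))
print(kl_bits)  # ~0.119 bits per flip
```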