Originally shared by Yonatan Zunger
It turns out that there’s an entire branch of mathematics behind how briefly you can express an idea. It’s possible to measure a quantity of a transmission, its information, and prove that no matter how clever your encoding, you can never express it in fewer than that many bits on average. This idea has powerful applications in everything from computer science to gambling, but it can be hard to wrap your head around.
Google intern Christopher Olah has taken a serious stab at giving a really clear, and primarily pictorial, explanation. There’s a lot of stuff in here, but it’s a great introduction to some amazingly important ideas.
One thing to help clarify: there’s a bit of common notation in probability that he doesn’t explain.
P(X) means “the probability of X.” (So “P(sunny)” might mean “the probability that it is sunny”)
P(X, Y) means “the probability of both X and Y.” (So “P(t-shirt, sunny)” means “the probability that I am wearing a t-shirt, and it is sunny”)
P(X | Y) means “the probability of X, given Y.” (So “P(t-shirt | sunny)” means “the probability that I am wearing a t-shirt when it is sunny”)
These aren’t the same thing, but they are related: the probability that I am wearing a t-shirt and it’s sunny is the same as the probability that it’s sunny, times the probability that I’m wearing a t-shirt given that it’s sunny. This is called the product rule, and it’s the starting point for Bayes’ Theorem:
P(X, Y) = P(X | Y) * P(Y).
It shows up all the time when you’re thinking about probability and information.
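If you'd like to see the relationship above with actual numbers, here is a minimal sketch in Python. The tally of 100 days is made up purely for illustration, and the helper names (p_weather, p_joint, p_clothing_given_weather) are hypothetical, not from Olah's post. Each probability is computed directly by counting days, and then the two sides of P(X, Y) = P(X | Y) * P(Y) are compared.

```python
from fractions import Fraction

# Hypothetical tally of 100 days: (clothing, weather) -> number of days.
# These counts are invented for illustration only.
days = {
    ("t-shirt", "sunny"): 45,
    ("coat", "sunny"): 15,
    ("t-shirt", "rainy"): 5,
    ("coat", "rainy"): 35,
}
total = sum(days.values())


def p_weather(weather):
    """P(weather): fraction of all days with this weather."""
    matching = sum(n for (_, w), n in days.items() if w == weather)
    return Fraction(matching, total)


def p_joint(clothing, weather):
    """P(clothing, weather): fraction of all days with both."""
    return Fraction(days[(clothing, weather)], total)


def p_clothing_given_weather(clothing, weather):
    """P(clothing | weather): look only at days with this weather."""
    matching = sum(n for (_, w), n in days.items() if w == weather)
    return Fraction(days[(clothing, weather)], matching)


lhs = p_joint("t-shirt", "sunny")
rhs = p_clothing_given_weather("t-shirt", "sunny") * p_weather("sunny")
print(lhs, rhs, lhs == rhs)  # prints: 9/20 9/20 True
```

Using Fraction keeps the arithmetic exact, so the two sides match exactly rather than only up to floating-point rounding. Here P(sunny) is 3/5 and P(t-shirt | sunny) is 3/4, and their product is the 9/20 you get by counting the t-shirt-and-sunny days directly.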