Let X denote a random variable and let x be a specific value that X might assume.
p(X = x) denotes the probability that the random variable X takes the value x.
Therefore,
$$\sum_x p(X = x) = 1$$
All continuous random variables possess a probability density function (PDF). A common example is the density of the one-dimensional normal distribution with mean μ and variance σ²:
$$p(x) = \left(2\pi\sigma^2\right)^{-1/2} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}$$
Since this is the density of a normal distribution, we abbreviate it as
$$\mathcal{N}(x; \mu, \sigma^2)$$
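As a sanity check of the density above, here is a minimal Python sketch (the function name gaussian_pdf and the parameter values are just illustrative) that evaluates N(x; μ, σ²) and verifies numerically that it integrates to one:

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(x; mu, sigma^2), evaluated element-wise for scalar or array x."""
    return (2 * np.pi * sigma2) ** -0.5 * np.exp(-0.5 * (x - mu) ** 2 / sigma2)

# Riemann-sum check that the density integrates to (approximately) 1.
xs = np.linspace(-10.0, 12.0, 20001)
dx = xs[1] - xs[0]
print(gaussian_pdf(xs, mu=1.0, sigma2=2.0).sum() * dx)  # ~1.0
```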
In general, however, x is not a scalar but a vector. Let μ be the mean vector and let Σ be a symmetric, positive semi-definite covariance matrix. The multivariate normal density is then
$$p(x) = \det\left(2\pi\Sigma\right)^{-1/2} \exp\left\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right\}$$
and
$$\int p(x)\,dx = 1$$
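The vector case follows the same pattern. A minimal sketch, with illustrative names and a made-up mean and covariance, computing the multivariate density via a linear solve rather than an explicit matrix inverse:

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, Sigma):
    """Density of N(x; mu, Sigma) for a d-dimensional vector x."""
    d = x - mu
    norm = np.linalg.det(2 * np.pi * Sigma) ** -0.5
    return norm * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])   # symmetric, positive definite
print(multivariate_gaussian_pdf(np.array([0.5, 0.5]), mu, Sigma))
```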
The joint distribution of two random variables X and Y can be described as follows.
$$p(x, y) = p(X = x \text{ and } Y = y)$$
If they are independent, then
$$p(x, y) = p(x)\,p(y)$$
The probability of x conditioned on y is written as
$$p(x \mid y) = p(X = x \mid Y = y)$$
If p(y)>0, then
$$p(x \mid y) = \frac{p(x, y)}{p(y)}$$
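To make these definitions concrete, here is a small Python sketch with a made-up joint table over two binary variables; it computes the marginals and a conditional distribution, and checks whether the independence factorization holds:

```python
import numpy as np

# Made-up joint p(x, y): rows index x in {0, 1}, columns index y in {0, 1}.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y = p_xy.sum(axis=0)               # marginal p(y)
p_x_given_y0 = p_xy[:, 0] / p_y[0]   # p(x | Y = 0) = p(x, y) / p(y)

print(p_x, p_y, p_x_given_y0)
print(np.allclose(p_xy, np.outer(p_x, p_y)))  # False: X and Y are not independent here
```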
The Theorem of Total Probability states the following.
$$p(x) = \sum_y p(x \mid y)\,p(y) \quad\text{(discrete case)}$$
$$p(x) = \int p(x \mid y)\,p(y)\,dy \quad\text{(continuous case)}$$
If x is a quantity that we would like to infer from y, the probability p(x) is referred to as the prior probability distribution and y is called the data, e.g. laser measurements. The distribution p(x ∣ y) is called the posterior probability distribution over X, and Bayes rule relates it to the prior:
$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)}$$
In robotics, p(y ∣ x) is called the generative model. Since p(y) does not depend on x, the factor p(y)^{-1} is often written as a normalizer variable η in Bayes rule.
$$p(x \mid y) = \eta\,p(y \mid x)\,p(x)$$
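A minimal Python sketch of this update for a discrete state x (the prior and likelihood values are made up for illustration); note that the normalizer η is just one over the total probability of the measurement:

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])        # p(x) over three hypothetical robot states
likelihood = np.array([0.1, 0.7, 0.4])   # p(y | x) for the measurement y actually observed

unnormalized = likelihood * prior        # p(y | x) p(x)
eta = 1.0 / unnormalized.sum()           # 1 / p(y), by the Theorem of Total Probability
posterior = eta * unnormalized           # p(x | y) = eta p(y | x) p(x)

print(posterior, posterior.sum())        # the posterior sums to 1
```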
It is perfectly fine to condition any of these rules on arbitrary random variables, e.g. the location of a robot can be inferred from multiple sources of random measurements. Conditioning Bayes rule on a variable z gives
$$p(x \mid y, z) = \frac{p(y \mid x, z)\,p(x \mid z)}{p(y \mid z)}$$
as long as p(y ∣ z) > 0.
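The sketch above carries over unchanged when every distribution is additionally conditioned on z; with made-up numbers:

```python
import numpy as np

prior_given_z = np.array([0.6, 0.4])          # p(x | z)
likelihood_given_z = np.array([0.2, 0.9])     # p(y | x, z) for the observed y

posterior_given_z = likelihood_given_z * prior_given_z
posterior_given_z /= posterior_given_z.sum()  # division by p(y | z) via normalization

print(posterior_given_z)                      # p(x | y, z)
```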
Similarly, we can condition the rule for combining probabilities of independent random variables on other variables.
$$p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$$
However, conditional independence does not imply absolute independence, that is
$$p(x, y \mid z) = p(x \mid z)\,p(y \mid z) \;\not\Rightarrow\; p(x, y) = p(x)\,p(y)$$
The converse is not true either: absolute independence does not imply conditional independence.
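A small numerical counterexample (the distributions are made up): X and Y are constructed to be conditionally independent given Z, yet their joint does not factorize into the product of the marginals.

```python
import numpy as np

p_z = np.array([0.5, 0.5])            # p(z) for z in {0, 1}
p_x_given_z = np.array([[0.9, 0.1],   # p(x | z): rows index z, columns index x
                        [0.1, 0.9]])
p_y_given_z = p_x_given_z.copy()      # p(y | z), same table for simplicity

# Joint p(x, y) = sum_z p(z) p(x | z) p(y | z), i.e. conditional independence given z.
p_xy = sum(p_z[z] * np.outer(p_x_given_z[z], p_y_given_z[z]) for z in range(2))

p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
print(np.allclose(p_xy, np.outer(p_x, p_y)))  # False: X and Y are not absolutely independent
```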
The expected value of a random variable is given by
$$E[X] = \sum_x x\,p(x) \quad\text{(discrete)}, \qquad E[X] = \int x\,p(x)\,dx \quad\text{(continuous)}$$
Since expectation is a linear function of a random variable, we have the following property for scalars a and b.
$$E[aX + b] = a\,E[X] + b$$
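A quick numerical check of this linearity property on a made-up discrete distribution:

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])    # made-up p(x)
a, b = 3.0, -1.0

E_X = np.sum(xs * p)
E_aXb = np.sum((a * xs + b) * p)
print(E_aXb, a * E_X + b)        # both print 2.3
```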
The covariance measures the squared expected deviation from the mean; its square root, the standard deviation, measures the expected deviation from the mean itself.
$$\mathrm{Cov}[X] = E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2$$
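Both forms of this expression agree, as a check on the same made-up distribution shows:

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])                 # made-up p(x)

E_X = np.sum(xs * p)
var_direct = np.sum((xs - E_X) ** 2 * p)      # E[(X - E[X])^2]
var_moments = np.sum(xs ** 2 * p) - E_X ** 2  # E[X^2] - E[X]^2
print(var_direct, var_moments)                # both print 0.49
```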
Finally, the entropy of a probability distribution is given by the following expression; it is the expected information that the value of x carries.
$$H_p(x) = -\sum_x p(x)\log_2 p(x) \quad\text{(discrete)}, \qquad H_p(x) = -\int p(x)\log_2 p(x)\,dx \quad\text{(continuous)}$$
In the discrete case, −log₂ p(x) is the number of bits required to encode x using an optimal encoding, assuming that p(x) is the probability of observing x.
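A minimal sketch computing the entropy, in bits, of a made-up discrete distribution:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])      # made-up p(x)
H = -np.sum(p * np.log2(p))        # entropy in bits
print(H)                           # ~1.49

# For comparison, a uniform distribution over three values has entropy log2(3) ~ 1.58,
# the maximum achievable over three outcomes.
```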