Probabilistic Robotics
Basic Concepts in Probability


Let $X$ denote a random variable, and let $x$ denote a specific value that $X$ may assume.

$p(X = x)$ denotes the probability that the random variable $X$ has the value $x$.

Therefore,

$$\sum_{x} p(X = x) = 1$$

Continuous random variables are described by a probability density function (PDF). A common example is the one-dimensional normal distribution with mean $\mu$ and variance $\sigma^{2}$:

$$p(x) = (2\pi\sigma^{2})^{-1/2} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^{2}}{\sigma^{2}} \right\}$$

Because this is the normal distribution, we can abbreviate the equation as

$$\mathcal{N}(x;\, \mu, \sigma^{2})$$
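As a numerical sketch (the parameter values below are arbitrary, not from the text), the density above can be evaluated directly and its normalization checked with a simple Riemann sum:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Evaluate the 1-D normal density N(x; mu, sigma2)."""
    return (2 * math.pi * sigma2) ** -0.5 * math.exp(-0.5 * (x - mu) ** 2 / sigma2)

# The density integrates to one; approximate the integral over +/- 10 sigma.
mu, sigma2 = 1.0, 4.0
dx = 0.001
total = sum(normal_pdf(mu + k * dx, mu, sigma2) * dx for k in range(-20000, 20000))
print(round(total, 3))  # 1.0
```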

However, in general $x$ is not a scalar value; it is a vector. Let $\Sigma$ be a positive semi-definite, symmetric matrix, namely a covariance matrix. The multivariate normal density is

$$p(x) = \det(2\pi\Sigma)^{-1/2} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu})^{T} \Sigma^{-1} (\vec{x} - \vec{\mu}) \right\}$$

and, just as in the discrete case, the density integrates to one:

$$\int p(x)\, dx = 1$$

The joint distribution of two random variables $X$ and $Y$ can be described as follows.

$$p(x, y) = p(X = x \;\text{and}\; Y = y)$$

If $X$ and $Y$ are independent, then

$$p(x, y) = p(x)\, p(y)$$

The conditional probability of $x$ given $y$ is written

$$p(x \mid y) = p(X = x \mid Y = y)$$

If $p(y) > 0$, then

$$p(x \mid y) = \frac{p(x, y)}{p(y)}$$

The Theorem of Total Probability states the following.

$$p(x) = \sum_{y} p(x \mid y)\, p(y) = \int p(x \mid y)\, p(y)\, dy$$

We can apply Bayes Rule.

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} = \frac{p(y \mid x)\, p(x)}{\sum_{x'} p(y \mid x')\, p(x')}$$
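The discrete form of Bayes rule can be sketched numerically; the door-state numbers below are hypothetical, chosen only for illustration:

```python
# Discrete Bayes rule: p(x | y) = p(y | x) p(x) / sum over x' of p(y | x') p(x').
# Hypothetical example: x is a door state, y a "door looks open" measurement.
prior = {"open": 0.5, "closed": 0.5}        # p(x)
likelihood = {"open": 0.6, "closed": 0.3}   # p(y | x), an assumed sensor model

unnormalized = {x: likelihood[x] * prior[x] for x in prior}
normalizer = 1.0 / sum(unnormalized.values())     # 1 / p(y)
posterior = {x: normalizer * v for x, v in unnormalized.items()}

print(posterior)  # open: ~0.667, closed: ~0.333
```

Note that the denominator is computed by summing the numerator over all states, so the posterior sums to one by construction.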
In integral form,

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{\int p(y \mid x')\, p(x')\, dx'}$$

If $x$ is a quantity that we would like to infer from $y$, the probability $p(x)$ is referred to as the prior probability distribution, and $y$ is called the data, e.g. laser measurements. $p(x \mid y)$ is called the posterior probability distribution over $X$.

In robotics, $p(y \mid x)$ is called the generative model. Since $p(y)$ does not depend on $x$, $p(y)^{-1}$ is often written as a normalizer $\eta$ in Bayes rule:

$$p(x \mid y) = \eta\, p(y \mid x)\, p(x)$$

It is perfectly fine to condition any of the rules on arbitrary random variables; for example, the location of a robot can be inferred from multiple sources of random measurements. Conditioning Bayes rule on a variable $z$ gives

$$p(x \mid y, z) = \frac{p(y \mid x, z)\, p(x \mid z)}{p(y \mid z)}$$

for as long as $p(y \mid z) > 0$.

Similarly, we can condition the rule for combining probabilities of independent random variables on other variables; this is called conditional independence:

$$p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$$

However, conditional independence does not imply absolute independence; that is,

$$p(x, y \mid z) = p(x \mid z)\, p(y \mid z) \quad \not\Rightarrow \quad p(x, y) = p(x)\, p(y)$$

The converse is not true either: absolute independence does not imply conditional independence.
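A small numerical sketch (all probabilities below are hypothetical) of a distribution that is conditionally independent given $z$ yet dependent marginally:

```python
from itertools import product

# Given Z, X and Y are independent coin flips whose bias depends on Z;
# marginally, X and Y are correlated through Z.
p_z = {0: 0.5, 1: 0.5}
p1_given_z = {0: 0.1, 1: 0.9}   # P(X=1 | z) = P(Y=1 | z)

def joint(x, y, z):
    px = p1_given_z[z] if x == 1 else 1 - p1_given_z[z]
    py = p1_given_z[z] if y == 1 else 1 - p1_given_z[z]
    return p_z[z] * px * py      # factorizes given z: conditional independence

# Marginals p(x, y), p(x), p(y)
p_xy = {(x, y): sum(joint(x, y, z) for z in p_z) for x, y in product((0, 1), repeat=2)}
p_x = {x: sum(p_xy[(x, y)] for y in (0, 1)) for x in (0, 1)}
p_y = {y: sum(p_xy[(x, y)] for x in (0, 1)) for y in (0, 1)}

# Absolute independence fails: p(1, 1) differs from p(x=1) p(y=1).
print(round(p_xy[(1, 1)], 2), round(p_x[1] * p_y[1], 2))  # 0.41 0.25
```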
The expected value of a random variable is given by

$$E[X] = \sum_{x} x\, p(x) = \int x\, p(x)\, dx$$

Since expectation is a linear function of a random variable, we have the following property.

$$E[aX + b] = a\, E[X] + b$$

Covariance measures the expected squared deviation from the mean; its square root, the standard deviation, measures the typical deviation from the mean in the original units.

$$\mathrm{Cov}[X] = E\left[ (X - E[X])^{2} \right] = E[X^{2}] - E[X]^{2}$$
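These three identities can be checked directly on a small discrete distribution (the values below are hypothetical):

```python
# Hypothetical discrete distribution over a few values.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

E = sum(x * p for x, p in pmf.items())                    # E[X]
E2 = sum(x ** 2 * p for x, p in pmf.items())              # E[X^2]
cov = E2 - E ** 2                                         # Cov[X] = E[X^2] - E[X]^2
cov_direct = sum((x - E) ** 2 * p for x, p in pmf.items())  # E[(X - E[X])^2]

a, b = 3.0, 1.0
E_affine = sum((a * x + b) * p for x, p in pmf.items())   # E[aX + b], equals a E[X] + b

print(round(E, 3), round(cov, 3))  # 1.1 0.49
```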

Finally, the entropy of a probability distribution is given by the following expression. Entropy is the expected information that the value of $x$ carries.

$$H_{p}(x) = -\sum_{x} p(x) \log_{2} p(x) = -\int p(x) \log_{2} p(x)\, dx$$

In the discrete case, $-\log_{2} p(x)$ is the number of bits required to encode $x$ using an optimal encoding, assuming that $p(x)$ is the probability of observing $x$.
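A quick numerical sketch (hypothetical probabilities, not from the text) of both the entropy and the per-outcome code lengths:

```python
import math

# Entropy of a hypothetical discrete distribution.
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
H = -sum(p * math.log2(p) for p in pmf.values())
print(H)  # 1.5

# -log2 p(x): optimal code length in bits for each outcome.
lengths = {x: -math.log2(p) for x, p in pmf.items()}
print(lengths)  # a: 1 bit, b: 2 bits, c: 2 bits
```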
