Basic Concepts in Probability

Let $X$ denote a random variable and let $x$ be a specific value that $X$ might assume.

$p(X = x)$ denotes the probability that the random variable $X$ takes the value $x$.

Discrete probabilities sum to one, that is

$$\sum_{x} p(X = x) = 1$$

All continuous random variables possess a probability density function (PDF). For example,

$$p(x) = (2\pi\sigma^{2})^{-1/2} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^{2}}{\sigma^{2}} \right\}$$

This is the density of the one-dimensional normal distribution with mean $\mu$ and variance $\sigma^{2}$, which we abbreviate as

$$\mathcal{N}(x; \mu, \sigma^{2})$$
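
As a quick sanity check, the following is a minimal NumPy sketch of this density; the function name `normal_pdf` and the sample values are illustrative, not from the text.

```python
import numpy as np

def normal_pdf(x, mu, sigma_sq):
    """Density of the one-dimensional normal distribution N(x; mu, sigma^2)."""
    return (2 * np.pi * sigma_sq) ** -0.5 * np.exp(-0.5 * (x - mu) ** 2 / sigma_sq)

# The density integrates to (approximately) one over a sufficiently wide interval.
xs = np.linspace(-10.0, 10.0, 10001)
print(np.trapz(normal_pdf(xs, 0.0, 1.0), xs))  # ~1.0
```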

However, in general $x$ is not a scalar but a vector. Let $\mu$ be the mean vector and $\Sigma$ a symmetric, positive semi-definite covariance matrix. The multivariate normal density is then

$$p(x) = \det(2\pi\Sigma)^{-1/2} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu})^{T} \Sigma^{-1} (\vec{x} - \vec{\mu}) \right\}$$

and

$$\int p(x)\,dx = 1$$
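
A corresponding sketch of the multivariate density, again with illustrative variable names and values, computes the normalizer and the quadratic form directly with NumPy.

```python
import numpy as np

def multivariate_normal_pdf(x, mu, Sigma):
    """Density of the multivariate normal N(x; mu, Sigma)."""
    d = x - mu
    norm = np.linalg.det(2 * np.pi * Sigma) ** -0.5
    return norm * np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
print(multivariate_normal_pdf(np.array([0.5, -1.0]), mu, Sigma))
```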

The joint distribution of two random variables $X$ and $Y$ is written as

$$p(x, y) = p(X = x \text{ and } Y = y)$$

If they are independent, then

$$p(x, y) = p(x)\,p(y)$$

The probability of $x$ conditioned on $y$ is written as

$$p(x \mid y) = p(X = x \mid Y = y)$$

If $p(y) > 0$, then

$$p(x \mid y) = \frac{p(x, y)}{p(y)}$$

The Theorem of Total Probability states the following.

$$p(x) = \sum_{y} p(x \mid y)\,p(y) = \int p(x \mid y)\,p(y)\,dy$$
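
For a discrete example, marginalizing the conditional over $y$ recovers $p(x)$; the two distributions below are made up for illustration.

```python
import numpy as np

p_y = np.array([0.7, 0.3])             # p(y) for two hypothetical states of y
p_x_given_y = np.array([[0.9, 0.1],    # p(x | y=0)
                        [0.2, 0.8]])   # p(x | y=1)

# Theorem of Total Probability: p(x) = sum_y p(x | y) p(y)
p_x = p_y @ p_x_given_y
print(p_x)        # [0.69, 0.31]
print(p_x.sum())  # 1.0
```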

We can apply Bayes Rule.

$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)} = \frac{p(y \mid x)\,p(x)}{\sum_{x'} p(y \mid x')\,p(x')}$$

In integral form,

$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{\int p(y \mid x')\,p(x')\,dx'}$$

If $x$ is a quantity that we would like to infer from $y$, then the probability $p(x)$ is referred to as the prior probability distribution and $y$ is called the data, e.g. a laser measurement. $p(x \mid y)$ is called the posterior probability distribution over $X$.

In robotics, $p(y \mid x)$ is called the generative model. Since $p(y)$ does not depend on $x$, $p(y)^{-1}$ is often written as a normalizer variable $\eta$ in Bayes rule:

$$p(x \mid y) = \eta\, p(y \mid x)\, p(x)$$
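
In code, the normalizer simply rescales the unnormalized product $p(y \mid x)\,p(x)$ so that the posterior sums to one. The prior and measurement model below are invented for illustration.

```python
import numpy as np

prior = np.array([0.5, 0.5])       # hypothetical p(x) over a binary state
likelihood = np.array([0.6, 0.3])  # hypothetical p(y | x) for one observed y

# Bayes rule with normalizer eta = 1 / p(y)
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()
print(posterior)  # [0.667, 0.333]
```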

It is perfectly fine to condition any of these rules on arbitrary random variables, e.g. the location of a robot can be inferred from multiple sources of measurements. Conditioning Bayes rule on $z$ gives

$$p(x \mid y, z) = \frac{p(y \mid x, z)\,p(x \mid z)}{p(y \mid z)}$$

as long as $p(y \mid z) > 0$.

Similarly, we can condition the rule for combining probabilities of independent random variables on other variables.

$$p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$$

However, conditional independence does not imply absolute independence, that is

$$p(x, y \mid z) = p(x \mid z)\,p(y \mid z) \;\not\Rightarrow\; p(x, y) = p(x)\,p(y)$$

The converse is also not true: absolute independence does not imply conditional independence.
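
This distinction can be checked numerically. In the made-up example below, $x$ and $y$ are independent given $z$, yet not independent once $z$ is marginalized out.

```python
import numpy as np

p_z = np.array([0.5, 0.5])
p_x_given_z = np.array([[0.9, 0.1],   # p(x | z=0)
                        [0.2, 0.8]])  # p(x | z=1)
p_y_given_z = np.array([[0.7, 0.3],   # p(y | z=0)
                        [0.1, 0.9]])  # p(y | z=1)

# Build the joint using conditional independence: p(x, y | z) = p(x | z) p(y | z)
p_xy = sum(p_z[z] * np.outer(p_x_given_z[z], p_y_given_z[z]) for z in range(2))

p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
print(np.allclose(p_xy, np.outer(p_x, p_y)))  # False: x and y are not independent
```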

The expected value of a random variable is given by

$$E[X] = \sum_{x} x\,p(x) = \int x\,p(x)\,dx$$

Since expectation is a linear function of a random variable, we have the following property for any scalars $a$ and $b$.

$$E[aX + b] = a\,E[X] + b$$

The covariance measures the expected squared deviation from the mean. Its square root is therefore the standard deviation, i.e. the typical deviation from the mean.

$$\mathrm{Cov}[X] = E\left[(X - E[X])^{2}\right] = E[X^{2}] - E[X]^{2}$$
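
Both identities are easy to verify for a small discrete random variable; the values and probabilities below are invented.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # possible values of X
p = np.array([0.2, 0.5, 0.3])   # p(x), sums to one

E_x = np.sum(x * p)             # E[X]
a, b = 3.0, -1.0
print(np.isclose(np.sum((a * x + b) * p), a * E_x + b))    # linearity of expectation

var1 = np.sum((x - E_x) ** 2 * p)      # E[(X - E[X])^2]
var2 = np.sum(x ** 2 * p) - E_x ** 2   # E[X^2] - E[X]^2
print(np.isclose(var1, var2))          # True
```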

Finally, the entropy of a probability distribution is given by the following expression. Entropy is the expected information that the value of $x$ carries.

$$H_{p}(x) = -\sum_{x} p(x)\,\log_{2} p(x) = -\int p(x)\,\log_{2} p(x)\,dx$$

In the discrete case, $-\log_{2} p(x)$ is the number of bits required to encode $x$ using an optimal encoding, assuming that $p(x)$ is the probability of observing $x$.
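
A minimal sketch of the discrete entropy, with an illustrative helper function and distributions:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution: H_p = -sum_x p(x) log2 p(x)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # 0 * log2(0) is taken to be 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))                # 1.0 bit: a fair coin flip
print(entropy([1.0, 0.0]))                # 0.0 bits: no uncertainty
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits
```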
