Let X denote a random variable and let x be a specific value that X might assume.
p(X=x) denotes the probability that X takes the value x; it is commonly abbreviated as p(x).
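We assume that every continuous random variable possesses a probability density function (PDF). A common example is the one-dimensional normal distribution with mean μ and variance σ², whose density is
$$p(x) = \left(2\pi\sigma^2\right)^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}\,\frac{(x-\mu)^2}{\sigma^2}\right\}$$
Because it is a normal distribution, we can abbreviate this density as follows.
$$p(x) = \mathcal{N}(x;\, \mu, \sigma^2)$$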
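In general, however, x is not a scalar but a vector. Let μ be the mean vector and let Σ be a symmetric, positive semi-definite matrix, the covariance matrix. The multivariate normal density is then
$$p(x) = \det(2\pi\Sigma)^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}\,(x-\mu)^T\, \Sigma^{-1}\, (x-\mu)\right\}$$
Note that this expression requires Σ to be invertible, i.e. positive definite.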
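As a rough illustration of the normal density in one and two dimensions, the sketch below evaluates both with the Python standard library only. The function names `normal_pdf` and `normal_pdf_2d` are hypothetical helpers chosen for this example, and the two-dimensional version assumes a positive definite covariance matrix so that its inverse exists.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of the one-dimensional normal distribution N(x; mu, sigma2)."""
    if sigma2 <= 0.0:
        raise ValueError("variance must be positive")
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)


def normal_pdf_2d(x, mu, cov):
    """Density of a two-dimensional normal distribution.

    cov is a symmetric 2x2 matrix given as nested lists. It must be
    positive definite, otherwise the inverse used below does not exist.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    # In two dimensions det(2*pi*cov) = (2*pi)^2 * det(cov).
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
```

With a diagonal covariance matrix the two-dimensional density factors into the product of two one-dimensional densities, which is a convenient sanity check for the implementation.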
The joint distribution of two random variables X and Y can be described as follows.
$$p(x, y) = p(X = x \text{ and } Y = y)$$
If X and Y are independent, then
$$p(x, y) = p(x)\,p(y)$$
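The probability that X takes the value x, given that Y is known to have the value y, is called the conditional probability and is written
$$p(x \mid y) = p(X = x \mid Y = y)$$
If p(y)>0, then
$$p(x \mid y) = \frac{p(x, y)}{p(y)}$$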
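The relationship between a joint distribution, its marginals, independence, and conditional probability can be sketched on a small discrete table. The weather values and probabilities below are made up for illustration, and the helper names are hypothetical.

```python
# Hypothetical joint distribution p(x, y) over two discrete variables.
joint = {
    ("rain", "wet"): 0.27,
    ("rain", "dry"): 0.03,
    ("sun", "wet"): 0.07,
    ("sun", "dry"): 0.63,
}


def marginal_x(joint, x):
    """p(x), obtained by summing the joint over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(joint, y):
    """p(y), obtained by summing the joint over all values of x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def conditional_x_given_y(joint, x, y):
    """p(x | y) = p(x, y) / p(y), defined only when p(y) > 0."""
    p_y = marginal_y(joint, y)
    if p_y <= 0.0:
        raise ValueError("p(y) must be positive")
    return joint.get((x, y), 0.0) / p_y


def is_independent(joint, tol=1e-9):
    """True if p(x, y) = p(x) p(y) holds for every pair of values."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    return all(
        abs(joint.get((x, y), 0.0) - marginal_x(joint, x) * marginal_y(joint, y)) <= tol
        for x in xs
        for y in ys
    )
```

In this table p(rain) is 0.30 but p(rain ∣ wet) is roughly 0.79, so observing a wet ground changes the belief about rain and the two variables are not independent.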
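The theorem of total probability states the following, for discrete and continuous random variables respectively.
$$p(x) = \sum_y p(x \mid y)\,p(y)$$
$$p(x) = \int p(x \mid y)\,p(y)\,dy$$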
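Bayes rule relates the conditional p(x∣y) to its inverse p(y∣x), provided that p(y)>0:
$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)}$$
If x is a quantity that we would like to infer from y, then p(x) is referred to as the prior probability distribution and y is called the data, e.g. laser measurements. p(x∣y) is called the posterior probability distribution over X.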
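In robotics, p(y∣x) is often called the generative model, because it describes how the state x gives rise to the measurement y. Since p(y) does not depend on x, the factor p(y)⁻¹ is often written as a normalizer η in Bayes rule:
$$p(x \mid y) = \eta\,p(y \mid x)\,p(x)$$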
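A discrete Bayes update with an explicit normalizer can be sketched as follows. The door example and its numbers are assumptions made for illustration, and `bayes_update` is a hypothetical helper, not part of any library. The evidence p(y) is obtained with the theorem of total probability.

```python
def bayes_update(prior, likelihood):
    """Return the posterior p(x | y) for one fixed observation y.

    prior maps each state x to p(x); likelihood maps each state x to p(y | x).
    """
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    # Total probability: p(y) = sum over x of p(y | x) p(x).
    p_y = sum(unnormalized.values())
    if p_y <= 0.0:
        raise ValueError("p(y) must be positive")
    eta = 1.0 / p_y  # the normalizer, independent of x
    return {x: eta * value for x, value in unnormalized.items()}


# Hypothetical example: a robot senses "open" in front of a door.
prior = {"open": 0.5, "closed": 0.5}
sense_open_given_state = {"open": 0.6, "closed": 0.2}
posterior = bayes_update(prior, sense_open_given_state)
```

Here the posterior belief that the door is open comes out at about 0.75. Because η only rescales the result, it never needs to be derived separately for each x.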
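It is perfectly fine to condition any of these rules on arbitrary additional random variables, e.g. the location of a robot can be inferred from multiple sources of random measurements. Conditioning Bayes rule on Z=z, for example, gives
$$p(x \mid y, z) = \frac{p(y \mid x, z)\,p(x \mid z)}{p(y \mid z)}$$
as long as p(y∣z)>0.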
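Similarly, we can condition the rule for combining probabilities of independent random variables on other variables. X and Y are called conditionally independent given Z if
$$p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$$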
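However, conditional independence does not imply absolute independence, that is
$$p(x, y \mid z) = p(x \mid z)\,p(y \mid z) \;\not\Rightarrow\; p(x, y) = p(x)\,p(y)$$
The converse does not hold either: absolute independence does not imply conditional independence.
$$p(x, y) = p(x)\,p(y) \;\not\Rightarrow\; p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$$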
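Both non-implications can be checked numerically on small counterexamples. The sketch below builds two hypothetical joint distributions over binary variables (x, y, z): one where Z selects a biased coin that is then flipped twice, and one where Z is the exclusive-or of two fair coins. All names and numbers are illustrative assumptions.

```python
from itertools import product

NAMES = ("x", "y", "z")


def prob(joint, **fixed):
    """Sum p(x, y, z) over all entries that match the fixed values."""
    return sum(
        p
        for key, p in joint.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )


def is_independent(joint, tol=1e-9):
    """Check p(x, y) = p(x) p(y) for all binary values."""
    return all(
        abs(prob(joint, x=x, y=y) - prob(joint, x=x) * prob(joint, y=y)) <= tol
        for x, y in product((0, 1), repeat=2)
    )


def is_conditionally_independent(joint, tol=1e-9):
    """Check p(x, y | z) = p(x | z) p(y | z) wherever p(z) > 0."""
    for x, y, z in product((0, 1), repeat=3):
        p_z = prob(joint, z=z)
        if p_z == 0.0:
            continue
        lhs = prob(joint, x=x, y=y, z=z) / p_z
        rhs = (prob(joint, x=x, z=z) / p_z) * (prob(joint, y=y, z=z) / p_z)
        if abs(lhs - rhs) > tol:
            return False
    return True


# Case 1: Z picks one of two biased coins; X and Y are two flips of that coin.
bias = {0: 0.9, 1: 0.1}
common_cause = {}
for x, y, z in product((0, 1), repeat=3):
    p_x = bias[z] if x == 1 else 1.0 - bias[z]
    p_y = bias[z] if y == 1 else 1.0 - bias[z]
    common_cause[(x, y, z)] = 0.5 * p_x * p_y

# Case 2: X and Y are independent fair coins and Z = X xor Y.
xor_case = {
    (x, y, z): (0.25 if z == (x ^ y) else 0.0)
    for x, y, z in product((0, 1), repeat=3)
}
```

In the first case X and Y are conditionally independent given Z but not independent, because both flips reveal which coin was chosen. In the second case X and Y are independent, yet once Z is known the value of X determines Y.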
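The expected value of a random variable is given by the following, for the discrete and continuous case respectively.
$$E[X] = \sum_x x\,p(x)$$
$$E[X] = \int x\,p(x)\,dx$$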
Expectation is a linear function of a random variable, so for arbitrary numerical values a and b we have the following property.
$$E[aX + b] = a\,E[X] + b$$
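Covariance measures the expected squared deviation from the mean.
$$\mathrm{Cov}[X] = E\left[(X - E[X])^2\right] = E[X^2] - E[X]^2$$
For a scalar random variable this is the variance σ². Its square root is the standard deviation σ, which expresses the typical deviation from the mean in the same units as X.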
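For a discrete random variable, the expected value, the variance, and the standard deviation (the square root of the variance) can be computed directly from the definitions. The sketch below uses a fair six-sided die as an assumed example; `expectation` and `variance` are hypothetical helper names.

```python
import math


def expectation(dist, f=lambda value: value):
    """E[f(X)] for a discrete distribution given as {value: probability}."""
    return sum(f(value) * p for value, p in dist.items())


def variance(dist):
    """E[(X - E[X])^2], the expected squared deviation from the mean."""
    mean = expectation(dist)
    return expectation(dist, lambda value: (value - mean) ** 2)


# Assumed example: a fair six-sided die.
die = {face: 1.0 / 6.0 for face in range(1, 7)}

mean = expectation(die)                                # 3.5 up to rounding
var = variance(die)                                    # 35/12 up to rounding
std = math.sqrt(var)                                   # standard deviation
shifted = expectation(die, lambda value: 2 * value + 3)  # E[2X + 3]
```

Linearity shows up in `shifted`, which equals `2 * mean + 3` up to floating-point rounding, and the variance agrees with the alternative form E[X²] − E[X]².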
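Finally, the entropy of a probability distribution is given by the following expression. Entropy is the expected information that the value of x carries.
$$H_p(x) = E\left[-\log_2 p(x)\right]$$
For discrete and continuous random variables respectively, this resolves to
$$H_p(x) = -\sum_x p(x)\,\log_2 p(x)$$
$$H_p(x) = -\int p(x)\,\log_2 p(x)\,dx$$
In the discrete case, −log₂ p(x) is the number of bits required to encode x using an optimal encoding, assuming that p(x) is the probability of observing x.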
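The discrete entropy in bits can be sketched in a few lines. The helper name `entropy_bits` is hypothetical, and outcomes with zero probability are skipped on the usual convention that they contribute nothing to the sum.

```python
import math


def entropy_bits(dist):
    """Entropy in bits of a discrete distribution given as {value: probability}."""
    # Zero-probability outcomes are skipped: they contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0.0)


fair_coin = {"heads": 0.5, "tails": 0.5}
uniform_eight = {i: 1.0 / 8.0 for i in range(8)}
certain = {"always": 1.0}
```

A fair coin carries one bit, a uniform choice among eight outcomes carries three bits, and an outcome that is certain carries none, which matches the reading of −log₂ p(x) as an optimal code length.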