Mixture of normal distributions

General. When studying data from a practical situation, it is easy to suspect that the data consist of a mixture of two or more normal distributions (%mix*), especially if a normal distribution is expected.
Perhaps the data come from items produced by a number of sources, machines, spindles, etc. Even if the items are produced from the same drawing, there is variation. The process might also be 'drifting' and thus produce data with a constantly changing mean or variation.
There are a number of practical statistical tools, such as the histogram, the probability plot (%Hist*), the cusum plot (%Diagn*, %SimDiagn*), etc., for seeing graphically whether the data are non-normal (non-symmetrical). Note that the data may come from a non-symmetrical distribution (typically time measurements), in which case the idea of a mixture of normal distributions is wrong.

Formal analysis. A formal analysis requires extra data columns showing which machine, spindle, etc. was used. A common approach is then to perform a so-called t-test (%t-test*).
(It is actually possible, at least in theory, to use the measurements only. By a complicated mathematical method the five parameters, equivalent to the five sliders to the left ([Change parameters]), can be estimated. However, this demands a large number of data values and will most likely produce results with a large uncertainty.)
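The text does not say which mathematical method is meant. One widely used technique for this kind of estimation is the EM algorithm, so the following is only a sketch under that assumption. The function name `fit_two_normals`, the starting values and the simulated data are all illustrative and not part of this page.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normals(data, iterations=100):
    """Estimate (p, mu1, sigma1, mu2, sigma2) with a plain EM iteration."""
    xs = sorted(data)
    n = len(xs)
    # Starting values (an assumption of this sketch): quartiles as means,
    # the overall standard deviation for both sigmas, equal proportions.
    mu1, mu2 = xs[n // 4], xs[(3 * n) // 4]
    mean = sum(xs) / n
    sigma1 = sigma2 = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    p = 0.5
    for _ in range(iterations):
        # E-step: probability that each value belongs to distribution 1.
        w = []
        for x in xs:
            a = p * normal_pdf(x, mu1, sigma1)
            b = (1.0 - p) * normal_pdf(x, mu2, sigma2)
            w.append(a / (a + b))
        # M-step: weighted means, standard deviations and proportion.
        n1 = sum(w)
        n2 = n - n1
        mu1 = sum(wi * x for wi, x in zip(w, xs)) / n1
        mu2 = sum((1.0 - wi) * x for wi, x in zip(w, xs)) / n2
        sigma1 = math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, xs)) / n1)
        sigma2 = math.sqrt(sum((1.0 - wi) * (x - mu2) ** 2 for wi, x in zip(w, xs)) / n2)
        p = n1 / n
    return p, mu1, sigma1, mu2, sigma2


# Simulated data (not real measurements): about half from N(20, 4), half from N(40, 4).
rng = random.Random(1)
sample = [rng.gauss(20.0, 4.0) if rng.random() < 0.5 else rng.gauss(40.0, 4.0)
          for _ in range(2000)]
p, mu1, sigma1, mu2, sigma2 = fit_two_normals(sample)
print(f"p={p:.2f}  mu1={mu1:.1f}  sigma1={sigma1:.1f}  mu2={mu2:.1f}  sigma2={sigma2:.1f}")
```

With 2000 simulated values and means five sigma apart, the estimates should land close to the values used in the simulation. With fewer values, or means only about one sigma apart, the estimates can be expected to become much less reliable, which is the uncertainty the paragraph above warns about.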

(*A number of macros for Minitab can be obtained on request via www.ing-stat.se)

••••

μ distr 1

μ distr 2

σ distr 1

σ distr 2

Proportion distr 1

Exercise 1 – change the parameters
Change the parameters with the sliders and note that the distribution changes accordingly. If necessary, change the min or max values of the X-axis.
Make sure that the two μ-parameters have the same value and also set the two σ-parameters to equal values. Change the proportion slider and notice that the two total parameters (μ and σ of the mixture) do not change.

Exercise 2 – one sigma difference
Change the two μ-parameters to 40 and 45, respectively. Change the two σ-parameters to 5. Set the proportion to 0.5. Thus the difference in mean is 45 - 40 = 5, i.e. one standard deviation. Notice that the resulting distribution does not visually reveal this rather large difference. (To find this difference, other data and methods are needed.) Decrease the first μ-parameter and note that a difference of nearly two sigma is needed before it becomes visible.
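The observation in Exercise 2 can also be checked numerically by counting the peaks of the mixed curve. For equal proportions and equal σ, a second peak appears only when the means are more than two sigma apart. The sketch below is illustrative only: the function name `count_peaks` and the grid step are assumptions of this example, not part of the page.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def count_peaks(p, mu1, sigma1, mu2, sigma2, step=0.01):
    """Count local maxima of the mixed pdf on a grid of X-values."""
    lo = min(mu1 - 4.0 * sigma1, mu2 - 4.0 * sigma2)
    hi = max(mu1 + 4.0 * sigma1, mu2 + 4.0 * sigma2)
    n = int((hi - lo) / step) + 1
    ys = []
    for i in range(n):
        x = lo + i * step
        ys.append(p * normal_pdf(x, mu1, sigma1) + (1.0 - p) * normal_pdf(x, mu2, sigma2))
    # A grid point is a peak if the curve rises into it and does not rise after it.
    return sum(1 for i in range(1, n - 1) if ys[i - 1] < ys[i] >= ys[i + 1])


print(count_peaks(0.5, 40.0, 5.0, 45.0, 5.0))  # means one sigma apart: 1 peak
print(count_peaks(0.5, 30.0, 5.0, 45.0, 5.0))  # means three sigma apart: 2 peaks
```

A single peak does not mean that the data come from one distribution. It only means that the shape of the curve cannot reveal the mixture.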

Exercise 3 – plus/minus three sigma
Change the two μ-parameters to 20 and 40, respectively. Change the two σ-parameters to 4. Set the proportion to 0.5. This produces a distribution with two distinct peaks, where the mean is 30.00 and sigma is 10.77. Notice that the rule of thumb of plus/minus three sigma still covers practically the whole distribution.
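The figures in Exercise 3 can be reproduced with a few lines of code. This minimal sketch computes the mean and sigma of the mixture and then the probability that falls inside plus/minus three sigma, using the normal cumulative distribution function. The helper name `normal_cdf` is an assumption of this example.

```python
import math


def normal_cdf(x, mu, sigma):
    """Probability that a normal variable with mean mu and sd sigma is below x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


p, mu1, sigma1, mu2, sigma2 = 0.5, 20.0, 4.0, 40.0, 4.0

# Mean and standard deviation of the mixture.
mu_tot = p * mu1 + (1.0 - p) * mu2
var_tot = (p * (sigma1 ** 2 + (mu_tot - mu1) ** 2)
           + (1.0 - p) * (sigma2 ** 2 + (mu_tot - mu2) ** 2))
sigma_tot = math.sqrt(var_tot)

# Probability inside mu_tot +/- 3 * sigma_tot: a weighted sum of the two normal CDFs.
lo, hi = mu_tot - 3.0 * sigma_tot, mu_tot + 3.0 * sigma_tot
coverage = (p * (normal_cdf(hi, mu1, sigma1) - normal_cdf(lo, mu1, sigma1))
            + (1.0 - p) * (normal_cdf(hi, mu2, sigma2) - normal_cdf(lo, mu2, sigma2)))

print(f"mean = {mu_tot:.2f}, sigma = {sigma_tot:.2f}, coverage = {coverage:.6f}")
```

The interval runs from about -2.3 to 62.3, so both peaks lie well inside it, and the coverage is very close to 1.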

••••

After reading the 'info'-fields and performing the exercises, it should be clear that a mixture of distributions can be difficult to detect.
Usually there is a need for other variables that indicate e.g. machine or similar.
If the data contain a sudden change in mean, this can sometimes be found by e.g. SQC methods or other types of time series analysis.
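As an illustration of how a sudden change in mean can show up in time-ordered data, the following sketch computes a simple cusum (the cumulative sum of deviations from the overall mean) and takes the point where its absolute value is largest as the estimated change point. This is a simplified version of the idea, not a full SQC procedure. The function names and the data series are made up for this example.

```python
def cusum(values):
    """Cumulative sum of deviations from the overall mean."""
    mean = sum(values) / len(values)
    total = 0.0
    path = []
    for v in values:
        total += v - mean
        path.append(total)
    return path


def estimated_change_point(values):
    """Number of observations before the point where |cusum| is largest."""
    path = cusum(values)
    index = max(range(len(path)), key=lambda i: abs(path[i]))
    return index + 1


# Made-up series: 20 values around 10 followed by 20 values around 12.
series = [9.5, 10.5] * 10 + [11.5, 12.5] * 10
print(estimated_change_point(series))  # 20
```

While the mean is below the overall mean the cusum falls steadily, and after the shift it rises again, so the turning point marks the change. With real, noisy data the turning point is less sharp and should be judged together with a plot.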

••••

The expected value, where p is the proportion of the first normal distribution (0 < p < 1):
$μ_{tot} = p \cdot μ_{1} + (1 - p) \cdot μ_{2}$
The standard deviation:
$σ_{tot} = \sqrt{p \cdot [σ_{1}^{2} + (μ_{tot} - μ_{1})^{2}] + (1 - p) \cdot [σ_{2}^{2} + (μ_{tot} - μ_{2})^{2}]}$
The pdf_tot is the 'height' of the mixed distribution at every X-value:
${pdf}_{tot} = p \cdot {pdf}_{1} + (1 - p) \cdot {pdf}_{2}$
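The three formulas translate directly into code. The following is a minimal sketch using only the Python standard library. The function names are illustrative and not taken from the page. It also checks two statements made in the text: the total area under the curve is 1, and with equal μ and equal σ the proportion has no effect on the total parameters (Exercise 1).

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of a normal distribution with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_mean(p, mu1, mu2):
    """Expected value of the mixture."""
    return p * mu1 + (1.0 - p) * mu2


def mixture_sd(p, mu1, sigma1, mu2, sigma2):
    """Standard deviation of the mixture."""
    mu_tot = mixture_mean(p, mu1, mu2)
    var_tot = (p * (sigma1 ** 2 + (mu_tot - mu1) ** 2)
               + (1.0 - p) * (sigma2 ** 2 + (mu_tot - mu2) ** 2))
    return math.sqrt(var_tot)


def mixture_pdf(x, p, mu1, sigma1, mu2, sigma2):
    """Height of the mixed distribution at x."""
    return p * normal_pdf(x, mu1, sigma1) + (1.0 - p) * normal_pdf(x, mu2, sigma2)


# Equal mu and equal sigma: the proportion does not change the totals.
for p in (0.1, 0.5, 0.9):
    print(p, mixture_mean(p, 10.0, 10.0), mixture_sd(p, 10.0, 2.0, 10.0, 2.0))

# Area under the mixed curve, approximated by a sum over a fine grid.
step = 0.01
area = sum(mixture_pdf(-20.0 + i * step, 0.5, 20.0, 4.0, 40.0, 4.0)
           for i in range(10001)) * step
print(round(area, 6))
```

The grid from -20 to 80 is chosen wide enough for the example values (μ = 20 and 40, σ = 4). Other parameter values would need a different range.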

••••

The blue line is the resulting mixed distribution and the area under the curve is the probability. The total area is 1.

The expected value is indicated on the X-axis as one red vertical bar with the value attached to it. The small red lines indicate 1, 2, and 3 sigma from the expected value.

The X-axis can be changed by clicking and changing the min or max values for a better fit.

Use the button [Ordinary normal] to learn more about the normal distribution.

••••

Exercises.  A number of exercises to further illuminate certain features of mixtures of distributions.

Some conclusions.  A summary of the main ideas and problems with mixtures of distributions.

Formulas.  There are three main formulas that are used for the mixed result: the expected value, the standard deviation and the probability distribution. These formulas hold for a mixture of any two distributions with finite mean and variance, not only for normal distributions.

Change parameters.  It is possible to change the parameters for the mixed distribution. This is done using five sliders.

Mixed Poisson.  The button leads to a page showing a mixture of Poisson distributions.

Mixed normal.  The button leads to a page showing a mixture of normal distributions.

Ordinary Poisson.  The button leads to a page showing all basic features of a Poisson distribution.

Ordinary normal.  The button leads to a page showing all basic features of a normal distribution.

μ.  The theoretical mean of the mixture of distributions.

σ.  The theoretical standard deviation of the mixture of distributions.

••••

The range of the sliders cannot be changed. The top four sliders move 0.1 each time the right or left arrow key is pressed. The bottom slider moves 0.01 each time the right or left arrow key is pressed.

μ distr 1:  The theoretical mean of the first normal distribution.

σ distr 1:  The theoretical standard deviation of the first normal distribution.

μ distr 2:  The theoretical mean of the second normal distribution.

σ distr 2:  The theoretical standard deviation of the second normal distribution.

Proportion (p):  The proportion of the first distribution; (1-p) is thus the proportion of the second distribution.

••••