Ing-Stat

- statistics for the industry

Ph: +46 70 593 7505

A number of articles about various statistical subjects. Most of them contain code for simulation.

A short story about the special case when you check 'n' items, humans, etc. and find 0 faulty items or 0 humans with a certain disease, attribute, etc. In most discussions people will react with a short 'good!', but what inference about the true fault rate can be made? An amazingly easy and powerful approximation can be used. The document also contains a simulation and the theoretical derivation of the approximation. ("A fairytale.doc", 3 pages).
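
As an illustration only: one widely known approximation for the zero-fault case is the 'rule of three', which puts the approximate 95 % upper bound on the fault rate at 3/n. Whether this is the approximation used in the document is an assumption. The Python sketch below compares it with the exact bound obtained by solving (1 - p)^n = 0.05; the function names are hypothetical.

```python
def exact_upper_bound(n, confidence=0.95):
    """Fault rate p solving (1 - p)**n = 1 - confidence, given 0 faults in n items."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def rule_of_three(n):
    """Approximate 95 % upper bound 3/n, based on -ln(0.05) being close to 3."""
    return 3.0 / n


if __name__ == "__main__":
    # Print n, the exact bound and the approximation side by side.
    for n in (30, 100, 1000):
        print(n, round(exact_upper_bound(n), 5), round(rule_of_three(n), 5))
```

The approximation is slightly conservative: it always lies a little above the exact bound.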

A good example of linear combinations

This paper discusses a good example using so-called linear combinations of variables. Such combinations are present in all engineering using components that are joined or combined, not only physically but also e.g. time components. ("A good example of linear combinations.doc", 5 pages).

Shows some ideas around the concept of a mixture of random variables. The document shows a number of simulations and macros in order to support the discussion. See also the macro %Mix, for a thorough discussion of mixture of variables. A mixture must not be confused with so-called combinations of variables (see special document below). ("A mixture of variables.doc", 5 pages).

Inspection of items is often carried out as some kind of sampling procedure. The purpose of sampling is of course to reduce costs. However, there is a certain risk that incorrect items reach the customer and thus create costs. This document discusses the balance of costs, fraction of inspected items, etc. See also the Ing-Stat macro %CostSim. ("CostSim.doc", 4 pages).

In basic training in statistical methods the most common measure of variation is the standard deviation and thus the variance. In many cases this is all that is needed. However, in many situations there is a need to discuss the notion of so-called co-variation, defined as covariance. This is a measure of variance 'between' two variables. The covariance is a part of most analyses, perhaps invisibly taken care of by the software. All concepts are supported by simulation for better understanding. ("Covariance.doc", 4 pages).

Basic about 'order statistics'

Anyone who studies the theory of statistics is familiar with the idea of average. Also, many printouts show the median and perhaps
the first and third quartile (the median is the second quartile) and often the smallest and the largest values. While the average
is a linear combination of the data (we add all the values and divide by 'n'), the median, the minimum, the maximum, etc. are
called *order statistics*. This means that the dataset is ordered and thus there will be e.g. a smallest and a largest
value. The macro %MinMax shows some ideas and the *Fastest scorer.doc* also treats these features. In this document the
two expressions *F*(*x*) and *f*(*x*) are used. All concepts are well supported by simulation. ("Order statistics.doc", 7 pages).

Basic about 'sum of binomial distributions'

Suppose that we have e.g. an electronic circuit with a number of different components with different fault rates. If we now count the number of incorrect components (minimum 0, maximum 'n' (the number of components)) we do not get observations from a binomial distribution. Instead we get a sum of different binomial distributions. The document shows the derivation of the mean and standard deviation. The derivation of the probability distribution of this sum is mathematically difficult and is shown briefly in the appendix. All concepts are illustrated by simulations. ("Sum of Bin.doc", 7 pages).
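
A minimal Python sketch of the situation, assuming independent components that are each either correct or faulty. The fault rates below are hypothetical. Under these assumptions the mean of the count is the sum of the fault rates and the variance is the sum of p(1 - p) over the components, which the simulation can be checked against.

```python
import math
import random


def faulty_count_mean_sd(fault_rates):
    """Theoretical mean and standard deviation of the number of faulty components."""
    mean = sum(fault_rates)
    variance = sum(p * (1.0 - p) for p in fault_rates)
    return mean, math.sqrt(variance)


def simulate_faulty_counts(fault_rates, n_circuits, seed=1):
    """Simulate the number of faulty components in each of n_circuits circuits."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for p in fault_rates) for _ in range(n_circuits)]


if __name__ == "__main__":
    rates = [0.01, 0.05, 0.10, 0.02]  # hypothetical fault rates, one per component
    mean, sd = faulty_count_mean_sd(rates)
    counts = simulate_faulty_counts(rates, 100_000)
    print("theory:    ", mean, sd)
    print("simulation:", sum(counts) / len(counts))
```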

Buffon's needle — estimating a famous constant

"Buffon's needle" is a classical problem in statistics. It shows a way to estimate the famous number
*π* (approx. 3.14). This number is of course available with any wanted number of
decimals, but the document contains a good, general, statistical discussion. It also contains a number of simulations
in order to clarify important issues. In the literature there is a further discussion of how to sharpen the procedure and
calculation, giving a drastically reduced uncertainty. (See also the macro %Pi). ("Buffon's needle, part I.doc", 8 pages).
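
A small Python sketch of the basic experiment, not taken from the document or from the macro %Pi. It assumes a needle no longer than the distance between the lines, in which case the probability of crossing a line is 2·needle/(π·spacing). The random direction is drawn by rejection sampling so that the simulation itself does not use the value of π.

```python
import math
import random


def estimate_pi_buffon(n_throws, needle=1.0, spacing=1.0, seed=1):
    """Estimate pi from simulated needle throws (requires needle <= spacing)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        # Distance from the centre of the needle to the nearest line.
        y = rng.uniform(0.0, spacing / 2.0)
        # Uniform random direction: a uniform point in the quarter disc
        # has a uniformly distributed angle between 0 and 90 degrees.
        while True:
            u, v = rng.random(), rng.random()
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = v / math.sqrt(r2)
        if y <= (needle / 2.0) * sin_theta:
            hits += 1
    # P(hit) = 2 * needle / (pi * spacing), solved for pi.
    return 2.0 * needle * n_throws / (spacing * hits)


if __name__ == "__main__":
    print(estimate_pi_buffon(200_000))
```

Even with 200 000 throws the estimate is typically only correct to about two decimals, which illustrates why the uncertainty of the procedure deserves a discussion of its own.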

The document discusses, by using a number of examples, the important idea of combinations of variables. The necessary formulas for calculating expected values and standard deviations are included. The concepts are also well simulated for better understanding. Also, the so-called Gauss' approximation formulas are presented and explained, including their derivations. (See also the macro %LinC). ("Combinations of variables.doc", 12 pages).

In many standard books in statistics there are methods to compare e.g. two different means, two different variances, two different proportions, but seldom two different outcomes from Poisson distributions. This document shows how such an analysis can be transferred to a simpler situation. The theory is also supported by simulation and the use of a macro. ("Trick Po.doc", 4 pages).
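
One standard way to reduce the comparison to a simpler situation is sketched below in Python; whether this is the approach taken in the document is an assumption. If two Poisson counts have the same rate and the same exposure, then, given their total n, the first count follows a binomial distribution with parameters n and 1/2. The function name is hypothetical.

```python
from math import comb


def poisson_two_sample_p_value(x1, x2):
    """Two-sided exact p-value for equal Poisson rates (equal exposure assumed).

    Given the total n = x1 + x2, x1 is Bin(n, 1/2) when the two rates are equal.
    """
    n = x1 + x2
    k = min(x1, x2)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The Bin(n, 1/2) distribution is symmetric, so double the smaller tail.
    return min(1.0, 2.0 * lower_tail)


if __name__ == "__main__":
    print(poisson_two_sample_p_value(10, 20))
```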

Estimating sigma in a normal distribution

This document looks at three different ways of estimating sigma in a normal distribution. One of them was suggested by Gauss and is based on using the median of the absolute errors (i.e. the deviations from the mean). The document shows by simulations that the estimator proposed by Gauss has a larger variance. Also, the common sigma estimators with 'n-1' and 'n', respectively, are investigated, again by simulation. The term 'bias' is also discussed. ("Estimating sigma, Normal.doc", 4 pages).
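
A Python sketch of such a comparison, under stated assumptions: the median of the absolute deviations is divided by the 75 % quantile of the standard normal distribution (about 0.6745) to put it on the sigma scale, and the sample size and number of runs are arbitrary choices. The exact form used in the document may differ.

```python
import random
import statistics

# Scale factor (assumption): for a normal distribution the median of
# |X - mean| equals 0.6745 * sigma.
MAD_SCALE = statistics.NormalDist().inv_cdf(0.75)


def sigma_from_median_abs_dev(sample):
    """Sigma estimate based on the median of the absolute deviations from the mean."""
    centre = statistics.mean(sample)
    return statistics.median(abs(x - centre) for x in sample) / MAD_SCALE


def compare_sigma_estimators(n=20, runs=5000, sigma=1.0, seed=1):
    """Return the spread (standard deviation) of the two estimators over many samples."""
    rng = random.Random(seed)
    by_median, by_stdev = [], []
    for _ in range(runs):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_median.append(sigma_from_median_abs_dev(sample))
        by_stdev.append(statistics.stdev(sample))  # the usual 'n-1' estimator
    return statistics.stdev(by_median), statistics.stdev(by_stdev)


if __name__ == "__main__":
    print(compare_sigma_estimators())
```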

A so-called Pareto analysis or Pareto diagram is a very common way of graphically showing certain types of statistical data. This document discusses the analysis of data that is created using the macro %CrePareto. ("Ex Pareto.doc", 4 pages).

Quite often goals are stated with no or very little thought of how the analysis of the result is to be performed. We offer a number of articles in various fields of statistics; the articles can be the result of frequently asked questions or of an ambition to explain some common misunderstanding about the theory of statistics. We have tried to be careful not to rely entirely on the use of mathematics. Where suitable we have added commands for calculation or simulation of the subjects discussed. ("Goals and suborders.doc", 6 pages).

Measurements — some ideas and experiences

This document contains a collection of experiences related to goals, measurements, data and statistical analysis. There is no single way to create a working system of related activities aiming to support a process with information, insight, understanding, etc. towards improvements. Ambitions, resources, methods, etc. should always be related to the need of the wanted position. ("Measurements.doc", 4 pages).

Many processes can be regarded as a simple linear combination of variables. However, sometimes two or more process steps are run in parallel and the process cannot continue until the last one of the tasks reaches a certain point. In this case we thus look at the maximum (i.e. an order statistic). The document discusses and simulates some situations. ("Parallel operations.doc", 3 pages).
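
A small Python sketch of the idea, assuming independent parallel tasks with exponentially distributed times (an assumption made here only because the expected maximum then has a simple closed form; the document may use other distributions). With k tasks of mean time m, the expected waiting time for the last one is m(1 + 1/2 + ... + 1/k).

```python
import random


def simulate_parallel_completion(n_tasks=3, mean_time=1.0, runs=100_000, seed=1):
    """Average time until the slowest of n_tasks parallel tasks has finished."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # The process continues only when the last task is done: take the maximum.
        total += max(rng.expovariate(1.0 / mean_time) for _ in range(n_tasks))
    return total / runs


def expected_max_exponential(n_tasks=3, mean_time=1.0):
    """Theoretical mean of the maximum of independent exponential task times."""
    return mean_time * sum(1.0 / k for k in range(1, n_tasks + 1))


if __name__ == "__main__":
    print(simulate_parallel_completion(), expected_max_exponential())
```

Note that three parallel tasks with mean time 1 take about 1.83 on average, clearly more than the mean of any single task.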

This document discusses a single numerical value stated in a radio program and how the so-called Rayleigh distribution can be used for some inference. ("Rayleigh on the radio.doc", 1 page).

This document discusses some difficulties about the interpretation of goals. What exactly is meant by the described goal? It also stresses the need for an operational definition, i.e. more or less an 'instruction book' that clearly describes what shall be measured, how it shall be measured, and how we infer from the data whether the goal was met or not. Such a description must take into consideration uncertainties and the variation at hand, i.e. issues that usually are handled by a statistical approach. ("Ideas about goals.doc", 2 pages).

The correlation coefficient is a common and popular entity in statistics. The document discusses the use of the coefficient, its calculation, etc., which is also supported by simulation. (See also the macro %CorrTest). ("Correlation coefficient.doc", 5 pages).

The taxi problem is a classical statistical problem. It involves observations of the integers 1, 2, 3, ..., *N*
where *N* is the largest value. The problem is how to estimate *N* from the observed values.
Two different estimators are shown with remarkably different features. The difference is shown via simulation. ("Taxi problem.doc", 3 pages).
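
A Python sketch of such a comparison. The two estimators below are common textbook choices and are an assumption; the document may use others. Both are unbiased when the numbers are drawn without replacement, but the one based on the largest observation varies much less. The values of N, the sample size and the number of runs are hypothetical.

```python
import random
import statistics


def estimate_from_mean(sample):
    """Estimator based on the average: 2 * mean - 1."""
    return 2.0 * statistics.mean(sample) - 1.0


def estimate_from_max(sample):
    """Estimator based on the largest observation: max * (n + 1) / n - 1."""
    n = len(sample)
    return max(sample) * (n + 1) / n - 1.0


def compare_taxi_estimators(N=500, n=10, runs=5000, seed=1):
    """Return (mean, standard deviation) of each estimator over many samples."""
    rng = random.Random(seed)
    from_mean, from_max = [], []
    for _ in range(runs):
        sample = rng.sample(range(1, N + 1), n)  # n distinct taxi numbers
        from_mean.append(estimate_from_mean(sample))
        from_max.append(estimate_from_max(sample))
    return {
        "mean-based": (statistics.mean(from_mean), statistics.stdev(from_mean)),
        "max-based": (statistics.mean(from_max), statistics.stdev(from_max)),
    }


if __name__ == "__main__":
    for name, (avg, sd) in compare_taxi_estimators().items():
        print(name, round(avg, 1), round(sd, 1))
```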

The fastest scorer — an order statistic

The concept of order statistics is of utmost importance and very common. The time to the first score in a match is an example of an order statistic (others are e.g. the quartiles, the median, the maximum, etc.). The idea of order statistics is derived and illuminated using simulation. Also, the recorded time to first score since 1959 in the Swedish league is studied. ("Fastest scorer.doc", 4 pages).

The inflexion points of a normal curve

When showing the standard deviation in a normal distribution many diagrams have a straight line out to the curve at its 'point of inflexion'. The meaning of this entity is discussed in the document. It is doubtful whether this is a good way of showing sigma (the standard deviation). It actually only strengthens the incorrect opinion that sigma is something that is closely and solely connected to the normal distribution. This is wrong: practically all distributions have a sigma (a well-known exception is the so-called Cauchy distribution). ("Inflexion point.doc", 3 pages).

This document discusses a newspaper article about the safety of so-called PIN codes, a four-digit code used in many areas. The problem is also simulated. ("The PIN code.doc", 2 pages).

The document discusses and simulates the use of the Poisson distribution when determining the number of spare parts needed in order to maintain a certain, prespecified probability of shortage, i.e. the probability of an order for a spare part when there is no part available. ("Prob of shortage.doc", 2 pages).
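
A minimal Python sketch of the calculation, assuming that the demand during one replenishment period is Poisson distributed and that a shortage means that the demand exceeds the number of parts in stock. This reading of 'shortage', the mean demand and the function names are assumptions made for the illustration.

```python
import math


def shortage_probability(stock, mean_demand):
    """P(demand > stock) when the demand is Poisson with the given mean."""
    cdf = sum(
        math.exp(-mean_demand) * mean_demand ** k / math.factorial(k)
        for k in range(stock + 1)
    )
    return 1.0 - cdf


def spare_parts_needed(mean_demand, max_shortage_prob):
    """Smallest stock whose shortage probability does not exceed the limit."""
    stock = 0
    while shortage_probability(stock, mean_demand) > max_shortage_prob:
        stock += 1
    return stock


if __name__ == "__main__":
    # Hypothetical case: on average 2 parts are requested per period.
    print(spare_parts_needed(2.0, 0.05))
```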

The product of integer random variables

The document discusses the features of a product of two random, integer variables, *X* and *Y*. For illustration purposes
both variables have a small range and all possible outcomes of *XY* are therefore listed. In addition, the expected value and the
standard deviation of *XY* are also derived. All results are supported by simulation of the situation. ("Product.doc", 5 pages).
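
A Python sketch of the listing approach, assuming two independent variables that are each uniformly distributed on a few integer values. The ranges 1 to 3 and 1 to 2 are hypothetical; the document's own variables may differ. Exact fractions are used so that the results can be compared directly with a hand calculation.

```python
from fractions import Fraction
from itertools import product


def product_distribution(x_values, y_values):
    """Exact distribution of X*Y for independent X and Y, each uniform on its values."""
    weight = Fraction(1, len(x_values) * len(y_values))
    dist = {}
    for x, y in product(x_values, y_values):
        dist[x * y] = dist.get(x * y, 0) + weight
    return dist


def mean_and_variance(dist):
    """Expected value and variance of a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    variance = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, variance


if __name__ == "__main__":
    dist = product_distribution([1, 2, 3], [1, 2])
    mean, variance = mean_and_variance(dist)
    print(dict(sorted(dist.items())))
    print(mean, variance, float(variance) ** 0.5)
```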

The sum of a random number of random variables

The only difference between 'random sums' and 'the sum of a random number of variables' is usually that the former consists of a fixed number of terms while the latter consists of a random number of terms. The load of an elevator is a good example: an elevator transports 0, 1, 2, 3... people simultaneously. The load is the sum of a random number of people, each one with a random weight. The concept is illustrated by using a die to determine the number of terms and then throwing a die that number of times and adding the results. The idea is illustrated by simulation. The completely general case is also derived. ("Random sums.doc", 11 pages).
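
The die example can be sketched in Python as follows. The simulation is compared with the standard results for a sum S of N independent, identically distributed terms X, where N is independent of the terms: E(S) = E(N)·E(X) and Var(S) = E(N)·Var(X) + Var(N)·E(X)². The function and constant names are hypothetical.

```python
import random
import statistics


def simulate_random_sums(runs=100_000, seed=1):
    """Throw one die to get the number of terms, then add that many die throws."""
    rng = random.Random(seed)
    sums = []
    for _ in range(runs):
        n_terms = rng.randint(1, 6)
        sums.append(sum(rng.randint(1, 6) for _ in range(n_terms)))
    return sums


DIE_MEAN = 3.5
DIE_VAR = 35.0 / 12.0
THEORY_MEAN = DIE_MEAN * DIE_MEAN                           # E(N) * E(X) = 12.25
THEORY_VAR = DIE_MEAN * DIE_VAR + DIE_VAR * DIE_MEAN ** 2   # 45.9375

if __name__ == "__main__":
    sums = simulate_random_sums()
    print("simulation:", statistics.mean(sums), statistics.variance(sums))
    print("theory:    ", THEORY_MEAN, THEORY_VAR)
```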

This document illuminates a perhaps small but not unimportant problem in statistics, namely that we have two different estimators of a parameter. Typical questions are: "In what way are the estimators different? Do they have different precision, measured in some way? Are there any (mathematical) difficulties in the calculations?", etc. The document discusses two different estimators of the variance of a special variable and illustrates by simulation. ("Two estimators.doc", 3 pages).

Anyone who studies the theory of statistics will sooner or later meet the expression *sigma*. However, it is often presented in a
confusing or incomplete way. This paper presents the most important features of sigma, even though they are presented elsewhere in several other documents.
The main ideas are supported by simulation. ("What is sigma.doc", 5 pages).

When estimating the variance of a random variable the most common formula contains a 'n–1' in the denominator. There are of course
no difficulties in performing the calculations, even by hand, and in most cases there is no great difference, at least when 'n' is fairly large,
whether we use 'n–1' or 'n' in the denominator. This document does the necessary mathematical derivation but also performs some simulations. ("Why n-1.doc", 6 pages).
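
A small Python simulation of the difference, assuming normally distributed data with hypothetical values for the mean, sigma and sample size. For small samples the average of the estimates with 'n' in the denominator falls below the true variance by the factor (n - 1)/n, while the version with 'n-1' is correct on average.

```python
import random


def average_variance_estimates(n=5, sigma=2.0, runs=20_000, seed=1):
    """Average of the variance estimates with 'n-1' and with 'n' in the denominator."""
    rng = random.Random(seed)
    with_n_minus_1 = 0.0
    with_n = 0.0
    for _ in range(runs):
        sample = [rng.gauss(10.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_of_squares = sum((x - mean) ** 2 for x in sample)
        with_n_minus_1 += sum_of_squares / (n - 1)
        with_n += sum_of_squares / n
    return with_n_minus_1 / runs, with_n / runs


if __name__ == "__main__":
    # True variance is 2.0**2 = 4; with n = 5 the 'n' version averages about 3.2.
    print(average_variance_estimates())
```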
