 ## Empirical distribution

The empirical distribution assigns to each value in a sample a probability of non-exceedance. It is defined so that it can be compared with selected theoretical distributions in order to verify whether they fit the sample data.

To define the empirical distribution, the data are ranked in ascending order. The i-th value in the ordered sample is denoted $x_{(i)}$, so the smallest value is $x_{(1)}$ and the largest is $x_{(n)}$. For each value, a sample estimate of the probability of non-exceedance $p_i$ is computed, representing the cumulative empirical distribution function: $p_i = F_e[x_{(i)}]$.
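The ranking step can be sketched in Python as follows. This is a minimal illustration rather than a prescribed procedure; the function name `rank_sample` and the sample values are hypothetical.

```python
def rank_sample(values):
    """Sort a sample in ascending order and pair each value with its rank i = 1..n."""
    ordered = sorted(values)
    return list(enumerate(ordered, start=1))


# Hypothetical sample, for illustration only.
sample = [3.2, 1.5, 4.8, 2.9, 1.1]
ranked = rank_sample(sample)

for i, x in ranked:
    print(f"x({i}) = {x}")
```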

The simplest estimate of the probability $p_i$ is the cumulative relative frequency of the i-th value in the sample. The cumulative relative frequency is a step function: at each sample value $x_{(i)}$ it jumps from $(i - 1)/n$ to $i/n$.

[Figure: Cumulative relative frequencies for a sample with size n = 10]

If the upper value is used to estimate $p_i$, then we have:

$$p_i = \frac{i}{n}$$

In this way, however, the probability of non-exceedance of $x_{(n)}$ is equal to 1, meaning that it is certain that the random variable $X$ is smaller than the largest observed value $x_{(n)}$; such a statement is not realistic. On the other hand, using the lower value of the cumulative relative frequency to estimate $p_i$, we get:

$$p_i = \frac{i - 1}{n}$$

With such an estimate, the probability of non-exceedance of $x_{(1)}$ is equal to 0, i.e. it is impossible for the random variable $X$ to be smaller than the smallest observed value $x_{(1)}$, which is again unrealistic.
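The two limiting estimates can be checked numerically. The sketch below, assuming a sample size of n = 10 and a hypothetical helper name, evaluates the lower step value (i − 1)/n and the upper step value i/n for every rank.

```python
def cumulative_frequency_limits(n):
    """Return the lower, (i - 1)/n, and upper, i/n, step values for ranks i = 1..n."""
    ranks = range(1, n + 1)
    lower = [(i - 1) / n for i in ranks]
    upper = [i / n for i in ranks]
    return lower, upper


lower, upper = cumulative_frequency_limits(10)

# The lower estimate assigns probability 0 to the smallest value x(1),
# and the upper estimate assigns probability 1 to the largest value x(n).
print(lower[0], upper[-1])
```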

The two estimates above, $(i - 1)/n$ and $i/n$, are the limits within which the probabilities of the sample values should lie. The probability $p_i$ is known as the plotting position, simply because it defines the position of a sample value on the probability-magnitude graph.

Many different formulae for $p_i$ have been suggested in the literature. The first was proposed by Hazen and is now known as Hazen's plotting position. It represents the midpoint of the step in the cumulative relative frequency:

$$p_i = \frac{i - 0.5}{n}$$

Another well-known formula is Weibull's plotting position:

$$p_i = \frac{i}{n + 1}$$

The reasoning behind this formula is the following: if $n$ values are to be spread uniformly over the probability range between 0 and 1, then there should be $n - 1$ intervals between the values and two intervals at the ends, making a total of $n + 1$ intervals.
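As a hedged illustration, the two formulae can be compared for a sample of size n = 10. The Hazen and Weibull expressions used here, (i − 0.5)/n and i/(n + 1), are taken from the table of plotting positions; the function names are hypothetical.

```python
def hazen(i, n):
    """Hazen plotting position: midpoint of the step from (i - 1)/n to i/n."""
    return (i - 0.5) / n


def weibull(i, n):
    """Weibull plotting position: n values split the range (0, 1) into n + 1 intervals."""
    return i / (n + 1)


n = 10
for i in range(1, n + 1):
    print(f"i={i:2d}  Hazen={hazen(i, n):.3f}  Weibull={weibull(i, n):.3f}")
```

Both estimates stay strictly between the lower and upper step values, so neither assigns a probability of 0 or 1 to any sample value.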

Many other plotting positions have been proposed, most of which can be expressed in the general form:

$$p_i = \frac{i - a}{n + 1 - 2a}$$

where $a$ is a constant taking values from 0 to 0.5 in the different formulae. There is no universal plotting position. The table lists several plotting positions together with the scope of their application.

Plotting position formulae (after Cunnane, 1978)

| Name | Formula | a | Note |
|---|---|---|---|
| Hazen | $(i - 0.5) / n$ | 0.5 | performs well for normal and Gumbel distributions |
| Weibull | $i / (n + 1)$ | 0 | biased at upper end of positively skewed distributions |
| Blom | $(i - 3/8) / (n + 1/4)$ | 3/8 | approximate formula for unbiased plotting position in normal distribution |
| Gringorten | $(i - 0.44) / (n + 0.12)$ | 0.44 | approximate formula for unbiased plotting position in Gumbel distribution |
| Median (Beard) | $(i - 0.31) / (n + 0.38)$ | 0.31 | median plotting position for quantiles and probabilities in all distributions |
| Cunnane | $(i - 2/5) / (n + 1/5)$ | 2/5 | compromise formula for all distributions |
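The tabulated formulae can be evaluated with a single function. The sketch below assumes the general form $p_i = (i - a)/(n + 1 - 2a)$, which reproduces every formula in the table for the listed value of $a$; the names `A_VALUES` and `plotting_positions` are hypothetical and the sample size n = 10 is chosen only for illustration.

```python
# Constant a for each plotting position, as listed in the table.
A_VALUES = {
    "Hazen": 0.5,
    "Weibull": 0.0,
    "Blom": 3 / 8,
    "Gringorten": 0.44,
    "Median (Beard)": 0.31,
    "Cunnane": 2 / 5,
}


def plotting_positions(n, a):
    """Return p_i = (i - a) / (n + 1 - 2a) for ranks i = 1..n."""
    if not 0 <= a <= 0.5:
        raise ValueError("a is expected to lie between 0 and 0.5")
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


n = 10
for name, a in A_VALUES.items():
    p = plotting_positions(n, a)
    print(f"{name:15s} p_1={p[0]:.4f}  p_n={p[-1]:.4f}")
```

Printing only the first and last positions makes it easy to see how the choice of $a$ affects the probabilities assigned to the extreme values of the sample.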