The problem of obtaining random samples from a normal (Gaussian) distribution can be solved in Kettle as described below.
The starting point is Math.random(). This JavaScript function generates numbers from a uniform (or rectangular) distribution, whose probability density function assigns the same density to every value in an interval. For Math.random() that interval is [0, 1): the value 0 can be returned, the value 1 cannot. A sample from a uniform distribution is what we usually mean when we talk about random numbers.
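As a quick illustration of the uniform behaviour of Math.random(), the sketch below draws a batch of values and summarizes them. The function name uniformSummary and the sample size are illustrative choices, not part of the original transformation. For a uniform distribution on [0, 1) we would expect a mean close to 1/2 and a variance close to 1/12.

```javascript
// Draw n values with Math.random() and summarize them.
// uniformSummary is a hypothetical helper used only for this illustration.
function uniformSummary(n) {
  var sum = 0, sumSq = 0, min = Infinity, max = -Infinity;
  for (var i = 0; i < n; i++) {
    var u = Math.random();      // uniform in [0, 1)
    sum += u;
    sumSq += u * u;
    if (u < min) min = u;
    if (u > max) max = u;
  }
  var mean = sum / n;
  return {
    mean: mean,                         // expected to be near 0.5
    variance: sumSq / n - mean * mean,  // expected to be near 1/12
    min: min,
    max: max
  };
}

var uniformCheck = uniformSummary(100000);
```

The exact numbers change on every run, since the values are random; only their approximate size is predictable.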
Next we implement the Box-Muller transformation, a method for generating pairs of independent random numbers that follow a standard normal distribution. The transformation is usually expressed in two forms; we present the original one (G.E.P. Box, M.E. Muller, 1958).
Let U1 and U2 be two independent random variables, uniformly distributed in the interval (0, 1]. Define
$$Z_0 = \sqrt{-2 \ln U_1}\,\cos(2\pi U_2)$$
and
$$Z_1 = \sqrt{-2 \ln U_1}\,\sin(2\pi U_2)$$
then Z0 and Z1 are independent random variables having a normal distribution with mean 0 and standard deviation 1.
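The transformation can be sketched in plain JavaScript as follows. This is a minimal illustration, not the downloadable transformation itself: the names boxMuller and normalSummary are assumptions made for this example. Because Math.random() returns values in [0, 1), the sketch uses 1 - Math.random() for U1 so that the logarithm is never taken at 0.

```javascript
// One Box-Muller draw: returns a pair [z0, z1] of standard normal values.
// boxMuller is a hypothetical name chosen for this sketch.
function boxMuller() {
  var u1 = 1 - Math.random(); // shifts [0, 1) to (0, 1], so Math.log(u1) is finite
  var u2 = Math.random();     // only enters through a periodic angle
  var radius = Math.sqrt(-2 * Math.log(u1));
  var angle = 2 * Math.PI * u2;
  return [radius * Math.cos(angle), radius * Math.sin(angle)];
}

// Sample mean and standard deviation over `pairs` draws (2 * pairs values).
function normalSummary(pairs) {
  var n = 0, sum = 0, sumSq = 0;
  for (var i = 0; i < pairs; i++) {
    var z = boxMuller();
    for (var j = 0; j < 2; j++) {
      sum += z[j];
      sumSq += z[j] * z[j];
      n++;
    }
  }
  var mean = sum / n;
  return { mean: mean, sd: Math.sqrt(sumSq / n - mean * mean) };
}

var normalCheck = normalSummary(50000);
```

With a sample this large, normalCheck.mean should come out close to 0 and normalCheck.sd close to 1, which is a simple sanity check on the generator.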
Having normally distributed random values available is important in many situations. The prominence of the normal distribution derives from three main reasons:
- Many phenomena can be regarded as normally distributed. This happens when a variable has values mostly concentrated around an average value, values far from it are increasingly unlikely, and positive and negative deviations from the average have the same probability (so the distribution is symmetric). This behaviour is quite common in nature. The intuition can be verified by sampling from the population and then performing a goodness-of-fit test.
- The normal distribution is a good asymptotic approximation for several discrete and continuous distributions, such as the binomial (de Moivre-Laplace,…), the t distribution (Cramér, 1946), the Gamma distribution (Ricci, 1975) and the Chi-square distribution (Birnbaum, 1964). So we can use the normal distribution even when the population is not normally distributed.
- By the central limit theorems (Lindeberg-Lévy,…), when we sample from an unknown distribution (required only to have finite mean and variance), the sample mean is approximately normally distributed once the sample size n is large enough.
From these considerations we can conclude that having normally distributed values available is crucial in many statistical problems.
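The central limit theorem point can be checked numerically with a small sketch. It averages n uniform values from Math.random(), standardizes the average using the uniform mean 1/2 and variance 1/12, and counts how often the result falls within one standard deviation of zero. For a standard normal variable that fraction is about 0.68. The function names and the choices n = 30 and 20000 trials are assumptions made for this illustration.

```javascript
// Standardized mean of n uniform values: (mean - 1/2) / sqrt(1 / (12 n)).
function standardizedUniformMean(n) {
  var sum = 0;
  for (var i = 0; i < n; i++) {
    sum += Math.random();
  }
  var mean = sum / n;
  return (mean - 0.5) / Math.sqrt(1 / (12 * n));
}

// Fraction of standardized means that fall in [-1, 1] over `trials` repetitions.
function fractionWithinOneSd(trials, n) {
  var inside = 0;
  for (var t = 0; t < trials; t++) {
    if (Math.abs(standardizedUniformMean(n)) <= 1) {
      inside++;
    }
  }
  return inside / trials;
}

var cltFraction = fractionWithinOneSd(20000, 30);
```

Although each individual value is uniform, not normal, cltFraction should land near 0.68, as it would for normally distributed data.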
To create the random number generator in Kettle we can use a JavaScript step, as outlined in the transformation below:
You can download the transformation from the Studio Synthesis site: Random Normal Number Generator
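For readers who cannot open the download, the script inside such a JavaScript step might look like the sketch below. It is an assumption about how the step could be written, not a copy of the downloadable transformation. It assumes that an upstream step (for example a row generator) feeds one row per desired pair, that the script runs once per row, and that z0 and z1 are declared as output fields in the step's field list. The variable names are illustrative.

```javascript
// Script body for a Kettle JavaScript step (assumed setup, see text above).
// Runs once per incoming row and produces two standard normal values.

var u1 = 1 - Math.random(); // uniform in (0, 1], keeps Math.log(u1) finite
var u2 = Math.random();     // uniform in [0, 1)

var radius = Math.sqrt(-2 * Math.log(u1));

// z0 and z1 are the values to expose as new output fields of the step.
var z0 = radius * Math.cos(2 * Math.PI * u2);
var z1 = radius * Math.sin(2 * Math.PI * u2);
```

Each row leaving the step then carries one pair of independent standard normal values.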



