Page 1 of 1

Generate random numbers with gaussian distribution

Posted: Sun Jul 04, 2004 6:15 pm
by tinman
It's probably not great, but it works. There is also an example attached to the bottom of the procedure.

Code: Select all

#GRANDOM_TEST = 1   ; Set this to zero to disable the test code, anything else to enable it

#M_PI = 3.14159;265358979323846
#GRAND_MAX = 32767  ; Used when generating random numbers

; Name:         GRandom
; Synopsis:     number = GRandom(mean, std)
; Parameters:   mean.f  - Center value around which the random numbers will be generated
;               std.f   - Standard deviation, which controls the range from which the majority of the numbers will be generated
; Returns:      number.f - The generated random number
; Globals:      None
; Description:  Generates a random number using a Gaussian (normal) distribution
;               centred around the specified mean point with the given standard deviation
Procedure.f GRandom(mean.f, std.f)
    DefType.f   u1
    DefType.f   u2
    DefType.w   side    ; Select with side of the mean we take value from - simply alternates
    DefType.f   temp
    DefType.f   number

    side = Random(#GRAND_MAX) & 1                   ; randomly choose side of curve
    temp = Random(#GRAND_MAX) + 1
    temp = temp / #GRAND_MAX                        ; generate random number between 0 and 1 (not including 0!!)
    u1 = std * Sqr(Log(temp) * -2)                  ;
    temp = Random(#GRAND_MAX)
    temp = temp / #GRAND_MAX                        ; generate random number between 0 and 1
    u2 = temp * 2 * #M_PI                           ; generate random number between 0 and 2pi

    If side
        number = u1 * Cos(u2) + mean
    Else
        number = u1 * Sin(u2) + mean
    EndIf

    ProcedureReturn number
EndProcedure


; Example starts here
CompilerIf #GRANDOM_TEST<>0

#NUMBER_BINS = 320

Dim bins.w(#NUMBER_BINS)
For number_numbers=0 To 10000
    use_bin.f = GRandom(#NUMBER_BINS / 2, 20)
    ;use_bin.f = Random(#NUMBER_BINS)           ; Uncomment this line to see the distribution of the built in random number generator
    If use_bin>=0 And use_bin<#NUMBER_BINS
        bins(Int(use_bin)) + 1
    EndIf
Next

If CreateImage(1, #NUMBER_BINS, 256)=0 : End : EndIf
StartDrawing(ImageOutput())
Box(0, 0, #NUMBER_BINS, 256, RGB(0,0,0))
FrontColor(255, 255, 255)
For current_bin=0 To #NUMBER_BINS-1
    ;Debug "bin "+Str(current_bin)+": "+Str(bins(current_bin))
    Line(current_bin, 0, 0, bins(current_bin))
Next
StopDrawing()

If OpenWindow(0, #CW_USEDEFAULT, #CW_USEDEFAULT, 400, 300, #PB_Window_SystemMenu|#PB_Window_MinimizeGadget|#PB_Window_MaximizeGadget|#PB_Window_SizeGadget|#PB_Window_TitleBar, "Test")

    If CreateGadgetList(WindowID()) 
        ImageGadget(1, 10, 10, 320, 256, ImageID())
    EndIf

    Repeat
        ev.l = WaitWindowEvent()
        While ev
            Select ev
                Case #PB_Event_CloseWindow
                    quit = 1
            EndSelect
            
            ev = WindowEvent()
        Wend
    Until quit=1
    
    CloseWindow(0)
EndIf
End
CompilerEndIf

Posted: Thu Jun 26, 2008 8:20 am
by Little John
Works also with PB 5.20

Here are two simpler procedures for generating "normally distributed" random numbers. The second one is similar to the procedure posted by tinman.
(after two German textbooks:
Hartung, J.; Elpelt, B.; Klösener, K.H.: Statistik.
14th ed. 2005, München: Oldenbourg.
p. 147.
Sachs, L.: Angewandte Statistik.
8th ed. 1997, Berlin/Heidelberg: Springer.
p. 111)

When using the default parameters, random numbers according to the standard normal distribution are generated.
(code tested on PB 4.20)

Regards, Little John

Code: Select all

Procedure.f RandomNormal1 (mean.f=0, stdDev.f=1)
   ; This procedure is based on the central limit theorem, which says that the sums of many
   ; independent random numbers are almost normally distributed.
   ; n = 12 is usually sufficient, but n can be bigger, of course. This procedure implements
   ; the general principle. Someone who always wants to use n = 12 can simplify the procedure,
   ; of course.
   ;
   ; Using the default parameters, the procedure returns a real number from the open interval ]-6, +6[.
   ; This is not perfect, since theoretically it should be a real number between -oo and +oo.
   ; However, the theoretical probability of a value that is _not_ in the interval ]-6, +6[ is
   ; very small. It's so small that it is not even noted in the tables of my statistical textbooks. :-)
   ; "Almost all" returned values will be in the interval [-3, +3]. The probability for returning a
   ; value which is outside of this interval is only about 0.0027.
   Protected r.f, i.l
   Protected n.l=12

   r = 0
   For i = 1 To n
      r + (Random(999998)+1) / 1000000   ; random number from the open interval ]0, 1[
   Next
   r = (r - n/2) / Sqr(n/12)

   ProcedureReturn r*stdDev + mean
EndProcedure


Procedure.f RandomNormal2 (mean.f=0, stdDev.f=1)
   ; The procedure can return any real number (explanation see in a post below). "Almost all"
   ; returned values will be in the interval [mean-3*stdDev, mean+3*stdDev]. The probability
   ; for returning a value which is outside of this interval is only about 0.0027.
   Protected x1.f, x2.f, r.f

   ; random numbers from the open interval ]0, 1[
   x1 = (Random(999998)+1) / 1000000        ; must be > 0 because of Log(x1)
   x2 = (Random(999998)+1) / 1000000

   r = Sqr(-2*Log(x1)) * Cos(2*#PI*x2)
   ProcedureReturn r*stdDev + mean
EndProcedure


Debug RandomNormal1()
Debug RandomNormal2()
//edit
In the code above it reads:
However, the theoretical probability of a value that is _not_ in the interval ]-6, +6[ is very small.
This is now demonstrated by this code.

Posted: Thu Jun 26, 2008 11:36 am
by Kaeru Gaman
Quite a while ago I needed a gauss distributed random number,
I ended up in simulating a center concentrated curve by using a sinus...

Code: Select all

Procedure.l GauRand(Range.l)
    Range2 = Range / 2
    PiHalf.f = #PI / 2
    Prec = 500000
    Z = Range2 + Range2 * (ASin( (Random(Prec*2) - Prec ) / Prec ) / PiHalf)
    ProcedureReturn Z
EndProcedure
I also wrote some little demo to it:

demonstrates linear distribution of standard-random

Code: Select all

InitSprite()
InitKeyboard()
OpenScreen( 1024, 768, 32, "RandomDemo" )
Dim Field( 1024 )
Repeat
    ExamineKeyboard()
    For n=0 To 50
      Z = Random(1023)
      Field(Z)+1
    Next
    StartDrawing(ScreenOutput())
        FrontColor($FF8000)
        For n=0 To 1023
            Line(n, 767, 0, -Field(n) )
        Next
    StopDrawing()
    Delay(0)
    FlipBuffers()
Until KeyboardPushed(#PB_Key_Escape)
demonstrates simulated gauss distribution

Code: Select all

InitSprite()
InitKeyboard()
OpenScreen( 1024, 768, 32, "GaussDemo1" )
Dim Field( 1024 )
Prec = 500000
Range = 1024
Range2 = Range / 2
PiHalf.f = #PI / 2
Repeat
    ExamineKeyboard()
    For n=0 To 50
      Z = Range2 + Range2 * (ASin( (Random(Prec*2) - Prec ) / Prec ) / PiHalf)
      Field(Z)+1
    Next
    StartDrawing(ScreenOutput())
        FrontColor($FF8000)
        For n=0 To 1023
            Line(n, 767, 0, -Field(n) )
        Next
    StopDrawing()
    Delay(0)
    FlipBuffers()
Until KeyboardPushed(#PB_Key_Escape)
[edith] noone tested this, eh? FrontColor was in V3.x notation...

a bit more optical effect:
hits spread over a 2D area, occurrence count is shown by color.

Code: Select all

InitSprite()
InitKeyboard()
OpenScreen( 1024, 768, 32, "GaussDemo2" )
Dim Field( 768,768 )
Dim Col(128)
For n=0 To 31 : t = n*8
    Col(n   ) = RGB(     0,     0,     t)
    Col(n+32) = RGB(     t,     0, 255-t)
    Col(n+64) = RGB(   255,     t,     0)
    Col(n+96) = RGB(   255,   255,     t)
Next
Prec = 500000
Range = 768
Range2 = Range / 2
PiHalf.f = #PI / 2
Repeat
    ExamineKeyboard()
    For n=0 To 25000
        X = Range2 + Range2 * (ASin( (Random(Prec*2) - Prec ) / Prec ) / PiHalf)
        Y = Range2 + Range2 * (ASin( (Random(Prec*2) - Prec ) / Prec ) / PiHalf)
        If Field(X,Y)<127 : Field(X,Y)+1 : EndIf
    Next
    StartDrawing(ScreenOutput())
        For t=0 To 767
            For n=0 To 767
                Plot( 128+n, t, Col(Field(n,t)))
            Next
        Next
    B = 1 - B : If B : Line(0,0,32,0,$FF0000) : Else : Line(0,0,32,0,$0000FF) : EndIf
    StopDrawing()
    Delay(0)
    FlipBuffers()
Until KeyboardPushed(#PB_Key_Escape)

Posted: Sun Jun 29, 2008 9:20 am
by dell_jockey
Kaeru,

even if using half a sine wave is not really a Gaussian distribution (the sine approach spreads the probability distribution a bit too far to the outside) I like your examples a lot. Thanks for posting!

Posted: Sun Jun 29, 2008 11:13 am
by Kaeru Gaman
I know, I spoke of "simulating".
I simply did not think of a group of calls.
on the other hand, it may be faster to have 1random+1arcussinus,
instead of 2random+1Srq+1Log+1Cos

anyways,
every "curve" distribution is better for e.g. random hitpoint values in fights than the linear distribution.
normally in Pen&Paper you use two dices iirc...

@Little John
what is the interval of your results? -pi/+pi?

[edith]
I put LJ's procedure into a graphic display.
this truely is a gauss distribution.
I had to add a check, because some values are out of range.
amount can be seen from the leftmost bar.
seems to be caused by the float incertaincy.

Code: Select all

Procedure.f RandomNormal2 (mean.f=0, stdDev.f=1)
   Protected x1.f, x2.f, r.f

   ; random numbers from the open interval
   x1 = (Random(999998)+1) / 1000000        ; must be > 0 because of Log(x1)
   x2 = (Random(999998)+1) / 1000000

   r = Sqr(-2*Log(x1)) * Cos(2*#PI*x2)
   ProcedureReturn r*stdDev + mean
EndProcedure 

InitSprite()
InitKeyboard()
OpenScreen( 1024, 768, 32, "RandomNormal2-Demo" )
Dim Field( 1024 )
Prec = 500000
Range = 1024
Range2 = Range / 2
PiHalf.f = #PI / 2
Repeat
    ExamineKeyboard()
    For n=0 To 50
      Z = Range2 + Range2 * RandomNormal2() / #PI
      If Z < 0 Or Z > 1024
        Z = 0
      EndIf
      Field(Z)+1
    Next
    StartDrawing(ScreenOutput())
        FrontColor($FF8000)
        For n=0 To 1023
            Line(n, 767, 0, -Field(n) )
        Next
    StopDrawing()
    Delay(0)
    FlipBuffers()
Until KeyboardPushed(#PB_Key_Escape)
[julia]
well, does not seem to be just float incertaincy, but my lack of knowledge about the range of the returned values...

this example schows that it stays mostly in range 0-1024 when I use a mean of 512 and a stdDev of 128
thus, the factor is not pi as I assumed but 4?

...a bit more documentation to your example could have been helpful, LJ... ;)

Code: Select all

Procedure.f RandomNormal2 (mean.f=0, stdDev.f=1)
   Protected x1.f, x2.f, r.f

   ; random numbers from the open interval
   x1 = (Random(999998)+1) / 1000000        ; must be > 0 because of Log(x1)
   x2 = (Random(999998)+1) / 1000000

   r = Sqr(-2*Log(x1)) * Cos(2*#PI*x2)
   ProcedureReturn r*stdDev + mean
EndProcedure 

InitSprite()
InitKeyboard()
OpenScreen( 1024, 768, 32, "RandomNormal2-Demo" )
Dim Field( 1024 )
Prec = 500000
Range = 1024
Range2 = Range / 2
PiHalf.f = #PI / 2
Repeat
    ExamineKeyboard()
    For n=0 To 50
      Z = RandomNormal2(512,128)
      If Z < 0 Or Z > 1024
        Z = 0
      EndIf
      Field(Z)+1
    Next
    StartDrawing(ScreenOutput())
        FrontColor($FF8000)
        For n=0 To 1023
            Line(n, 767, 0, -Field(n) )
        Next
    StopDrawing()
    Delay(0)
    FlipBuffers()
Until KeyboardPushed(#PB_Key_Escape)
;---------------------------------------------------------------------------------

@tinman
sorry, I did not mess with your code so far... ;)

Posted: Sun Jun 29, 2008 12:05 pm
by Kaeru Gaman
k, sorry, now I doublepost because the other post gets too long...

made a performance test...
strange: LJ's log-method is slightly faster than my simulation... :?

Code: Select all

EnableExplicit

Define timer1, timer2, n, Z

Define mean = 512
Define stdDev = 128

Define Range2 = 512
Define PiHalf.f = #PI / 2
Define Prec = 500000

Define x1.f, x2.f, r.f

timer1 = ElapsedMilliseconds()
  For n=0 To 99999999
     x1 = (Random(999998)+1) / 1000000        ; must be > 0 because of Log(x1)
     x2 = (Random(999998)+1) / 1000000
     r = Sqr(-2*Log(x1)) * Cos(2*#PI*x2)
     Z = r*stdDev + mean
  Next
timer1 = ElapsedMilliseconds() -timer1

timer2 = ElapsedMilliseconds()
  For n=0 To 99999999
    Z = Range2 + Range2 * (ASin( (Random(Prec*2) - Prec ) / Prec ) / PiHalf) 
  Next
timer2 = ElapsedMilliseconds() -timer2

MessageRequester("SpeedTest", "Log-Method : " +Str(timer1)+ #CRLF$ +"Simulation : " +Str(timer2) )

Posted: Thu Jul 03, 2008 10:30 am
by Little John
Hi Kaeru,

unfortunately, I missed the progress of this discussion. :oops:

> what is the interval of your results? -pi/+pi?
...
> ...a bit more documentation to your example could have been helpful, LJ...

You're right, sorry. Now I've added notes about the range of the returned values to the code above.
Concerning RandomNormal1(), the situation is pretty clear IMHO.

Regarding RandomNormal2(), my considerations are as follows:

Code: Select all

              x1   is in the open interval ]0, 1[
=>        Log(x1)  is in the interval ]-oo, 0[
=>     -2*Log(x1)  is in the interval ]0, +oo[
=> Sqr(-2*Log(x1)) is in the interval ]0, +oo[         (I)

            x2  is in the open interval ]0, 1[
=>     2*PI*x2  is in the open interval ]0, 2*PI[
=> Cos(2*PI*x2) is in the open interval ]-1, +1[       (II)

=> (I)*(II) is between -oo and +oo,
i.e. the return value of RandomNormal2() can be _any_ real number.
It's just unlikely that we actually will see some extreme values.
RandomNormal2() is more precise and faster than RandomNormal1(), but I mentioned RandomNormal1() because it nicely demonstrates the effect of the central limit theorem.

Thanks for your impressive graphical demonstration.

Regards, Little John

Posted: Thu Jul 03, 2008 12:16 pm
by Little John
Hi again Kaeru,

in your RandomNormal2-Demos above, there is a little glitch.
We cannot display extreme values. That's not actually a problem, but we shouldn't add these values to another category. We'd just ignore them.

So instead of:

Code: Select all

If Z < 0 Or Z > 1024
   Z = 0
EndIf
Field(Z)+1
it should be:

Code: Select all

If Z >= 0 And Z <= 1024
   Field(Z)+1
EndIf
Regards, Little John

PS: Your graphical demonstration really looks cool, we should sell it to stressed out managers. :D

Posted: Thu Jul 03, 2008 1:17 pm
by Little John
Hi Kaeru,

now I've put your code into a procedure, so it's easier to play with different display options:

Code: Select all

EnableExplicit

Procedure.f RandomNormal2 (mean.f=0, stdDev.f=1)
   Protected x1.f, x2.f, r.f

   ; random numbers from the open interval ]0, 1[
   x1 = (Random(999998)+1) / 1000000        ; must be > 0 because of Log(x1)
   x2 = (Random(999998)+1) / 1000000

   r = Sqr(-2*Log(x1)) * Cos(2*#PI*x2)
   ProcedureReturn r*stdDev + mean
EndProcedure

Procedure DrawGaussian (width.l, heigth.l, show.f=4)
   ; The 'show' value denotes how many standard deviations
   ; shall be displayed at both sides of the mean.
   Protected mean.f, stdDev.f
   Protected n.l, z.l
   Dim field(width-1)

   InitSprite()
   InitKeyboard()
   OpenScreen(width, heigth, 32, "RandomNormal2-Demo")

   mean = width/2
   stdDev = mean/show
   Repeat
      ExamineKeyboard()
      For n = 0 To 50
         z = RandomNormal2(mean, stdDev)
         If z >= 0 And z < width
            field(z) + 1
         EndIf
      Next
      StartDrawing(ScreenOutput())
         FrontColor($FF8000)
         For n = 0 To width-1
            Line(n, heigth-1, 0, -field(n))
         Next
      StopDrawing()
      Delay(0)
      FlipBuffers()
   Until KeyboardPushed(#PB_Key_Escape)
EndProcedure

DrawGaussian(1024, 768, 6)
In the example used, the procedure would on principle display values from 6 standard deviations to the left and to the right of the mean. But we see that we'll almost never actually get such extreme values.

Regards, Little John

Posted: Thu Jul 03, 2008 1:53 pm
by Kaeru Gaman
Little John wrote:in your RandomNormal2-Demos above, there is a little glitch.
We cannot display extreme values. That's not actually a problem, but we shouldn't add these values to another category. We'd just ignore them.
in fact, that is not a glitch but just demonstrating the out-of-range values.
sorry, I left this uncommended.
when using the routine in a game or such, we could not just ignore the extremes,
but we have to call the function again.

after all in this case it maybe caused by not knowing the real range of the function.

Posted: Thu Jul 03, 2008 2:47 pm
by Little John
Kaeru Gaman wrote:in fact, that is not a glitch but just demonstrating the out-of-range values.
Ah, I see.

Regards, Little John