ParallelFor : parallel coding for PB 4.41+

Post by eddy »

This is an updated version of Freak's ParallelFor() for PB 4.41+: http://www.purebasic.fr/english/viewtop ... 12&t=30873
- Supported: Windows / Linux (there is no Mac-specific code yet for the GetCPUCount() function below).
- Tested on a dual-core machine.
- I replaced the API semaphores with the native PB semaphores (I hope I didn't make any mistakes during the conversion).
- Remark: you have to enable the "ThreadSafe" compiler option to test this code (otherwise something goes wrong with the PB semaphores).

In my example I declared a global image, but that's not necessary.
You can pass the image handle to the loop procedure through the custom data pointer instead.

Code:

EnableExplicit
Prototype ParallelLoopFunc(Value, *dataPtr)

Global ParallelCPUCount
Global ParallelStart
Global ParallelStarted
Global ParallelRunning
Global ParallelValue
Global ParallelLoop.ParallelLoopFunc
Global *ParallelCustomData

Procedure GetCPUCount()
   CompilerSelect #PB_Compiler_OS
      CompilerCase #PB_OS_Windows
         Protected Count, i, ProcessMask, SystemMask
         
         If GetProcessAffinityMask_(GetCurrentProcess_(), @ProcessMask, @SystemMask)
            For i=0 To SizeOf(Integer)*8-1 ; the mask is pointer-sized: 32 bits on x86, 64 bits on x64
               If ProcessMask & (1<<i)
                  Count+1
               EndIf
            Next
         EndIf
         
         If Count=0
            ProcedureReturn 1
         Else
            ProcedureReturn Count
         EndIf
      CompilerCase #PB_OS_Linux
         ;
         ; Code by remi_meier
         ;
         Protected.i file, count, dir
         Protected.s s
         
         If FileSize("/proc/cpuinfo")<>-1
            file=ReadFile(#PB_Any, "/proc/cpuinfo")
            
            If IsFile(file)
               count=0
               While Not Eof(file)
                  s.s=ReadString(file)
                  If Left(s, 9)="processor"
                     count+1
                  EndIf
               Wend
               
               CloseFile(file)
            EndIf
            
         Else
            dir=ExamineDirectory(#PB_Any, "/proc/acpi/processor", "")
            count=0
            If IsDirectory(dir)
               While NextDirectoryEntry(dir)
                  If Left(DirectoryEntryName(dir), 3)="CPU"
                     count+1
                  EndIf
               Wend
               FinishDirectory(dir)
            EndIf
         EndIf
         
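         If count<1 : count=1 : EndIf ; never report zero CPUs: without worker threads ParallelFor() would wait forever
         ProcedureReturn count
      CompilerDefault
         ; no CPU detection for this OS (e.g. Mac OS X): fall back to a single worker thread
         ProcedureReturn 1
   CompilerEndSelect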
EndProcedure
Procedure ParallelWorkerThread(dummy)
   Protected Loop.ParallelLoopFunc, Value, *dataPtr
   
   Repeat
      WaitSemaphore(ParallelStart)     ; wait for the "wakeup" semaphore
      WaitSemaphore(ParallelRunning)   ; this one should be immediately available (allows to later know when the work is complete)
      *dataPtr=*ParallelCustomData
      Value=ParallelValue              ; copy the values locally
      Loop=ParallelLoop
      SignalSemaphore(ParallelStarted) ; signal the "startup-complete" semaphore so the next thread can be started
      Loop(Value, *dataPtr)            ; execute loop
      SignalSemaphore(ParallelRunning) ; signal completion of the job
   ForEver
EndProcedure
Procedure ParallelFor(i1, i2, Loop.ParallelLoopFunc, *dataPtr=0)
   Protected i
   
   *ParallelCustomData=*dataPtr
   ParallelLoop=Loop
   
   ; release one thread at a time so each can copy the current ParallelValue
   ; only the thread startup is serialized; the Loop() calls run in parallel
   ;
   For ParallelValue=i1 To i2
      SignalSemaphore(ParallelStart)
      WaitSemaphore(ParallelStarted)
   Next
   
   ; wait for all worker threads to complete their work
   ;
   For i=1 To ParallelCPUCount
      WaitSemaphore(ParallelRunning)
   Next
   
   ; reset the running count semaphore back to the start value
   ;
   For i=1 To ParallelCPUCount
      SignalSemaphore(ParallelRunning)
   Next
EndProcedure
Procedure InitParallel()
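   Protected i
   
   ; refuse to run without the ThreadSafe compiler option, before any semaphore or thread is created
   CompilerIf #PB_Compiler_Thread=0
      MessageRequester("Init Parallel", "Initialization failed because your executable is not compiled in ThreadSafe mode.")
      End
   CompilerEndIf
   
   ParallelCPUCount=GetCPUCount()
   ParallelStart=CreateSemaphore(0)                  ; signalled once per loop value to wake one worker
   ParallelStarted=CreateSemaphore(0)                ; signalled by a worker once it has copied the job values
   ParallelRunning=CreateSemaphore(ParallelCPUCount) ; one slot per worker; all slots free means all work is done
   
   For i=1 To ParallelCPUCount
      CreateThread(@ParallelWorkerThread(), 0)
   Next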
EndProcedure
DisableExplicit

; ********************
; Example
; ********************

InitParallel()

Global maxX=1024
Global maxY=1024
Global img=CreateImage(#PB_Any, maxX, maxY, 32)

Macro DrawingCode()
   ;I want to test this code 
   Define xx,yy,zz
   xx=x%256
   xx/#PI
   xx=Pow(Sin(xx)*$F,7.3) 
   yy=y%512
   yy/#PI
   yy=Pow(Cos(xx/3+yy)*$F,xx)
   zz=(xx ! yy) & $FF 
   Plot(x, y, RGB(xx, yy, zz))
EndMacro

;{// Linear Coding
Delay(100)
startTime=ElapsedMilliseconds()
StartDrawing(ImageOutput(img))
   Define x,y
   For y=0 To maxY-1
      For x=0 To maxX-1
         DrawingCode()
      Next
   Next
StopDrawing()
time1=ElapsedMilliseconds()-startTime
;}

;{// Parallel Coding
Procedure DrawingLoop(y, maxX)
   ; access has to be synchronized
   StartDrawing(ImageOutput(img))
      Define x
      For x=0 To maxX-1
         DrawingCode()
      Next
   StopDrawing()
EndProcedure
Delay(100)
startTime=ElapsedMilliseconds()
ParallelFor(0, maxY-1, @DrawingLoop(), maxX)
time2=ElapsedMilliseconds()-startTime
;}

MessageRequester("Terminated ", "Time: "+Str(time1)+#LF$+"Time (optimized, CPU count="+Str(GetCPUCount())+"): "+Str(time2)+#LF$)

OpenWindow(0, 0, 0,800,600, "Parallel Coding", #PB_Window_ScreenCentered | #PB_Window_SystemMenu)
ScrollAreaGadget(1, 0, 0, 800,600, maxX, maxY)
ImageGadget(0, 0, 0, 0, 0, ImageID(img))
CloseGadgetList()

Repeat : Until WaitWindowEvent()=#PB_Event_CloseWindow
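
The custom data pointer mentioned before the listing can carry the image handle, so no global image is needed. What follows is only a minimal sketch, assuming the listing above is included and InitParallel() has already been called. The RowJob structure and the DrawRow() procedure are hypothetical names introduced for illustration; they are not part of the original code. Like the original DrawingLoop(), it opens the same image from several threads at once, and a row is skipped if StartDrawing() fails.

```purebasic
; Hypothetical job description handed to every loop call through *dataPtr
Structure RowJob
   image.i   ; PB image number returned by CreateImage(#PB_Any, ...)
   width.i   ; number of pixels to draw per row
EndStructure

; Matches the ParallelLoopFunc prototype: (Value, *dataPtr)
Procedure DrawRow(y, *job.RowJob)
   Protected x
   If StartDrawing(ImageOutput(*job\image))
      For x=0 To *job\width-1
         Plot(x, y, RGB(x & $FF, y & $FF, (x ! y) & $FF))
      Next
      StopDrawing()
   EndIf
EndProcedure

Define job.RowJob
job\width=256
job\image=CreateImage(#PB_Any, job\width, 256, 32)
If job\image
   ; ParallelFor() only returns after all workers have finished,
   ; so the job variable stays valid for the whole run
   ParallelFor(0, 255, @DrawRow(), @job)
EndIf
```

Packing the handle and the width into one structure keeps the loop procedure free of globals, and more fields can be added later without changing the ParallelFor() signature.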

Re: ParallelFor : parallel coding for PB 4.41+

Post by Mistrel »

For Windows programming I've found that InterlockedCompareExchange() is a lot faster than mutexes or semaphores.

I assume it's because this compiles down to a single assembly instruction.