Experiments In Ray Tracing 1 (BVH, Multithreading)

Everything related to 3D programming
User avatar
Hades
Enthusiast
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Experiments In Ray Tracing 1 (BVH, Multithreading)

Post by Hades »

Hi.

After a break from coding for many years, due to health issues, I feel like getting back into it. And the very first thing that popped into my eye, when I came here, was a thread from Samuel, who wanted to start a community driven ray tracer, but got very little feedback. As a big fan of ray tracing this made me very sad, so I've decided to try to raise the interest in ray tracing by creating something I call Experiments In Ray Tracing, where from time to time I will try to post some code, that will focus on different aspects of ray tracing.
I am still in the process of getting back into the groove, and have a lot of catching up to do. Also English isn't my native language, and by nature, I am a messy coder, so this will be fun. :D

Code: Select all

; Experiments In Ray Tracing 1
; lots of spheres, PB only (no assembly, to keep it simple)
; PB 5.41, done on Windows7 64bit
; Hades, 2016
;
; Warning! This is stripped down to the bare minimum,
; to get the length to something I felt comfortable posting,
; so no bells, no whistles, and no safety checks.

CompilerIf #PB_Compiler_Thread = 0
  MessageRequester( "Sorry...", "Please, set the compiler to threadsafe before compiling" )
  End
CompilerEndIf

EnableExplicit
DisableDebugger ; even without debugger this is slow. As a reference: Default settings (200k spheres at 1024x640): 4-5 fps on a 4 core, 4GHZ Skylake 

#LawnSize = 50 ; increasing this over the default of 50 will have little impact on render time, but a big one on initial set up time
               ; if you've got decent hardware on 64bit (for teh ram!), and are willing to wait a few minutes, try maybe 500. (100x the spheres, about 70% render speed)
#TileSize = 16 ; the picture is rendered in tiles, instead of lines, because keeping rays close together reduces cache misses

#RenderWidth = 1024 ; RenderWidth/Height needs to be a multiple of #TileSize, otherwise you'll crash!
#RenderHeight = 640 ; Window size and render size are independent, but choosing one as the multiple of the other gives better results.
#WindowWidth = 1024   ; BTW... when original Doom came out in 1993, it had a resolution of 320x200, and was restricted to max. 35 FPS.
#WindowHeight = 640   ; I bet you could recreate a doom level using appropriatly colored and sized spheres for each texel, and render it
                      ; at that speed in that resolution. Sure, wouldn't be the most efficient way to do that (lol), but sounds kinda fun :)
#MaxDist = 1000000
  
Structure stVector
  x.f
  y.f
  z.f
EndStructure

Structure stColor
  r.f
  g.f
  b.f
EndStructure

Structure stMaterial
  DiffuseColor.stColor
  SpecularIntensity.f
  Glossiness.f
EndStructure
Global Dim Material.stMaterial(100)

Structure stLight   ; directional light, far away, like the sun
  Direction.stVector
EndStructure
Global Light.stLight

Structure stRay
  Origin.stVector
  Direction.stVector
  length.f
  PrimitiveHit.i
EndStructure

Structure stSphere
  Position.stVector
  radius.f
  *Material.stMaterial
EndStructure
Global Dim Sphere.stSphere(200)
Global Dim SphereBVH.stSphere(0)

Structure stRenderBuffer
  pAddr.i[2]  ; I am using 2 render buffers, so while the last threads are still working on the...
  Width.i     ; ...current frame, the one's that are done can already start with the next one,...
  Height.i
EndStructure
Global RenderBuffer.stRenderBuffer

Structure stCamera
  Pos.stVector[2] ; ...but that means i need two versions for stuff like this, too.
EndStructure
Global Camera.stCamera

Structure stDisplay
  Width.i
  Height.i
  hdc.i
  hrc.i
  WindowNr.i
  DisplList.i
EndStructure
Global Display.stDisplay

Structure stJob
  x.i
  y.i
  Frame.i
EndStructure
Global NumThreads
Global Dim Job.stJob(0)
Global numJobs
Global JobIdx
Global mutJob
mutJob = CreateMutex()

Structure stAABB ; Axis aligned bounding box
  min.f[3]
  max.f[3]
EndStructure

Structure stAABoxHit
  Near.f
  Far.f
EndStructure  

Structure stBVH
  min.f[3]
  max.f[3]
  Content.i
  numPrim.i
EndStructure ; in 32bit the size is 32 bytes, perfect for cache alignment. Trying to achieve that in 64bit might be interesting, when getting cache misses

Structure stBVHPrimitive
  AABB.stAABB
  Centre.f[3]
EndStructure  
Global Dim BVHPrimitive.stBVHPrimitive(0)
Global Dim BVHPrimIdx(0)
Global FreeNode.i


Structure stMain
  Running.i
  Frame.i
  numMaterials.i
  numSpheres.i
  *BVH.stBVH
EndStructure
Global Main.stMain


Macro ColorSet(Color, cr, cg, cb)
  Color\r = cr
  Color\g = cg
  Color\b = cb
EndMacro

Macro VectorSet(Vector, VX, VY, VZ)
  Vector\x = VX
  Vector\y = VY
  Vector\z = VZ
EndMacro

Macro Normalize(Vector, rLength)
  rLength =  1.0 / Sqr(Vector\x * Vector\x + Vector\y * Vector\y + Vector\z * Vector\z)
  Vector\x * rLength
  Vector\y * rLength
  Vector\z * rLength
EndMacro  


Procedure.i RTMaterialCreate(r.f = 1.0, g.f = 1.0, b.f = 1.0)
  ColorSet(Material(Main\numMaterials)\DiffuseColor, r, g, b)
  Material(Main\numMaterials)\SpecularIntensity = 0.3
  Material(Main\numMaterials)\Glossiness = 5
  Main\numMaterials + 1
  ProcedureReturn @Material(Main\numMaterials - 1)
EndProcedure

Procedure SphereCreate(x.f, y.f, z.f, radius.f, *Material.stMaterial)
  If Main\numSpheres >= ArraySize(Sphere())
    ReDim Sphere(Main\numSpheres * 2)
  EndIf
  VectorSet(Sphere(Main\numSpheres)\Position, x, y, z)
  Sphere(Main\numSpheres)\radius = radius
  Sphere(Main\numSpheres)\Material = *Material
  Main\numSpheres + 1
EndProcedure

Procedure DisplayBufferInit(BufferWidth.i, BufferHeight.i, VisibleWidth.i = 0, VisibleHeight.i = 0)
  Static Tex.i                                                            ; OpenGL allows me to write directly to a buffer
  If VisibleWidth = 0                                                     ; from different threads, and then tell the GPU
    VisibleWidth = BufferWidth                                            ; to use that buffer as a texture for a screen covering quad.
  EndIf                                                                   ; And the GPU will do that independently from the CPU.
  If VisibleHeight = 0                                                    ; Also, it makes render resolution and display resolution independent,
    VisibleHeight = BufferHeight                                          ; with no impact on the CPU, witch is a nice feature, too.
  EndIf
  glShadeModel_(#GL_FLAT)
  glEnable_(#GL_TEXTURE_2D)
  glBindTexture_(#GL_TEXTURE_2D, @Tex)
  glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_MAG_FILTER, #GL_LINEAR)
  glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_MIN_FILTER, #GL_LINEAR)
  glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_WRAP_S, #GL_CLAMP)
  glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_WRAP_T, #GL_CLAMP)
  
  If Display\DisplList <> 0
    glDeleteLists_(Display\DisplList, 1)
  EndIf
  Display\DisplList = glGenLists_(1)
  glNewList_(Display\DisplList, #GL_COMPILE)
  glBegin_(#GL_QUADS)
  glTexCoord2f_(0.0, 0.0)
  glVertex2i_(0,0)
  glTexCoord2f_(0.0, 1.0 * VisibleHeight / BufferHeight)
  glVertex2i_(0, Display\Height)
  glTexCoord2f_(1.0 * VisibleWidth / BufferWidth, 1.0 * VisibleHeight / BufferHeight)
  glVertex2i_(Display\Width, Display\Height)
  glTexCoord2f_(1.0 * VisibleWidth / BufferWidth, 0.0)
  glVertex2i_(Display\Width, 0)
  glEnd_()
  glEndList_()
  
EndProcedure

Procedure.i glWindowOpen(Width.i, Height.i, Name.s = "OpenGL Window", flags.i = #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_ScreenCentered)
  Protected hwnd.i, pfd.PIXELFORMATDESCRIPTOR, pixformat.i, wglSwapIntervalEXT.i
  Display\WindowNr = OpenWindow(#PB_Any, 0, 0, Width, Height, Name, flags)
  If Display\WindowNr
    hwnd = WindowID(Display\WindowNr)
    
    Display\hdc = GetDC_(hwnd)
    pfd\nSize        = SizeOf(PIXELFORMATDESCRIPTOR)
    pfd\nVersion     = 1
    pfd\dwFlags      = #PFD_SUPPORT_OPENGL | #PFD_DOUBLEBUFFER | #PFD_DRAW_TO_WINDOW
    pfd\iLayerType   = #PFD_MAIN_PLANE
    pfd\iPixelType   = #PFD_TYPE_RGBA
    pfd\cColorBits   = 32
    pfd\cDepthBits   = 32
    pixformat = ChoosePixelFormat_(Display\hdc, pfd)
    SetPixelFormat_(Display\hdc, pixformat, pfd)
    Display\hrc = wglCreateContext_(Display\hdc)
    wglMakeCurrent_(Display\hdc, Display\hrc)
    
    wglSwapIntervalEXT = wglGetProcAddress_("wglSwapIntervalEXT") 
    If wglSwapIntervalEXT
      CallFunctionFast(wglSwapIntervalEXT, 0) 
    EndIf  
    SwapBuffers_(Display\hdc)
    Display\Width = Width
    Display\Height = Height
    
    glOrtho_(0.0, Display\Width, Display\Height, 0.0, 0.0, 1.0)
    ProcedureReturn #True
  EndIf
  
  ProcedureReturn #False
EndProcedure

Procedure DisplayBufferShow(pBuffer.i, Width.i, Height.i, pitch.i)
  glEnable_(#GL_TEXTURE_2D)
  glTexImage2D_(#GL_TEXTURE_2D, 0, #GL_RGBA, Width, Height, 0, #GL_BGRA_EXT, #GL_UNSIGNED_BYTE, pBuffer)
  glCallList_(Display\DisplList)
  SwapBuffers_(Display\hdc)
EndProcedure

Procedure Intersect_Spheres(*Sphere.stSphere, *Ray.stRay, numSpheres.i) ; Using spheres instead of triangles, because setting up a scene with triangles
  Protected i.i, RelPos.stVector, b.f, c.f, d.f, t1.f, t2.f             ; is much more work, and I'm kinda lazy :). Also code size would be an issue.
  For i = 1 To numSpheres
    VectorSet(RelPos, *Ray\Origin\x - *Sphere\Position\x, *Ray\Origin\y - *Sphere\Position\y, *Ray\Origin\z - *Sphere\Position\z)
    b = *Ray\Direction\x * RelPos\x + *Ray\Direction\y * RelPos\y + *Ray\Direction\z * RelPos\z
    c = (RelPos\x * RelPos\x + RelPos\y * RelPos\y + RelPos\z * RelPos\z) - *Sphere\radius * *Sphere\radius
    d = b * b - c
    If d > 0.0
      d = Sqr(d)
      t1 = (-b - d)
      If t1 > 0.0 And t1 < *Ray\length    ; only testing for hits from the outside of the spheres, to keep it simple/short
        *Ray\length = t1                  
        *Ray\PrimitiveHit = *Sphere
      EndIf
    EndIf
    *Sphere + SizeOf(stSphere)
  Next i
EndProcedure

Procedure AABoxIntersect(*AABox.stAABB, *Ray.stRay, *rRayDir.stVector, *AABoxHit.stAABoxHit)
  Protected n.l, tnear.f, tfar.f, t1.f, t2.f          ; this is a very efficient way to compute the intersection of a ray and an
  tnear = 0.0                                         ; Axis Aligned Box, but to give you a glimps of how much is still possible:
  t1 = (*AABox\min[0] - *Ray\Origin\x) * *rRayDir\x   ; I've got a version I've done around 2008, where I am using a jumptable to
  t2 = (*AABox\max[0] - *Ray\Origin\x) * *rRayDir\x   ; call 1 of 8 versions of this, depending on the signs of the ray direction,
  If t2 < t1                                          ; so I do know upfront wich side of the AAB is closer, to remove the compare/
    Swap t1, t2                                       ; swap parts, and then check 4 rays at the same time, using branchless SSE code.
  EndIf                                               ; And now think about doing several sets of 8 (or soon 16) rays in parallel,
  tfar = t2                                           ; using AVX... And this small procedure has a big impact on the overall speed.
  If t1 > tnear                                       ; (On top of that, the other code was even checking the frustum of a big ray packet vs
    tnear = t1                                        ; the box, and do other tricks to avoid doing this intersection test in the first place.
  EndIf                                               ; And while that would help here a lot, too, it would make the code to messy to post.) 
    
  t1 = (*AABox\min[1] - *Ray\Origin\y) * *rRayDir\y
  t2 = (*AABox\max[1] - *Ray\Origin\y) * *rRayDir\y
  If t2 < t1
    Swap t1, t2
  EndIf
  If t1 > tnear
    tnear = t1
  EndIf  
  If t2 < tfar
    tfar = t2
  EndIf  
    
  t1 = (*AABox\min[2] - *Ray\Origin\z) * *rRayDir\z
  t2 = (*AABox\max[2] - *Ray\Origin\z) * *rRayDir\z
  If t2 < t1
    Swap t1, t2
  EndIf
  If t1 > tnear
    tnear = t1
  EndIf  
  If t2 < tfar
    tfar = t2
  EndIf  
    
  *AABoxHit\Near = tnear
  *AABoxHit\Far = tfar
EndProcedure

Procedure BVHTraversal(*Ray.stRay, pTravStack.i)
  Protected *StackPtr.integer, *Node.stBVH, rRayDir.stVector, AABoxHit.stAABoxHit, near.f, pNode.i
  VectorSet(rRayDir, 1.0 / *Ray\Direction\x, 1.0 / *Ray\Direction\y, 1.0 / *Ray\Direction\z)
  *StackPtr = pTravStack
  AABoxIntersect(Main\BVH, *Ray, rRayDir, @AABoxHit)
  If AABoxHit\far > AABoxHit\near ; the ray is hitting the root node of the BVH
    *Node = Main\BVH
    Repeat
      If *Node\numPrim >= 0 ; this means this is not a leaf node 
        near = #MaxDist
        AABoxIntersect(*Node\Content + SizeOf(stBVH), *Ray, rRayDir, @AABoxHit)
        If AABoxHit\far > AABoxHit\near And *Ray\length > AABoxHit\near ; if we have a hit, we need to check what's inside...
          *StackPtr + SizeOf(*StackPtr)                                 ; ...so push it onto the stack.
          *StackPtr\i = *Node\Content + SizeOf(stBVH)                   ; (But the big deal here is, if we miss, we have just discarded
          near = AABoxHit\near                                          ; about half of the primitives in this node, with a rather
        EndIf                                                           ; cheap test, and that's where the speed comes from)
        AABoxIntersect(*Node\Content, *Ray, rRayDir, @AABoxHit)
        If AABoxHit\far > AABoxHit\near And *Ray\length > AABoxHit\near
          If near < AABoxHit\near           ; making sure to always check the closest node first,
            pNode = *StackPtr\i             ; so when we get an intersection with a primitive,
            *StackPtr\i = *Node\Content     ; we don't have to check any nodes that are further away,
            *StackPtr + SizeOf(*StackPtr)   ; witch makes a huge difference when looking at the scene from the wrong angle
            *StackPtr\i = pNode
          Else  
            *StackPtr + SizeOf(*StackPtr)
            *StackPtr\i = *Node\Content
          EndIf  
        EndIf
      Else ; this is a leaf node containing the actual primitives (spheres in this case),...
        Intersect_Spheres(*Node\Content, *Ray, -*Node\numPrim)  ; ...so check if the ray does hit something
      EndIf
      *Node = *StackPtr\i           ; using stack based traversal vs recursive should have helped with the speed,
      *StackPtr - SizeOf(*StackPtr) ; but strangely didn't. But it made traversing nodes in the right order a bit easier.
    Until *StackPtr < pTravStack
  EndIf
EndProcedure

Procedure.i Shadow(*HitPosition.stVector, *Normal.stVector, *LightDirection.stVector, LightDistance.f, pTravStack.i)
  Protected ShadowRay.stRay
  VectorSet(ShadowRay\Origin, *HitPosition\x, *HitPosition\y, *HitPosition\z)
  VectorSet(ShadowRay\Direction, *LightDirection\x, *LightDirection\y, *LightDirection\z)
  ShadowRay\length = LightDistance
  ShadowRay\PrimitiveHit = 0
  
  BVHTraversal(ShadowRay, pTravStack)
  If ShadowRay\PrimitiveHit
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Procedure RenderTile(TileX.i, TileY.i, Frame.i, pTravStack.i)
  Protected *Buffer.Long, *Sphere.stSphere, *Material.stMaterial, x.i, y.i, i.i, rLength.f, Ray.stRay, Col.stColor
  Protected HitPosition.stVector, Normal.stVector, LightNormalAngle.f, Angle.f, ReflectedLightVector.stVector, SpecIntensity.f, Brightness.f
  VectorSet(Ray\Origin, Camera\Pos[Frame]\x, Camera\Pos[Frame]\y, Camera\Pos[Frame]\z) ; Create a ray from the Camera...
  For y = TileY To TileY + #TileSize - 1 
    *Buffer = RenderBuffer\pAddr[Frame] + TileX * 4 + y * RenderBuffer\Width * 4
    For x = TileX To TileX + #TileSize - 1
      VectorSet(Ray\Direction, 0.5 + x - RenderBuffer\Width * 0.5, 0.5 + RenderBuffer\Height * 0.5 - y, RenderBuffer\Width) ;...trough the centre of each pixel
      Normalize(Ray\Direction, rLength)
      Ray\PrimitiveHit = 0
      Ray\length = #MaxDist
      BVHTraversal(Ray, pTravStack)
      If Ray\PrimitiveHit ; we have hit something
        *Sphere = Ray\PrimitiveHit
        *Material = *Sphere\Material
        VectorSet(HitPosition, Ray\Origin\x + Ray\length * Ray\Direction\x, Ray\Origin\y + Ray\length * Ray\Direction\y, Ray\Origin\z + Ray\length * Ray\Direction\z) ; where was that hit?
        VectorSet(Normal, HitPosition\x - *Sphere\Position\x, HitPosition\y - *Sphere\Position\y, HitPosition\z - *Sphere\Position\z) ; what's the normal vector in that spot
        Normalize(Normal, rLength) ; making sure the normal vector has a length of 1.0 
        LightNormalAngle = Normal\x * Light\Direction\x + Normal\y * Light\Direction\y + Normal\z * Light\Direction\z ; the angle between the normal and the incoming light determins how much light the object receives
        SpecIntensity = 0.0
        If LightNormalAngle > 0.1 And Not Shadow(HitPosition, Normal, Light\Direction, #MaxDist, pTravStack)
          If *Material\SpecularIntensity > 0.0
            Angle = 2.0 * LightNormalAngle
            VectorSet(ReflectedLightVector, Light\Direction\x - Angle * Normal\x, Light\Direction\y - Angle * Normal\y, Light\Direction\z - Angle * Normal\z)
            Angle = (Ray\Direction\x * ReflectedLightVector\x + Ray\Direction\y * ReflectedLightVector\y + Ray\Direction\z * ReflectedLightVector\z)
            If Angle > 0.0
              SpecIntensity = *Material\SpecularIntensity * Pow(Angle, *Material\Glossiness)
            EndIf
          EndIf
        Else
          LightNormalAngle = 0.1 ; setting this to 0.1 to avoid completely black shadows
        EndIf          
        *Buffer\l = RGB((*Material\DiffuseColor\b * LightNormalAngle + SpecIntensity) * 255, (*Material\DiffuseColor\g * LightNormalAngle + SpecIntensity) * 255, (*Material\DiffuseColor\r * LightNormalAngle + SpecIntensity) * 255)        
      Else
        Angle = Ray\Direction\y
        If Angle < 0.0
          Angle = 0.0
        EndIf
        Angle = 1.0 - Angle
        Angle = Angle * Angle
        Brightness = 0.7 * Angle * Angle
        *Buffer\l = RGB(0.9 * 255, (0.2 + Brightness) * 255, (0.2 + Brightness) * 255)
      EndIf  
      *Buffer + 4
    Next
  Next
EndProcedure  

Procedure WorkerThread(ThreadIdx)
  Protected iJob, pTravStack.i
  pTravStack = AllocateMemory(Main\numSpheres * 8)
  While Main\Running
    iJob = -1
    LockMutex(mutJob)
    If numJobs > JobIdx
      iJob = JobIdx
      JobIdx + 1
    EndIf  
    UnlockMutex(mutJob)
    If iJob > -1
      RenderTile(Job(iJob)\x, Job(iJob)\y, Job(iJob)\Frame, pTravStack)
      Job(iJob)\Frame = 1 - Job(iJob)\Frame
    Else  
      Delay(0)
    EndIf  
  Wend
EndProcedure

Procedure SetUpJobs(RenderWidth.i, RenderHeight.i)
  Protected x, y, i
  ReDim Job((RenderHeight * RenderWidth) / (#TileSize * #TileSize))
  For y = 0 To RenderHeight - 1 Step #TileSize
    For x = 0 To RenderWidth - 1 Step #TileSize
      Job(i)\x = x
      Job(i)\y = y
      i + 1
    Next x  
  Next y  
  numJobs = i
  JobIdx = i
EndProcedure

Procedure CreateThreads()
  Protected i
  NumThreads = CountCPUs(#PB_System_ProcessCPUs)
  For i = 0 To NumThreads - 1
    CreateThread(@WorkerThread(), i)
  Next
EndProcedure

; The idea behind the SAH algorithm is to split a node into the two bounding boxes with the lowest Surface Area, because it increases the chance of the ray missing, finding empty spots. And empty spots don't need any tests :)
Procedure.f ComputeSAH(pNode.l, rArea.f, left.l, right.l, Axis.l, SplitPos.f)
  Protected n.l, i.l, AABB1.stAABB, AABB2.stAABB, ns1.l, ns2.l, *Prim.stBVHPrimitive
  Protected s1.stVector, s2.stVector, A1.f, A2.f, SAH.f
  AABB1\min[0] = #MaxDist
  AABB1\min[1] = #MaxDist
  AABB1\min[2] = #MaxDist
  AABB1\max[0] = -#MaxDist
  AABB1\max[1] = -#MaxDist
  AABB1\max[2] = -#MaxDist
  AABB2\min[0] = #MaxDist
  AABB2\min[1] = #MaxDist
  AABB2\min[2] = #MaxDist
  AABB2\max[0] = -#MaxDist
  AABB2\max[1] = -#MaxDist
  AABB2\max[2] = -#MaxDist
  For n = left To right ; fit the bounding box around the primitives. For calculating the SAH value only.
    *Prim = @BVHPrimitive(BVHPrimIdx(n))
    If *Prim\Centre[Axis] < SplitPos
      ns1 + 1
      For i = 0 To 2
        If *Prim\AABB\min[i] < AABB1\min[i]
          AABB1\min[i] = *Prim\AABB\min[i]
        EndIf
        If *Prim\AABB\max[i] > AABB1\max[i]
          AABB1\max[i] = *Prim\AABB\max[i]
        EndIf
      Next
    Else
      ns2 + 1
      For i = 0 To 2
        If *Prim\AABB\min[i] < AABB2\min[i]
          AABB2\min[i] = *Prim\AABB\min[i]
        EndIf
        If *Prim\AABB\max[i] > AABB2\max[i]
          AABB2\max[i] = *Prim\AABB\max[i]
        EndIf
      Next
    EndIf
  Next
  ; compute the SAH value...
  If ns1 = 0 ;  ...based on the number of primitives in the nodes...
    ns1 = 1000        ; (no primitives in the node is bad, so set some bad values to avoid that case)
    A1 = 1000000000.0
  Else  
    s1\x = AABB1\max[0] - AABB1\min[0]  ; ...the surface area of nodes...
    s1\y = AABB1\max[1] - AABB1\min[1]
    s1\z = AABB1\max[2] - AABB1\min[2]
    A1 = s1\x * s1\y + s1\x * s1\z + s1\z * s1\y
  EndIf
  If ns2 = 0
    ns2 = 1000
    A2 = 1000000000.0
  Else  
    s2\x = AABB2\max[0] - AABB2\min[0]
    s2\y = AABB2\max[1] - AABB2\min[1]
    s2\z = AABB2\max[2] - AABB2\min[2]
    A2 = s2\x * s2\y + s2\x * s2\z + s2\z * s2\y
  EndIf
  SAH = (A1 * ns1 + A2 * ns2) * rArea ; ...and the area of the parent node
  ProcedureReturn SAH
EndProcedure

Procedure BVHSplit(pNode, left, right, depth, *SphBVHIdx.integer, UseSAH.i = #True)
  Protected *Node.stBVH, Axis, AxL.f, Centre.f, n, i
  Protected *Prim.stBVHPrimitive, PivotIdx, ok, *lNode.stBVH, *rNode.stBVH, Split.f, s.stVector, rArea.f, SAH.f, SAH1.f
  *Node = pNode
  ; if there are primitives in this node...
  If right - left > 0 ; right is the last primitive, left the first in this node. The naming might seem strange, but is a result of the spatial sorting performed here
    AxL = *Node\max[0] - *Node\min[0]      ; ...find the longest axis of this node...
    If *Node\max[1] - *Node\min[1] > AxL
      Axis = 1
      AxL = *Node\max[1] - *Node\min[1]
    EndIf
    If *Node\max[2] - *Node\min[2] > AxL
      Axis = 2
      AxL = *Node\max[2] - *Node\min[2]
    EndIf
    Centre = (*Node\min[Axis] + *Node\max[Axis]) * 0.5 ; ... and do a centre split
    
    If UseSAH ; using the SAH will significantly improve render times in most cases, but will dramatically slow down BVH creation 
      s\x = *Node\max[0] - *Node\min[0]
      s\y = *Node\max[1] - *Node\min[1]
      s\z = *Node\max[2] - *Node\min[2]
      rArea = 1.0 / (s\x * s\y + s\x * s\z + s\z * s\y)
      SAH = ComputeSAH(pNode, rArea, left, right, Axis, Centre)
      If (right - left) > 30 ; (value might need adjusting)                     if there is a significant amount of primitives in this node... 
        For i = 0 To 2                                                        ; ...along each Axis..
          For n = 1 To 9                                                      ; ...make 9 splits...
            Split = *Node\min[i] + ((*Node\max[i] - *Node\min[i]) * 0.1 * n)
            SAH1 = ComputeSAH(pNode, rArea, left, right, i, Split)            ; ...and check for the best SAH value

            If SAH1 < SAH
              SAH = SAH1
              Axis = i
              Centre = Split
            EndIf
          Next
        Next  
      Else
        For i = 0 To 2                                                        ; ...otherwise do a complete sweep on all axis...
          For n = left To right                                               ; ...for all primitives 
            *Prim = @BVHPrimitive(BVHPrimIdx(n))
            Split = *Prim\Centre[i]
            SAH1 = ComputeSAH(pNode, rArea, left, right, i, Split)
            If SAH1 < SAH
              SAH = SAH1
              Axis = i
              Centre = Split
            EndIf
          Next
        Next
      EndIf
    EndIf 
    
    PivotIdx = left                                                           ; ... and sort the primitives into left/right of the split
    For n = left To right
      *Prim = @BVHPrimitive(BVHPrimIdx(n))
      If *Prim\Centre[Axis] < Centre
        Swap BVHPrimIdx(n), BVHPrimIdx(PivotIdx)
        PivotIdx + 1
      EndIf
    Next
    If PivotIdx > left And PivotIdx <= right; If the primitives don't all end up on one side (should only happen if there is a serious problem with the geometry)...
      ; ...and if there are more than a few primitives left...
      If right - left > 8 Or SAH < ((right - left) + 1) * 0.2 ; (these numbers might need changing when the code changes, or to deal with a wider set of scenes)
        *lNode = FreeNode                         ; ...create the two new nodes...
        *rNode = *lNode + SizeOf(stBVH)
        FreeNode = FreeNode + 2 * SizeOf(stBVH)
        *Node\Content = *lNode
        *lNode\min[0] = #MaxDist
        *lNode\min[1] = #MaxDist
        *lNode\min[2] = #MaxDist
        *lNode\max[0] = -#MaxDist
        *lNode\max[1] = -#MaxDist
        *lNode\max[2] = -#MaxDist
        For n = left To PivotIdx - 1  ; fit the bounding box of the node around the primitives in the node
          *Prim = @BVHPrimitive(BVHPrimIdx(n))
          For i = 0 To 2
            If *Prim\AABB\min[i] < *lNode\min[i]
              *lNode\min[i] = *Prim\AABB\min[i]
            EndIf
            If *Prim\AABB\max[i] > *lNode\max[i]
              *lNode\max[i] = *Prim\AABB\max[i]
            EndIf
          Next
        Next
        *rNode\min[0] = #MaxDist
        *rNode\min[1] = #MaxDist
        *rNode\min[2] = #MaxDist
        *rNode\max[0] = -#MaxDist
        *rNode\max[1] = -#MaxDist
        *rNode\max[2] = -#MaxDist
        For n = PivotIdx To right
          *Prim = @BVHPrimitive(BVHPrimIdx(n))
          For i = 0 To 2
            If *Prim\AABB\min[i] < *rNode\min[i]
              *rNode\min[i] = *Prim\AABB\min[i]
            EndIf
            If *Prim\AABB\max[i] > *rNode\max[i]
              *rNode\max[i] = *Prim\AABB\max[i]
            EndIf
          Next
        Next
        *lNode\numPrim = PivotIdx - left
        *rNode\numPrim = (right - PivotIdx) + 1   
        BVHSplit(*lNode, left, PivotIdx - 1, Depth + 1, *SphBVHIdx, UseSAH) ; ...and recursivly test them for further splits
        BVHSplit(*rNode, PivotIdx, right, Depth + 1, *SphBVHIdx, UseSAH)
        ProcedureReturn
      EndIf
    EndIf
  EndIf
  ; ...otherwise create a leaf (a node without children, but a pointer to the first primitive)
  *Node\numPrim = -((right - left) + 1) ; using a nagative number here to mark this as a leaf. Otherwise another variable would be needed. Too much memory consumption already
  *Node\Content = SphereBVH(*SphBVHIdx\i)
  For n = left To right - 1                         ; Here, I am reordering the spheres, so when a ray hits a leaf node, 
    SphereBVH(*SphBVHIdx\i) = Sphere(BVHPrimIdx(n)) ; all the needed data is in in one spot, and in the right order. 
    *SphBVHIdx\i + 1                                ; This helps the CPU to prefech the data and avoid cache misses.
  Next                                              ; Didn't do squad for the speed on my system though.                                        
  SphereBVH(*SphBVHIdx\i) = Sphere(BVHPrimIdx(n))   ; Guess when sticking to PB only, speed is too slow for memory access to become a problem.
  *SphBVHIdx\i + 1
EndProcedure  

; This is where it's at: BVH (Bounding Volume Hierarchy), the main reason for me to post this code.
; Without an acceleration structure, like BVH, you are forever doomed to stick to tiny scenes, when doing raytracing,
; but with it, rendering 10's of millions of primitives becomes possible at a decent speed, even in realtime, to some extent.
; This BVH code is just something I've quickly thrown together back around 2008 and never really touched again. As a result,
; the time it takes to create the BVH is terrible, and it's quality (the render speed it allows) probably isn't as good as it could be.
; OTOH it works and is rather simple, easy to understand, compared to a more advanced version, so posting this might be the better choice.
Procedure BVHCreate()
  Protected BVH, n, i, *BVH.stBVH, SphBVHIdx.i
  ReDim BVHPrimitive(Main\numSpheres - 1)
  ReDim BVHPrimIdx(Main\numSpheres - 1)
  ReDim SphereBVH(Main\numSpheres - 1)
  For n = 0 To Main\numSpheres - 1  ; precompute a bounding box around every primitive, to ease the creation of the BVH
    BVHPrimitive(n)\AABB\min[0] = Sphere(n)\Position\x - Sphere(n)\radius
    BVHPrimitive(n)\AABB\min[1] = Sphere(n)\Position\y - Sphere(n)\radius
    BVHPrimitive(n)\AABB\min[2] = Sphere(n)\Position\z - Sphere(n)\radius
    BVHPrimitive(n)\AABB\max[0] = Sphere(n)\Position\x + Sphere(n)\radius
    BVHPrimitive(n)\AABB\max[1] = Sphere(n)\Position\y + Sphere(n)\radius
    BVHPrimitive(n)\AABB\max[2] = Sphere(n)\Position\z + Sphere(n)\radius    
    BVHPrimitive(n)\Centre[0] = Sphere(n)\Position\x  ; for a triangle one would use the centre of the bounding box
    BVHPrimitive(n)\Centre[1] = Sphere(n)\Position\y
    BVHPrimitive(n)\Centre[2] = Sphere(n)\Position\z
  Next n
  BVH = AllocateMemory(2 * Main\numSpheres * SizeOf(stBVH)) ; this is the maximum size the BVH could have
  *BVH = BVH
  *BVH\min[0] = #MaxDist
  *BVH\min[1] = #MaxDist
  *BVH\min[2] = #MaxDist
  *BVH\max[0] = -#MaxDist
  *BVH\max[1] = -#MaxDist
  *BVH\max[2] = -#MaxDist
  For n = 0 To Main\numSpheres - 1
    BVHPrimIdx(n) = n ; we need to sort the primitives into the BVH nodes. This index array will be used for that purpose
    For i = 0 To 2  ; Fit the root nodes AABB tightly around the scene
      If BVHPrimitive(n)\AABB\min[i] < *BVH\min[i]
        *BVH\min[i] = BVHPrimitive(n)\AABB\min[i]
      EndIf
      If BVHPrimitive(n)\AABB\max[i] > *BVH\max[i]
        *BVH\max[i] = BVHPrimitive(n)\AABB\max[i]
      EndIf
    Next
  Next
  *BVH\numPrim = Main\numSpheres
  *BVH\Content = *BVH + SizeOf(stBVH)
  FreeNode = *BVH + SizeOf(stBVH)
  BVHSplit(*BVH, 0, Main\numSpheres - 1, 0, @SphBVHIdx)
  ProcedureReturn BVH
EndProcedure  

Procedure SceneDirtNGras(*MatDirt.stMaterial, *MatGras.stMaterial, sx.f, sy.f, sz.f)
  Protected n.i, i.i, Scale.f, BendX.f, BendZ.f, BX.f, BZ.f, x.f, y.f, z.f
  SphereCreate(sx, sy, sz, 10, *MatDirt)
  For n = 1 To Random(6, 3)
    Scale = 0.0008 * Random(1000, 500)
    x = sx + 0.01 * (Random(1000) - 500)
    y = sy + 9.0
    z = sz + 0.01 * (Random(1000) - 500)
    BendX = 0.00002 * (Random(1000) - 500)
    BendZ = 0.00002 * (Random(1000) - 500)
    BX = 0.0
    BZ = 0.0
    For i = 1 To 20
      SphereCreate(x, y, z, Scale, *MatGras)
      x = x + BX
      z = z + BZ
      y = y + Scale
      BX = BX + BendX
      BZ = BZ + BendZ
      Scale = Scale * 0.96
    Next i 
  Next n
EndProcedure

Procedure SceneCreate()
  Protected *Material.stMaterial, *MatDirt.stMaterial, *MatGras.stMaterial, x, y, z, rLength.f
  *MatDirt = RTMaterialCreate(0.4, 0.3, 0.2)
  *MatDirt\SpecularIntensity = 0.0
  *MatGras = RTMaterialCreate(0.0, 0.6, 0.2)
  For x = 1 To #LawnSize
    For z = 1 To #LawnSize
      SceneDirtNGras(*MatDirt, *MatGras, x * 10 - #LawnSize * 5, -40, z * 10 + 100)
    Next z
  Next x  
  VectorSet(Light\Direction, 1, 1, -1)
  Normalize(Light\Direction, rLength)
  Main\BVH = BVHCreate()
EndProcedure

Procedure Init()
  glWindowOpen(#WindowWidth, #WindowHeight, "Experiments In Raytracing 1")
  RenderBuffer\pAddr[0] = AllocateMemory(#RenderWidth * #RenderHeight * 4)
  RenderBuffer\pAddr[1] = AllocateMemory(#RenderWidth * #RenderHeight * 4)
  RenderBuffer\Width = #RenderWidth
  RenderBuffer\Height = #RenderHeight
  DisplayBufferInit(#RenderWidth, #RenderHeight)
  SceneCreate()
  SetUpJobs(#RenderWidth, #RenderHeight)
  Main\Running = #True
  CreateThreads()
EndProcedure

Procedure Main()
  Protected time.i, AvTime.f, NewAvTime.f, Event.i, Frame.i
  Init()
  While Main\Running
    If JobIdx >= numJobs
      DisplayBufferShow(RenderBuffer\pAddr[Main\Frame], RenderBuffer\Width, RenderBuffer\Height, RenderBuffer\Width * 4)
      JobIdx = 0
      Camera\Pos[Main\Frame]\z = Camera\Pos[1 - Main\Frame]\z + 0.5
      If Camera\Pos[Main\Frame]\z > #LawnSize * 10
        Camera\Pos[Main\Frame]\z = 0.0
      EndIf  
      Main\Frame = 1 - Main\Frame
      NewAvTime = ElapsedMilliseconds() - time
      If Abs(NewAvTime - AvTime) > NewAvTime * 0.2
        AvTime = NewAvTime
      Else  
        AvTime = AvTime * 0.9 + NewAvTime * 0.1
      EndIf  
      time = ElapsedMilliseconds() 
      SetWindowTitle(Display\WindowNr, "Experiments In Raytracing 1 / " + Str(Main\numSpheres) + " Spheres / Frame time: " + Str(AvTime) + "ms")
    EndIf  
    Delay(0)
    Event = WindowEvent()
    If Event = #PB_Event_CloseWindow
      Main\Running = #False
    EndIf  
  Wend 
  Delay(10) ; Wait for threads to finish
EndProcedure

Main()
Last edited by Hades on Thu Mar 03, 2016 9:12 am, edited 2 times in total.
User avatar
djes
Addict
Addict
Posts: 1806
Joined: Sat Feb 19, 2005 2:46 pm
Location: Pas-de-Calais, France

Re: Experiments In Ray Tracing 1

Post by djes »

Yeah ! Great code ! You'll always have at least an admirator, me ;)
User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5524
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Experiments In Ray Tracing 1

Post by Kwai chang caine »

With me ... the counter is at 2 :mrgreen:
Splendid, but unfortunately my laptop is not also quick, surely also my video card ... :|
So it's when even giant :shock:
Thanks for sharing 8)
ImageThe happiness is a road...
Not a destination
User avatar
bbanelli
Enthusiast
Enthusiast
Posts: 544
Joined: Tue May 28, 2013 10:51 pm
Location: Europe
Contact:

Re: Experiments In Ray Tracing 1

Post by bbanelli »

@Hades

Perhaps you should include CompilerIf for MT support as mandatory. In addition, you can add in comments that running it on 32bit compiler makes it impossible to use with LawnSize of 500 due to memory consumption.

Would you mind clarifying few things?

http://imgur.com/ZNuEaN8

a) why doesn't GPU handles any load?
b) since you are using gl_ functions, is there any difference when using DirectX (default) subsystem or is it better to switch to OpenGL?
"If you lie to the compiler, it will get its revenge."
Henry Spencer
https://www.pci-z.com/
User avatar
Hades
Enthusiast
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Re: Experiments In Ray Tracing 1

Post by Hades »

Thanks for nice words, guys. And nice to see some familiar names are still around, too. :D

@bbanelli

Many thanks for those suggestions, I actually suck at everything but ray tracing, so keep 'em coming. :oops:

a: The only thing the GPU does is rendering a single rectangle that fills the screen, using the picture rendered by the CPU as a texture. Even the worst integrated one would handle that without any effort.
b: I've assumed, by directly accessing OpenGL it wouldn't make any difference, but I'm actually not 100% sure.

Edit: Changed the title for better future reference.

Also, if you want to see how much difference a little bit of assembly in the right spot can make, you can swap in this procedure:
BTW, if your not impressed with the result, please read what I wrote next to the PB version. This Assembly version still only intersects
one ray with one box, like the PB version, non of the other tricks are included.

Code: Select all

Procedure AABoxIntersect(*AABox.stAABB, *Ray.stRay, *rRayDir.stVector, *AABoxHit.stAABoxHit)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !rdx equ edx
    !rax equ eax
    !rcx equ ecx
  CompilerEndIf
  !MOV rdx, [p.p_AABox]
  !MOV rax, [p.p_rRayDir]
  !MOV rcx, [p.p_Ray]
  !xorps xmm6, xmm6
  !movups xmm0, [rdx]
  !movups xmm1, [rdx + 12]
  !movups xmm2, [rcx]
  !subps xmm0, xmm2
  !subps xmm1, xmm2
  !movups xmm3, [rax]
  !mulps xmm0, xmm3
  !mulps xmm1, xmm3
  !movaps xmm4, xmm0
  !movaps xmm7, xmm1
  !minps xmm4, xmm1
  !maxps xmm7, xmm0
  !maxps xmm6, xmm4
  !movaps xmm0, xmm6
  !shufps xmm0, xmm6, 01000110b
  !maxps xmm6, xmm0
  !movaps xmm2, xmm6
  !shufps xmm6, xmm6, 10110001b
  !maxps xmm6, xmm2
  !movaps xmm3, xmm7
  !shufps xmm3, xmm7, 01000110b
  !minps xmm7, xmm3
  !movaps xmm5, xmm7
  !shufps xmm7, xmm7, 10110001b
  !minps xmm7, xmm5
  !mov rdx, [p.p_AABoxHit]
  !movss [rdx], xmm6
  !movss [rdx + 4], xmm7
EndProcedure
Post Reply