Experiments In Ray Tracing 1 (BVH, Multithreading)
Posted: Wed Mar 02, 2016 3:48 am
Hi.
After a break from coding for many years, due to health issues, I feel like getting back into it. And the very first thing that popped into my eye, when I came here, was a thread from Samuel, who wanted to start a community driven ray tracer, but got very little feedback. As a big fan of ray tracing this made me very sad, so I've decided to try to raise the interest in ray tracing by creating something I call Experiments In Ray Tracing, where from time to time I will try to post some code, that will focus on different aspects of ray tracing.
I am still in the process of getting back into the groove, and have a lot of catching up to do. Also English isn't my native language, and by nature, I am a messy coder, so this will be fun.
After a break from coding for many years, due to health issues, I feel like getting back into it. And the very first thing that popped into my eye, when I came here, was a thread from Samuel, who wanted to start a community driven ray tracer, but got very little feedback. As a big fan of ray tracing this made me very sad, so I've decided to try to raise the interest in ray tracing by creating something I call Experiments In Ray Tracing, where from time to time I will try to post some code, that will focus on different aspects of ray tracing.
I am still in the process of getting back into the groove, and have a lot of catching up to do. Also English isn't my native language, and by nature, I am a messy coder, so this will be fun.
Code: Select all
; Experiments In Ray Tracing 1
; lots of spheres, PB only (no assembly, to keep it simple)
; PB 5.41, done on Windows7 64bit
; Hades, 2016
;
; Warning! This is stripped down to the bare minimum,
; to get the length to something I felt comfortable posting,
; so no bells, no whistles, and no safety checks.
CompilerIf #PB_Compiler_Thread = 0
MessageRequester( "Sorry...", "Please, set the compiler to threadsafe before compiling" )
End
CompilerEndIf
EnableExplicit
DisableDebugger ; even without debugger this is slow. As a reference: Default settings (200k spheres at 1024x640): 4-5 fps on a 4 core, 4GHZ Skylake
#LawnSize = 50 ; increasing this over the default of 50 will have little impact on render time, but a big one on initial set up time
; if you've got decent hardware on 64bit (for teh ram!), and are willing to wait a few minutes, try maybe 500. (100x the spheres, about 70% render speed)
#TileSize = 16 ; the picture is rendered in tiles, instead of lines, because keeping rays close together reduces cache misses
#RenderWidth = 1024 ; RenderWidth/Height needs to be a multiple of #TileSize, otherwise you'll crash!
#RenderHeight = 640 ; Window size and render size are independent, but choosing one as the multiple of the other gives better results.
#WindowWidth = 1024 ; BTW... when original Doom came out in 1993, it had a resolution of 320x200, and was restricted to max. 35 FPS.
#WindowHeight = 640 ; I bet you could recreate a doom level using appropriatly colored and sized spheres for each texel, and render it
; at that speed in that resolution. Sure, wouldn't be the most efficient way to do that (lol), but sounds kinda fun :)
#MaxDist = 1000000
Structure stVector
x.f
y.f
z.f
EndStructure
Structure stColor
r.f
g.f
b.f
EndStructure
Structure stMaterial
DiffuseColor.stColor
SpecularIntensity.f
Glossiness.f
EndStructure
Global Dim Material.stMaterial(100)
Structure stLight ; directional light, far away, like the sun
Direction.stVector
EndStructure
Global Light.stLight
Structure stRay
Origin.stVector
Direction.stVector
length.f
PrimitiveHit.i
EndStructure
Structure stSphere
Position.stVector
radius.f
*Material.stMaterial
EndStructure
Global Dim Sphere.stSphere(200)
Global Dim SphereBVH.stSphere(0)
Structure stRenderBuffer
pAddr.i[2] ; I am using 2 render buffers, so while the last threads are still working on the...
Width.i ; ...current frame, the one's that are done can already start with the next one,...
Height.i
EndStructure
Global RenderBuffer.stRenderBuffer
Structure stCamera
Pos.stVector[2] ; ...but that means i need two versions for stuff like this, too.
EndStructure
Global Camera.stCamera
Structure stDisplay
Width.i
Height.i
hdc.i
hrc.i
WindowNr.i
DisplList.i
EndStructure
Global Display.stDisplay
Structure stJob
x.i
y.i
Frame.i
EndStructure
Global NumThreads
Global Dim Job.stJob(0)
Global numJobs
Global JobIdx
Global mutJob
mutJob = CreateMutex()
Structure stAABB ; Axis aligned bounding box
min.f[3]
max.f[3]
EndStructure
Structure stAABoxHit
Near.f
Far.f
EndStructure
Structure stBVH
min.f[3]
max.f[3]
Content.i
numPrim.i
EndStructure ; in 32bit the size is 32 bytes, perfect for cache alignment. Trying to achieve that in 64bit might be interesting, when getting cache misses
Structure stBVHPrimitive
AABB.stAABB
Centre.f[3]
EndStructure
Global Dim BVHPrimitive.stBVHPrimitive(0)
Global Dim BVHPrimIdx(0)
Global FreeNode.i
Structure stMain
Running.i
Frame.i
numMaterials.i
numSpheres.i
*BVH.stBVH
EndStructure
Global Main.stMain
Macro ColorSet(Color, cr, cg, cb)
Color\r = cr
Color\g = cg
Color\b = cb
EndMacro
Macro VectorSet(Vector, VX, VY, VZ)
Vector\x = VX
Vector\y = VY
Vector\z = VZ
EndMacro
Macro Normalize(Vector, rLength)
rLength = 1.0 / Sqr(Vector\x * Vector\x + Vector\y * Vector\y + Vector\z * Vector\z)
Vector\x * rLength
Vector\y * rLength
Vector\z * rLength
EndMacro
Procedure.i RTMaterialCreate(r.f = 1.0, g.f = 1.0, b.f = 1.0)
ColorSet(Material(Main\numMaterials)\DiffuseColor, r, g, b)
Material(Main\numMaterials)\SpecularIntensity = 0.3
Material(Main\numMaterials)\Glossiness = 5
Main\numMaterials + 1
ProcedureReturn @Material(Main\numMaterials - 1)
EndProcedure
Procedure SphereCreate(x.f, y.f, z.f, radius.f, *Material.stMaterial)
If Main\numSpheres >= ArraySize(Sphere())
ReDim Sphere(Main\numSpheres * 2)
EndIf
VectorSet(Sphere(Main\numSpheres)\Position, x, y, z)
Sphere(Main\numSpheres)\radius = radius
Sphere(Main\numSpheres)\Material = *Material
Main\numSpheres + 1
EndProcedure
Procedure DisplayBufferInit(BufferWidth.i, BufferHeight.i, VisibleWidth.i = 0, VisibleHeight.i = 0)
Static Tex.i ; OpenGL allows me to write directly to a buffer
If VisibleWidth = 0 ; from different threads, and then tell the GPU
VisibleWidth = BufferWidth ; to use that buffer as a texture for a screen covering quad.
EndIf ; And the GPU will do that independently from the CPU.
If VisibleHeight = 0 ; Also, it makes render resolution and display resolution independent,
VisibleHeight = BufferHeight ; with no impact on the CPU, witch is a nice feature, too.
EndIf
glShadeModel_(#GL_FLAT)
glEnable_(#GL_TEXTURE_2D)
glBindTexture_(#GL_TEXTURE_2D, @Tex)
glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_MAG_FILTER, #GL_LINEAR)
glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_MIN_FILTER, #GL_LINEAR)
glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_WRAP_S, #GL_CLAMP)
glTexParameteri_(#GL_TEXTURE_2D, #GL_TEXTURE_WRAP_T, #GL_CLAMP)
If Display\DisplList <> 0
glDeleteLists_(Display\DisplList, 1)
EndIf
Display\DisplList = glGenLists_(1)
glNewList_(Display\DisplList, #GL_COMPILE)
glBegin_(#GL_QUADS)
glTexCoord2f_(0.0, 0.0)
glVertex2i_(0,0)
glTexCoord2f_(0.0, 1.0 * VisibleHeight / BufferHeight)
glVertex2i_(0, Display\Height)
glTexCoord2f_(1.0 * VisibleWidth / BufferWidth, 1.0 * VisibleHeight / BufferHeight)
glVertex2i_(Display\Width, Display\Height)
glTexCoord2f_(1.0 * VisibleWidth / BufferWidth, 0.0)
glVertex2i_(Display\Width, 0)
glEnd_()
glEndList_()
EndProcedure
Procedure.i glWindowOpen(Width.i, Height.i, Name.s = "OpenGL Window", flags.i = #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_ScreenCentered)
Protected hwnd.i, pfd.PIXELFORMATDESCRIPTOR, pixformat.i, wglSwapIntervalEXT.i
Display\WindowNr = OpenWindow(#PB_Any, 0, 0, Width, Height, Name, flags)
If Display\WindowNr
hwnd = WindowID(Display\WindowNr)
Display\hdc = GetDC_(hwnd)
pfd\nSize = SizeOf(PIXELFORMATDESCRIPTOR)
pfd\nVersion = 1
pfd\dwFlags = #PFD_SUPPORT_OPENGL | #PFD_DOUBLEBUFFER | #PFD_DRAW_TO_WINDOW
pfd\iLayerType = #PFD_MAIN_PLANE
pfd\iPixelType = #PFD_TYPE_RGBA
pfd\cColorBits = 32
pfd\cDepthBits = 32
pixformat = ChoosePixelFormat_(Display\hdc, pfd)
SetPixelFormat_(Display\hdc, pixformat, pfd)
Display\hrc = wglCreateContext_(Display\hdc)
wglMakeCurrent_(Display\hdc, Display\hrc)
wglSwapIntervalEXT = wglGetProcAddress_("wglSwapIntervalEXT")
If wglSwapIntervalEXT
CallFunctionFast(wglSwapIntervalEXT, 0)
EndIf
SwapBuffers_(Display\hdc)
Display\Width = Width
Display\Height = Height
glOrtho_(0.0, Display\Width, Display\Height, 0.0, 0.0, 1.0)
ProcedureReturn #True
EndIf
ProcedureReturn #False
EndProcedure
Procedure DisplayBufferShow(pBuffer.i, Width.i, Height.i, pitch.i)
glEnable_(#GL_TEXTURE_2D)
glTexImage2D_(#GL_TEXTURE_2D, 0, #GL_RGBA, Width, Height, 0, #GL_BGRA_EXT, #GL_UNSIGNED_BYTE, pBuffer)
glCallList_(Display\DisplList)
SwapBuffers_(Display\hdc)
EndProcedure
Procedure Intersect_Spheres(*Sphere.stSphere, *Ray.stRay, numSpheres.i) ; Using spheres instead of triangles, because setting up a scene with triangles
Protected i.i, RelPos.stVector, b.f, c.f, d.f, t1.f, t2.f ; is much more work, and I'm kinda lazy :). Also code size would be an issue.
For i = 1 To numSpheres
VectorSet(RelPos, *Ray\Origin\x - *Sphere\Position\x, *Ray\Origin\y - *Sphere\Position\y, *Ray\Origin\z - *Sphere\Position\z)
b = *Ray\Direction\x * RelPos\x + *Ray\Direction\y * RelPos\y + *Ray\Direction\z * RelPos\z
c = (RelPos\x * RelPos\x + RelPos\y * RelPos\y + RelPos\z * RelPos\z) - *Sphere\radius * *Sphere\radius
d = b * b - c
If d > 0.0
d = Sqr(d)
t1 = (-b - d)
If t1 > 0.0 And t1 < *Ray\length ; only testing for hits from the outside of the spheres, to keep it simple/short
*Ray\length = t1
*Ray\PrimitiveHit = *Sphere
EndIf
EndIf
*Sphere + SizeOf(stSphere)
Next i
EndProcedure
Procedure AABoxIntersect(*AABox.stAABB, *Ray.stRay, *rRayDir.stVector, *AABoxHit.stAABoxHit)
Protected n.l, tnear.f, tfar.f, t1.f, t2.f ; this is a very efficient way to compute the intersection of a ray and an
tnear = 0.0 ; Axis Aligned Box, but to give you a glimps of how much is still possible:
t1 = (*AABox\min[0] - *Ray\Origin\x) * *rRayDir\x ; I've got a version I've done around 2008, where I am using a jumptable to
t2 = (*AABox\max[0] - *Ray\Origin\x) * *rRayDir\x ; call 1 of 8 versions of this, depending on the signs of the ray direction,
If t2 < t1 ; so I do know upfront wich side of the AAB is closer, to remove the compare/
Swap t1, t2 ; swap parts, and then check 4 rays at the same time, using branchless SSE code.
EndIf ; And now think about doing several sets of 8 (or soon 16) rays in parallel,
tfar = t2 ; using AVX... And this small procedure has a big impact on the overall speed.
If t1 > tnear ; (On top of that, the other code was even checking the frustum of a big ray packet vs
tnear = t1 ; the box, and do other tricks to avoid doing this intersection test in the first place.
EndIf ; And while that would help here a lot, too, it would make the code to messy to post.)
t1 = (*AABox\min[1] - *Ray\Origin\y) * *rRayDir\y
t2 = (*AABox\max[1] - *Ray\Origin\y) * *rRayDir\y
If t2 < t1
Swap t1, t2
EndIf
If t1 > tnear
tnear = t1
EndIf
If t2 < tfar
tfar = t2
EndIf
t1 = (*AABox\min[2] - *Ray\Origin\z) * *rRayDir\z
t2 = (*AABox\max[2] - *Ray\Origin\z) * *rRayDir\z
If t2 < t1
Swap t1, t2
EndIf
If t1 > tnear
tnear = t1
EndIf
If t2 < tfar
tfar = t2
EndIf
*AABoxHit\Near = tnear
*AABoxHit\Far = tfar
EndProcedure
Procedure BVHTraversal(*Ray.stRay, pTravStack.i)
Protected *StackPtr.integer, *Node.stBVH, rRayDir.stVector, AABoxHit.stAABoxHit, near.f, pNode.i
VectorSet(rRayDir, 1.0 / *Ray\Direction\x, 1.0 / *Ray\Direction\y, 1.0 / *Ray\Direction\z)
*StackPtr = pTravStack
AABoxIntersect(Main\BVH, *Ray, rRayDir, @AABoxHit)
If AABoxHit\far > AABoxHit\near ; the ray is hitting the root node of the BVH
*Node = Main\BVH
Repeat
If *Node\numPrim >= 0 ; this means this is not a leaf node
near = #MaxDist
AABoxIntersect(*Node\Content + SizeOf(stBVH), *Ray, rRayDir, @AABoxHit)
If AABoxHit\far > AABoxHit\near And *Ray\length > AABoxHit\near ; if we have a hit, we need to check what's inside...
*StackPtr + SizeOf(*StackPtr) ; ...so push it onto the stack.
*StackPtr\i = *Node\Content + SizeOf(stBVH) ; (But the big deal here is, if we miss, we have just discarded
near = AABoxHit\near ; about half of the primitives in this node, with a rather
EndIf ; cheap test, and that's where the speed comes from)
AABoxIntersect(*Node\Content, *Ray, rRayDir, @AABoxHit)
If AABoxHit\far > AABoxHit\near And *Ray\length > AABoxHit\near
If near < AABoxHit\near ; making sure to always check the closest node first,
pNode = *StackPtr\i ; so when we get an intersection with a primitive,
*StackPtr\i = *Node\Content ; we don't have to check any nodes that are further away,
*StackPtr + SizeOf(*StackPtr) ; witch makes a huge difference when looking at the scene from the wrong angle
*StackPtr\i = pNode
Else
*StackPtr + SizeOf(*StackPtr)
*StackPtr\i = *Node\Content
EndIf
EndIf
Else ; this is a leaf node containing the actual primitives (spheres in this case),...
Intersect_Spheres(*Node\Content, *Ray, -*Node\numPrim) ; ...so check if the ray does hit something
EndIf
*Node = *StackPtr\i ; using stack based traversal vs recursive should have helped with the speed,
*StackPtr - SizeOf(*StackPtr) ; but strangely didn't. But it made traversing nodes in the right order a bit easier.
Until *StackPtr < pTravStack
EndIf
EndProcedure
Procedure.i Shadow(*HitPosition.stVector, *Normal.stVector, *LightDirection.stVector, LightDistance.f, pTravStack.i)
Protected ShadowRay.stRay
VectorSet(ShadowRay\Origin, *HitPosition\x, *HitPosition\y, *HitPosition\z)
VectorSet(ShadowRay\Direction, *LightDirection\x, *LightDirection\y, *LightDirection\z)
ShadowRay\length = LightDistance
ShadowRay\PrimitiveHit = 0
BVHTraversal(ShadowRay, pTravStack)
If ShadowRay\PrimitiveHit
ProcedureReturn #True
EndIf
ProcedureReturn #False
EndProcedure
Procedure RenderTile(TileX.i, TileY.i, Frame.i, pTravStack.i)
Protected *Buffer.Long, *Sphere.stSphere, *Material.stMaterial, x.i, y.i, i.i, rLength.f, Ray.stRay, Col.stColor
Protected HitPosition.stVector, Normal.stVector, LightNormalAngle.f, Angle.f, ReflectedLightVector.stVector, SpecIntensity.f, Brightness.f
VectorSet(Ray\Origin, Camera\Pos[Frame]\x, Camera\Pos[Frame]\y, Camera\Pos[Frame]\z) ; Create a ray from the Camera...
For y = TileY To TileY + #TileSize - 1
*Buffer = RenderBuffer\pAddr[Frame] + TileX * 4 + y * RenderBuffer\Width * 4
For x = TileX To TileX + #TileSize - 1
VectorSet(Ray\Direction, 0.5 + x - RenderBuffer\Width * 0.5, 0.5 + RenderBuffer\Height * 0.5 - y, RenderBuffer\Width) ;...trough the centre of each pixel
Normalize(Ray\Direction, rLength)
Ray\PrimitiveHit = 0
Ray\length = #MaxDist
BVHTraversal(Ray, pTravStack)
If Ray\PrimitiveHit ; we have hit something
*Sphere = Ray\PrimitiveHit
*Material = *Sphere\Material
VectorSet(HitPosition, Ray\Origin\x + Ray\length * Ray\Direction\x, Ray\Origin\y + Ray\length * Ray\Direction\y, Ray\Origin\z + Ray\length * Ray\Direction\z) ; where was that hit?
VectorSet(Normal, HitPosition\x - *Sphere\Position\x, HitPosition\y - *Sphere\Position\y, HitPosition\z - *Sphere\Position\z) ; what's the normal vector in that spot
Normalize(Normal, rLength) ; making sure the normal vector has a length of 1.0
LightNormalAngle = Normal\x * Light\Direction\x + Normal\y * Light\Direction\y + Normal\z * Light\Direction\z ; the angle between the normal and the incoming light determins how much light the object receives
SpecIntensity = 0.0
If LightNormalAngle > 0.1 And Not Shadow(HitPosition, Normal, Light\Direction, #MaxDist, pTravStack)
If *Material\SpecularIntensity > 0.0
Angle = 2.0 * LightNormalAngle
VectorSet(ReflectedLightVector, Light\Direction\x - Angle * Normal\x, Light\Direction\y - Angle * Normal\y, Light\Direction\z - Angle * Normal\z)
Angle = (Ray\Direction\x * ReflectedLightVector\x + Ray\Direction\y * ReflectedLightVector\y + Ray\Direction\z * ReflectedLightVector\z)
If Angle > 0.0
SpecIntensity = *Material\SpecularIntensity * Pow(Angle, *Material\Glossiness)
EndIf
EndIf
Else
LightNormalAngle = 0.1 ; setting this to 0.1 to avoid completely black shadows
EndIf
*Buffer\l = RGB((*Material\DiffuseColor\b * LightNormalAngle + SpecIntensity) * 255, (*Material\DiffuseColor\g * LightNormalAngle + SpecIntensity) * 255, (*Material\DiffuseColor\r * LightNormalAngle + SpecIntensity) * 255)
Else
Angle = Ray\Direction\y
If Angle < 0.0
Angle = 0.0
EndIf
Angle = 1.0 - Angle
Angle = Angle * Angle
Brightness = 0.7 * Angle * Angle
*Buffer\l = RGB(0.9 * 255, (0.2 + Brightness) * 255, (0.2 + Brightness) * 255)
EndIf
*Buffer + 4
Next
Next
EndProcedure
Procedure WorkerThread(ThreadIdx)
Protected iJob, pTravStack.i
pTravStack = AllocateMemory(Main\numSpheres * 8)
While Main\Running
iJob = -1
LockMutex(mutJob)
If numJobs > JobIdx
iJob = JobIdx
JobIdx + 1
EndIf
UnlockMutex(mutJob)
If iJob > -1
RenderTile(Job(iJob)\x, Job(iJob)\y, Job(iJob)\Frame, pTravStack)
Job(iJob)\Frame = 1 - Job(iJob)\Frame
Else
Delay(0)
EndIf
Wend
EndProcedure
Procedure SetUpJobs(RenderWidth.i, RenderHeight.i)
Protected x, y, i
ReDim Job((RenderHeight * RenderWidth) / (#TileSize * #TileSize))
For y = 0 To RenderHeight - 1 Step #TileSize
For x = 0 To RenderWidth - 1 Step #TileSize
Job(i)\x = x
Job(i)\y = y
i + 1
Next x
Next y
numJobs = i
JobIdx = i
EndProcedure
Procedure CreateThreads()
Protected i
NumThreads = CountCPUs(#PB_System_ProcessCPUs)
For i = 0 To NumThreads - 1
CreateThread(@WorkerThread(), i)
Next
EndProcedure
; The idea behind the SAH algorithm is to split a node into the two bounding boxes with the lowest Surface Area, because it increases the chance of the ray missing, finding empty spots. And empty spots don't need any tests :)
Procedure.f ComputeSAH(pNode.l, rArea.f, left.l, right.l, Axis.l, SplitPos.f)
Protected n.l, i.l, AABB1.stAABB, AABB2.stAABB, ns1.l, ns2.l, *Prim.stBVHPrimitive
Protected s1.stVector, s2.stVector, A1.f, A2.f, SAH.f
AABB1\min[0] = #MaxDist
AABB1\min[1] = #MaxDist
AABB1\min[2] = #MaxDist
AABB1\max[0] = -#MaxDist
AABB1\max[1] = -#MaxDist
AABB1\max[2] = -#MaxDist
AABB2\min[0] = #MaxDist
AABB2\min[1] = #MaxDist
AABB2\min[2] = #MaxDist
AABB2\max[0] = -#MaxDist
AABB2\max[1] = -#MaxDist
AABB2\max[2] = -#MaxDist
For n = left To right ; fit the bounding box around the primitives. For calculating the SAH value only.
*Prim = @BVHPrimitive(BVHPrimIdx(n))
If *Prim\Centre[Axis] < SplitPos
ns1 + 1
For i = 0 To 2
If *Prim\AABB\min[i] < AABB1\min[i]
AABB1\min[i] = *Prim\AABB\min[i]
EndIf
If *Prim\AABB\max[i] > AABB1\max[i]
AABB1\max[i] = *Prim\AABB\max[i]
EndIf
Next
Else
ns2 + 1
For i = 0 To 2
If *Prim\AABB\min[i] < AABB2\min[i]
AABB2\min[i] = *Prim\AABB\min[i]
EndIf
If *Prim\AABB\max[i] > AABB2\max[i]
AABB2\max[i] = *Prim\AABB\max[i]
EndIf
Next
EndIf
Next
; compute the SAH value...
If ns1 = 0 ; ...based on the number of primitives in the nodes...
ns1 = 1000 ; (no primitives in the node is bad, so set some bad values to avoid that case)
A1 = 1000000000.0
Else
s1\x = AABB1\max[0] - AABB1\min[0] ; ...the surface area of nodes...
s1\y = AABB1\max[1] - AABB1\min[1]
s1\z = AABB1\max[2] - AABB1\min[2]
A1 = s1\x * s1\y + s1\x * s1\z + s1\z * s1\y
EndIf
If ns2 = 0
ns2 = 1000
A2 = 1000000000.0
Else
s2\x = AABB2\max[0] - AABB2\min[0]
s2\y = AABB2\max[1] - AABB2\min[1]
s2\z = AABB2\max[2] - AABB2\min[2]
A2 = s2\x * s2\y + s2\x * s2\z + s2\z * s2\y
EndIf
SAH = (A1 * ns1 + A2 * ns2) * rArea ; ...and the area of the parent node
ProcedureReturn SAH
EndProcedure
Procedure BVHSplit(pNode, left, right, depth, *SphBVHIdx.integer, UseSAH.i = #True)
Protected *Node.stBVH, Axis, AxL.f, Centre.f, n, i
Protected *Prim.stBVHPrimitive, PivotIdx, ok, *lNode.stBVH, *rNode.stBVH, Split.f, s.stVector, rArea.f, SAH.f, SAH1.f
*Node = pNode
; if there are primitives in this node...
If right - left > 0 ; right is the last primitive, left the first in this node. The naming might seem strange, but is a result of the spatial sorting performed here
AxL = *Node\max[0] - *Node\min[0] ; ...find the longest axis of this node...
If *Node\max[1] - *Node\min[1] > AxL
Axis = 1
AxL = *Node\max[1] - *Node\min[1]
EndIf
If *Node\max[2] - *Node\min[2] > AxL
Axis = 2
AxL = *Node\max[2] - *Node\min[2]
EndIf
Centre = (*Node\min[Axis] + *Node\max[Axis]) * 0.5 ; ... and do a centre split
If UseSAH ; using the SAH will significantly improve render times in most cases, but will dramatically slow down BVH creation
s\x = *Node\max[0] - *Node\min[0]
s\y = *Node\max[1] - *Node\min[1]
s\z = *Node\max[2] - *Node\min[2]
rArea = 1.0 / (s\x * s\y + s\x * s\z + s\z * s\y)
SAH = ComputeSAH(pNode, rArea, left, right, Axis, Centre)
If (right - left) > 30 ; (value might need adjusting) if there is a significant amount of primitives in this node...
For i = 0 To 2 ; ...along each Axis..
For n = 1 To 9 ; ...make 9 splits...
Split = *Node\min[i] + ((*Node\max[i] - *Node\min[i]) * 0.1 * n)
SAH1 = ComputeSAH(pNode, rArea, left, right, i, Split) ; ...and check for the best SAH value
If SAH1 < SAH
SAH = SAH1
Axis = i
Centre = Split
EndIf
Next
Next
Else
For i = 0 To 2 ; ...otherwise do a complete sweep on all axis...
For n = left To right ; ...for all primitives
*Prim = @BVHPrimitive(BVHPrimIdx(n))
Split = *Prim\Centre[i]
SAH1 = ComputeSAH(pNode, rArea, left, right, i, Split)
If SAH1 < SAH
SAH = SAH1
Axis = i
Centre = Split
EndIf
Next
Next
EndIf
EndIf
PivotIdx = left ; ... and sort the primitives into left/right of the split
For n = left To right
*Prim = @BVHPrimitive(BVHPrimIdx(n))
If *Prim\Centre[Axis] < Centre
Swap BVHPrimIdx(n), BVHPrimIdx(PivotIdx)
PivotIdx + 1
EndIf
Next
If PivotIdx > left And PivotIdx <= right; If the primitives don't all end up on one side (should only happen if there is a serious problem with the geometry)...
; ...and if there are more than a few primitives left...
If right - left > 8 Or SAH < ((right - left) + 1) * 0.2 ; (these numbers might need changing when the code changes, or to deal with a wider set of scenes)
*lNode = FreeNode ; ...create the two new nodes...
*rNode = *lNode + SizeOf(stBVH)
FreeNode = FreeNode + 2 * SizeOf(stBVH)
*Node\Content = *lNode
*lNode\min[0] = #MaxDist
*lNode\min[1] = #MaxDist
*lNode\min[2] = #MaxDist
*lNode\max[0] = -#MaxDist
*lNode\max[1] = -#MaxDist
*lNode\max[2] = -#MaxDist
For n = left To PivotIdx - 1 ; fit the bounding box of the node around the primitives in the node
*Prim = @BVHPrimitive(BVHPrimIdx(n))
For i = 0 To 2
If *Prim\AABB\min[i] < *lNode\min[i]
*lNode\min[i] = *Prim\AABB\min[i]
EndIf
If *Prim\AABB\max[i] > *lNode\max[i]
*lNode\max[i] = *Prim\AABB\max[i]
EndIf
Next
Next
*rNode\min[0] = #MaxDist
*rNode\min[1] = #MaxDist
*rNode\min[2] = #MaxDist
*rNode\max[0] = -#MaxDist
*rNode\max[1] = -#MaxDist
*rNode\max[2] = -#MaxDist
For n = PivotIdx To right
*Prim = @BVHPrimitive(BVHPrimIdx(n))
For i = 0 To 2
If *Prim\AABB\min[i] < *rNode\min[i]
*rNode\min[i] = *Prim\AABB\min[i]
EndIf
If *Prim\AABB\max[i] > *rNode\max[i]
*rNode\max[i] = *Prim\AABB\max[i]
EndIf
Next
Next
*lNode\numPrim = PivotIdx - left
*rNode\numPrim = (right - PivotIdx) + 1
BVHSplit(*lNode, left, PivotIdx - 1, Depth + 1, *SphBVHIdx, UseSAH) ; ...and recursivly test them for further splits
BVHSplit(*rNode, PivotIdx, right, Depth + 1, *SphBVHIdx, UseSAH)
ProcedureReturn
EndIf
EndIf
EndIf
; ...otherwise create a leaf (a node without children, but a pointer to the first primitive)
*Node\numPrim = -((right - left) + 1) ; using a nagative number here to mark this as a leaf. Otherwise another variable would be needed. Too much memory consumption already
*Node\Content = SphereBVH(*SphBVHIdx\i)
For n = left To right - 1 ; Here, I am reordering the spheres, so when a ray hits a leaf node,
SphereBVH(*SphBVHIdx\i) = Sphere(BVHPrimIdx(n)) ; all the needed data is in in one spot, and in the right order.
*SphBVHIdx\i + 1 ; This helps the CPU to prefech the data and avoid cache misses.
Next ; Didn't do squad for the speed on my system though.
SphereBVH(*SphBVHIdx\i) = Sphere(BVHPrimIdx(n)) ; Guess when sticking to PB only, speed is too slow for memory access to become a problem.
*SphBVHIdx\i + 1
EndProcedure
; This is where it's at: BVH (Bounding Volume Hierarchy), the main reason for me to post this code.
; Without an acceleration structure, like BVH, you are forever doomed to stick to tiny scenes, when doing raytracing,
; but with it, rendering 10's of millions of primitives becomes possible at a decent speed, even in realtime, to some extent.
; This BVH code is just something I've quickly thrown together back around 2008 and never really touched again. As a result,
; the time it takes to create the BVH is terrible, and it's quality (the render speed it allows) probably isn't as good as it could be.
; OTOH it works and is rather simple, easy to understand, compared to a more advanced version, so posting this might be the better choice.
Procedure BVHCreate()
Protected BVH, n, i, *BVH.stBVH, SphBVHIdx.i
ReDim BVHPrimitive(Main\numSpheres - 1)
ReDim BVHPrimIdx(Main\numSpheres - 1)
ReDim SphereBVH(Main\numSpheres - 1)
For n = 0 To Main\numSpheres - 1 ; precompute a bounding box around every primitive, to ease the creation of the BVH
BVHPrimitive(n)\AABB\min[0] = Sphere(n)\Position\x - Sphere(n)\radius
BVHPrimitive(n)\AABB\min[1] = Sphere(n)\Position\y - Sphere(n)\radius
BVHPrimitive(n)\AABB\min[2] = Sphere(n)\Position\z - Sphere(n)\radius
BVHPrimitive(n)\AABB\max[0] = Sphere(n)\Position\x + Sphere(n)\radius
BVHPrimitive(n)\AABB\max[1] = Sphere(n)\Position\y + Sphere(n)\radius
BVHPrimitive(n)\AABB\max[2] = Sphere(n)\Position\z + Sphere(n)\radius
BVHPrimitive(n)\Centre[0] = Sphere(n)\Position\x ; for a triangle one would use the centre of the bounding box
BVHPrimitive(n)\Centre[1] = Sphere(n)\Position\y
BVHPrimitive(n)\Centre[2] = Sphere(n)\Position\z
Next n
BVH = AllocateMemory(2 * Main\numSpheres * SizeOf(stBVH)) ; this is the maximum size the BVH could have
*BVH = BVH
*BVH\min[0] = #MaxDist
*BVH\min[1] = #MaxDist
*BVH\min[2] = #MaxDist
*BVH\max[0] = -#MaxDist
*BVH\max[1] = -#MaxDist
*BVH\max[2] = -#MaxDist
For n = 0 To Main\numSpheres - 1
BVHPrimIdx(n) = n ; we need to sort the primitives into the BVH nodes. This index array will be used for that purpose
For i = 0 To 2 ; Fit the root nodes AABB tightly around the scene
If BVHPrimitive(n)\AABB\min[i] < *BVH\min[i]
*BVH\min[i] = BVHPrimitive(n)\AABB\min[i]
EndIf
If BVHPrimitive(n)\AABB\max[i] > *BVH\max[i]
*BVH\max[i] = BVHPrimitive(n)\AABB\max[i]
EndIf
Next
Next
*BVH\numPrim = Main\numSpheres
*BVH\Content = *BVH + SizeOf(stBVH)
FreeNode = *BVH + SizeOf(stBVH)
BVHSplit(*BVH, 0, Main\numSpheres - 1, 0, @SphBVHIdx)
ProcedureReturn BVH
EndProcedure
Procedure SceneDirtNGras(*MatDirt.stMaterial, *MatGras.stMaterial, sx.f, sy.f, sz.f)
Protected n.i, i.i, Scale.f, BendX.f, BendZ.f, BX.f, BZ.f, x.f, y.f, z.f
SphereCreate(sx, sy, sz, 10, *MatDirt)
For n = 1 To Random(6, 3)
Scale = 0.0008 * Random(1000, 500)
x = sx + 0.01 * (Random(1000) - 500)
y = sy + 9.0
z = sz + 0.01 * (Random(1000) - 500)
BendX = 0.00002 * (Random(1000) - 500)
BendZ = 0.00002 * (Random(1000) - 500)
BX = 0.0
BZ = 0.0
For i = 1 To 20
SphereCreate(x, y, z, Scale, *MatGras)
x = x + BX
z = z + BZ
y = y + Scale
BX = BX + BendX
BZ = BZ + BendZ
Scale = Scale * 0.96
Next i
Next n
EndProcedure
Procedure SceneCreate()
Protected *Material.stMaterial, *MatDirt.stMaterial, *MatGras.stMaterial, x, y, z, rLength.f
*MatDirt = RTMaterialCreate(0.4, 0.3, 0.2)
*MatDirt\SpecularIntensity = 0.0
*MatGras = RTMaterialCreate(0.0, 0.6, 0.2)
For x = 1 To #LawnSize
For z = 1 To #LawnSize
SceneDirtNGras(*MatDirt, *MatGras, x * 10 - #LawnSize * 5, -40, z * 10 + 100)
Next z
Next x
VectorSet(Light\Direction, 1, 1, -1)
Normalize(Light\Direction, rLength)
Main\BVH = BVHCreate()
EndProcedure
Procedure Init()
glWindowOpen(#WindowWidth, #WindowHeight, "Experiments In Raytracing 1")
RenderBuffer\pAddr[0] = AllocateMemory(#RenderWidth * #RenderHeight * 4)
RenderBuffer\pAddr[1] = AllocateMemory(#RenderWidth * #RenderHeight * 4)
RenderBuffer\Width = #RenderWidth
RenderBuffer\Height = #RenderHeight
DisplayBufferInit(#RenderWidth, #RenderHeight)
SceneCreate()
SetUpJobs(#RenderWidth, #RenderHeight)
Main\Running = #True
CreateThreads()
EndProcedure
Procedure Main()
Protected time.i, AvTime.f, NewAvTime.f, Event.i, Frame.i
Init()
While Main\Running
If JobIdx >= numJobs
DisplayBufferShow(RenderBuffer\pAddr[Main\Frame], RenderBuffer\Width, RenderBuffer\Height, RenderBuffer\Width * 4)
JobIdx = 0
Camera\Pos[Main\Frame]\z = Camera\Pos[1 - Main\Frame]\z + 0.5
If Camera\Pos[Main\Frame]\z > #LawnSize * 10
Camera\Pos[Main\Frame]\z = 0.0
EndIf
Main\Frame = 1 - Main\Frame
NewAvTime = ElapsedMilliseconds() - time
If Abs(NewAvTime - AvTime) > NewAvTime * 0.2
AvTime = NewAvTime
Else
AvTime = AvTime * 0.9 + NewAvTime * 0.1
EndIf
time = ElapsedMilliseconds()
SetWindowTitle(Display\WindowNr, "Experiments In Raytracing 1 / " + Str(Main\numSpheres) + " Spheres / Frame time: " + Str(AvTime) + "ms")
EndIf
Delay(0)
Event = WindowEvent()
If Event = #PB_Event_CloseWindow
Main\Running = #False
EndIf
Wend
Delay(10) ; Wait for threads to finish
EndProcedure
Main()