RayTracer Theory and Practice

deadmoap
User
Posts: 79
Joined: Sun Feb 22, 2004 11:45 pm
Location: Riverdale, Utah

Post by deadmoap »

I don't know too much about raytracing... but damn there's some awesome code in here. Although it took my computer 16 seconds to render that one pic... damn slow ass thing.

Anyway, is raytracing done in the same way as other... uhhh... "3d rendering stuff"? Like OpenGL or Direct3D? If it is, then why are raytracing programs way slower? Is it because OpenGL and Direct3D graphics are rendered with the graphics card?
Hades
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Post by Hades »

No, it's completely different.

Today's graphics cards are rasterizers. The vertices of triangles are projected onto the screen, textures are interpolated across them, and lighting is calculated per vertex and interpolated across the triangle. With shaders you can even have per-pixel lighting and bump mapping. But perfect shadows and reflections are damn hard to do, and refraction is next to impossible for complex scenes.

Raytracing works the other way around. From a virtual eye, a ray is cast through every pixel of the screen and tested to find which object it hits first.
-From that intersection point, a ray is cast to every light source and checked to see whether it intersects an object before reaching the light. If not, the first object is lit by that light source. -> perfect shadows
-If the object is reflective, another ray is calculated, mirrored at the surface, and again intersected with the scene. -> perfect reflections
-If the object is refractive, a refracted ray is calculated. -> perfect refractions
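
The steps above can be sketched in a few lines. This is only an illustration in Python rather than PureBasic, assuming a scene made of nothing but spheres and a single point light. `hit_sphere`, `first_hit` and `is_lit` are made-up names, not functions from Dreglor's raytracer; reflection and refraction would recurse with a new ray in the same way.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def hit_sphere(origin, direction, center, radius):
    """Distance along a normalized ray to the nearest hit in front of it, or None."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:          # small offset so a surface point does not hit itself
            return t
    return None

def first_hit(origin, direction, spheres):
    """Step 1: find which object the ray hits first. Returns (t, sphere) or None."""
    best = None
    for sphere in spheres:
        t = hit_sphere(origin, direction, sphere["center"], sphere["radius"])
        if t is not None and (best is None or t < best[0]):
            best = (t, sphere)
    return best

def is_lit(point, light_pos, spheres):
    """Step 2: shadow ray. The point is lit if nothing lies between it and the light."""
    to_light = sub(light_pos, point)
    distance = math.sqrt(dot(to_light, to_light))
    blocker = first_hit(point, normalize(to_light), spheres)
    return blocker is None or blocker[0] >= distance
```

A primary ray would call `first_hit`, compute the hit point as `origin + t * direction`, and then call `is_lit` once per light source.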

You can intersect that ray with any object you can define mathematically, so you get perfect spheres, tubes, cones, Bezier patches, ... Not that jagged crap from rasterizers. :wink:

Any combination of effects is handled automatically by that system. No tricks and hacks necessary, as with rasterizers.
But all that calculation done in software has its price. It's much slower than rasterizing.
But that will change.
What you see here isn't exactly the fastest code possible, and raytracing slows down mostly with resolution, much less with the number of objects. That's the opposite of rasterizers. And because the number of objects in 3D graphics increases faster than screen resolution, in the not too distant future raytracing will be the faster method to render 3D graphics.
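
To put rough numbers on that scaling claim, here is a sketch in Python. It assumes a brute-force tracer tests every ray against every object, while a tracer with a balanced tree over the objects (a kd-tree, for example) needs on the order of log2(objects) steps per ray. Both formulas are idealized assumptions for illustration, not measurements, and the function names are made up.

```python
import math

def brute_force_tests(width, height, objects):
    """Intersection tests per frame if every primary ray is tested against every object."""
    return width * height * objects

def tree_steps(width, height, objects):
    """Idealized work per frame with a balanced tree: about log2(objects) steps per ray."""
    return width * height * math.ceil(math.log2(objects))

# Effect of 1000x more objects versus doubling the resolution in both directions.
more_objects = tree_steps(640, 480, 1_000_000) / tree_steps(640, 480, 1_000)
more_pixels = tree_steps(1280, 960, 1_000) / tree_steps(640, 480, 1_000)
```

Under these assumptions a thousand times more objects only doubles the work, while doubling the resolution in both directions quadruples it.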

The fastest raytracers today are faster than rasterizers for complex scenes of about 1 million objects.
And look at this:
http://graphics.cs.uni-sb.de/MassiveRT/
This approach makes it possible to interactively render a complete "Boeing 777" model containing more than 350 million individual triangles without any simplifications at several frames per second even on a single PC.
Don't try this at home (with a rasterizer). :D

In 5 years you will play your first raytraced shooter. In 10 years rasterizers will be gone (hopefully). :D
Hades
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Post by Hades »

@Dreglor

I checked your lighting code. You do funny things there. :D

I have changed it; it's much faster now but does the same thing.

Code:

          If LightList()\IsLight=#True
            Light\x=LightList()\Origin\x-Intersection\x
            Light\y=LightList()\Origin\y-Intersection\y
            Light\z=LightList()\Origin\z-Intersection\z
            
            VectorNormalize(Light)
            
            ;this shadow test only works with point lights
            shade.f=1
            
            *Old_Element3=@ObjectList()
            
            If ObjectList()\UseTexture=#True ;will be removed from the trimesh-only version because it's faster
              ScalarU.f=GetFPTWidth(ObjectList()\Material\Texture)*ObjectList()\Material\UScale
              ScalarV.f=GetFPTHieght(ObjectList()\Material\Texture)*ObjectList()\Material\VScale
              GetFPTColor(ObjectList()\Material\Texture,Closest\TextureCoords\u*ScalarU,Closest\TextureCoords\v*ScalarV,PrimaryColor)
            Else
              PrimaryColor\Red=ObjectList()\Material\SoildColor\Red
              PrimaryColor\Green=ObjectList()\Material\SoildColor\Green
              PrimaryColor\Blue=ObjectList()\Material\SoildColor\Blue
            EndIf
            
            ForEach ObjectList()
              If @ObjectList()<>*Old_Element3 ;disables self-shadowing because it causes weird problems...
                If ObjectList()\IsLight=#False
                  CallFunctionFast(ObjectList()\IntersectionMethod,Intersection,Light,@ObjectList(),IntersectionResults)
                  If IntersectionResults\result=#Hit Or IntersectionResults\result=#InPrimitive
                    shade=ObjectList()\Material\Refract
                    Break
                  EndIf
                EndIf
              EndIf
            Next ;ObjectList()

I have also changed that SSE procedure. If you use it in the above code, you'll get even more speed.

Code:

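Procedure.b VectorNormalizeSSE(*this.xyz) ; SSE version; reads and writes 16 bytes (x, y, z plus a fourth float)
  !MOV  dword Eax,[Esp]   ; Eax = address of the vector
  !movups xmm0, [Eax]     ; xmm0 = x, y, z, w (unaligned load)
  !movaps xmm1, xmm0      ; keep a copy of the original vector
  !mulps xmm0, xmm1       ; xmm0 = x*x, y*y, z*z, w*w
  !movhlps xmm2, xmm0     ; low lane of xmm2 = z*z
  !movaps xmm3, xmm0
  !shufps xmm3, xmm3, 1   ; low lane of xmm3 = y*y
  !addss xmm0, xmm3       ; low lane = x*x + y*y
  !addss xmm0, xmm2       ; low lane = x*x + y*y + z*z
  !rsqrtss xmm0, xmm0     ; approximate 1/sqrt of the low lane
  !shufps xmm0, xmm0, 0   ; broadcast it to all four lanes
  !mulps xmm0, xmm1       ; scale the original vector
  !movups [Eax], xmm0     ; write all four floats back
EndProcedure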
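
For readers who don't speak SSE, here is roughly what the routine above computes, written lane by lane in Python. It is a sketch for following the arithmetic, not a replacement: `normalize_four_lanes` is a made-up name, and the exact square root stands in for `rsqrtss`, which only returns an approximation.

```python
import math

def normalize_four_lanes(v4):
    """v4 is [x, y, z, w], mirroring the four float lanes of one SSE register."""
    squares = [c * c for c in v4]                  # mulps: square all four lanes
    total = squares[0] + squares[1] + squares[2]   # the two addss: w*w is not added
    inv_len = 1.0 / math.sqrt(total)               # rsqrtss (approximate in hardware)
    return [c * inv_len for c in v4]               # broadcast + mulps: w is scaled too
```

Note that the fourth lane is not used for the length, but it is still multiplied and written back together with x, y and z.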
Dreglor
Enthusiast
Posts: 759
Joined: Sat Aug 02, 2003 11:22 pm
Location: OR, USA

Post by Dreglor »

Well, it seems you have been doing some reading ;)

I fixed my problem; it took me a while, though (reinstall).

There's a new project (mentioned on page 1 or 2 of this thread) about a hardware raytracer. The prototype isn't very powerful, but it's said to have the power of a 10 GHz P4. I'm not sure how that can be, but I think without the Windows bulk all the power is there for the APIs.

I have a question about the SSE vector normalize optimization: does it still require a fourth parameter in the structure?
~Dreglor
THCM
Enthusiast
Posts: 276
Joined: Fri Apr 25, 2003 5:06 pm
Location: Gummersbach - Germany

Post by THCM »

Nice work! Read this: http://www.flipcode.com/articles/articl ... ce01.shtml
Jacco Bikker made a veeerrryyy fast raytracer! Try to find the bunny demo for example!
The Human Code Machine / Masters' Design Group
Hades
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Post by Hades »

Dreglor wrote: There's a new project (mentioned on page 1 or 2 of this thread) about a hardware raytracer. The prototype isn't very powerful, but it's said to have the power of a 10 GHz P4. I'm not sure how that can be, but I think without the Windows bulk all the power is there for the APIs.
No, it's because that chip is able to do massively parallel FPU math and is optimized for raytracing.
It's like a GPU for rasterizing. Try to do rasterizing in software, and you'll know what I am trying to say.

Dreglor wrote: I have a question about the SSE vector normalize optimization: does it still require a fourth parameter in the structure?
Yes. It reads and writes 4x4 bytes, so if your structure is only 3x4 bytes it will overwrite whatever happens to lie in the 4 bytes behind it. That's not something you want.
I could change that, but you would lose some speed.
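
The overwrite is easy to reproduce outside of PureBasic. The Python sketch below fakes a block of memory with a 12-byte xyz vector followed directly by an unrelated float, then runs a normalize that moves 16 bytes at a time, like the SSE routine. The memory layout and the names `normalize_16_bytes` and `demo` are assumptions made for this illustration; what really sits behind the structure depends on how the program lays out its variables.

```python
import struct

def normalize_16_bytes(buf, offset):
    """Mimic the SSE routine: read 4 floats, scale all 4, write 4 floats back."""
    x, y, z, w = struct.unpack_from("=4f", buf, offset)        # reads 16 bytes
    inv_len = 1.0 / (x * x + y * y + z * z) ** 0.5
    struct.pack_into("=4f", buf, offset,
                     x * inv_len, y * inv_len, z * inv_len, w * inv_len)  # writes 16 bytes

def demo():
    # 12 bytes of vector (3, 0, 4) followed by a neighbouring variable holding 123.0.
    memory = bytearray(struct.pack("=3f", 3.0, 0.0, 4.0) + struct.pack("=f", 123.0))
    normalize_16_bytes(memory, 0)
    vector = struct.unpack_from("=3f", memory, 0)
    neighbour = struct.unpack_from("=f", memory, 12)[0]
    return vector, neighbour
```

After `demo()` the vector is normalized as expected, but the neighbouring value is no longer 123.0. Padding the structure with a fourth float keeps the extra 4 bytes inside the vector itself.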

SSE isn't perfectly suited for dot product and vector normalization. Especially if you have to call a procedure and calculate only 1 vector.
Inlined FPU ASM should be a lot faster (at least on Athlons).

But inlined ASM is very hard to maintain in PB, if you have to change something. :(

THCM wrote: Nice work! Read this: http://www.flipcode.com/articles/articl ... ce01.shtml
Jacco Bikker made a veeerrryyy fast raytracer! Try to find the bunny demo for example!
I think Dreglor uses that article as a reference for his project. :D
But I wouldn't say "veeerrryyy fast". It's really a nice raytracer, and a great tutorial, but there is a lot of room for improvement.
But I am pretty sure that's intentional, because if it were optimized to the limit no one could read the code. :wink:


Edit: @Dreglor
Ok, I've done a vector normalization procedure with FPU ASM. You can use it with your 3x4 vector structure. But even on my Athlon it's a tiny bit slower than the SSE version.
It would be great if you could test the speed difference between them on your Pentium. (Maybe in a test environment.)

Code:

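Procedure VectorNormalizeFPU(*this.xyz) ; FPU version; touches only the 12 bytes of x, y, z
  !MOV Eax, dword[Esp]     ; Eax = address of the vector
  !FLD dword[Eax+8]        ; load z
  !FST st3                 ; park a copy of z deeper in the FPU stack
  !FMUL st,st              ; z*z
  !FLD dword[Eax+4]        ; load y
  !FST st3                 ; park a copy of y
  !FMUL st,st              ; y*y
  !FLD dword[Eax]          ; load x
  !FST st3                 ; park a copy of x
  !FMUL st,st              ; x*x
  !FLD1                    ; push 1.0
  !FXCH st3                ; st0 = z*z, st3 = 1.0
  !FADDP st2,st            ; y*y + z*z
  !FADDP st1,st            ; x*x + y*y + z*z
  !FSQRT                   ; length
  !FDIVP st1,st            ; st0 = 1.0 / length
  !FXCH st1                ; st0 = x, st1 = 1/length
  !FMUL st,st1
  !FSTP dword[Eax]         ; store x / length
  !FXCH st1                ; st0 = y, st1 = 1/length
  !FMUL st,st1
  !FSTP dword[Eax+4]       ; store y / length
  !FXCH st1                ; st0 = z, st1 = 1/length
  !FMUL st,st1
  !FSTP dword[Eax+8]       ; store z / length
EndProcedure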

It is just a quick hack. I didn't think too much about pairing and such, so maybe there is room for improvement.
Dreglor
Enthusiast
Enthusiast
Joined: Sat Aug 02, 2003 11:22 pm
Location: OR, USA

Post by Dreglor »

@THCM
Yeah, I've been reading that. It's one of the best articles out there on raytracing.
It gives me a lot more understanding than random PDFs on it ;)

@Hades
Yeah, what's interesting is that the SSE version works well without the fourth parameter. It seems to work for the shadow tests (where vector normalize is used), but it's no good for the camera vector calculation procedure.
It messes up the image a bit (slightly random, like you said).

I agree that SSE isn't perfect for vector math; it's more for matrix math than anything. If the FPU version is about the same speed and works with the xyz structure unchanged, then I'm all for it.

I finally found an article that describes adaptive subsampling in more than just a single sentence.
It was originally written for radiosity (which might be an interesting direction for this project); I'm adapting it for this.
It's not as fast as I'd like because I have to store the results in arrays before I plot it all at the end.
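
The article isn't named here, so the following is only a generic sketch of adaptive subsampling in Python, not the scheme Dreglor is adapting. It assumes a square brightness image, traces the four corners of each block, interpolates the inside when the corners agree, and subdivides when they don't. `render_adaptive`, `block` and `threshold` are made-up names and parameters.

```python
def render_adaptive(trace, size, block=8, threshold=0.1):
    """Render a size x size brightness image. size - 1 must be a multiple of block
    and block a power of two. Returns (image, number_of_rays_traced)."""
    image = [[0.0] * size for _ in range(size)]
    cache = {}

    def sample(x, y):
        # Trace each pixel at most once; neighbouring blocks share their corners.
        if (x, y) not in cache:
            cache[(x, y)] = trace(x, y)
        return cache[(x, y)]

    def fill(x0, y0, step):
        x1, y1 = x0 + step, y0 + step
        c00, c10 = sample(x0, y0), sample(x1, y0)
        c01, c11 = sample(x0, y1), sample(x1, y1)
        corners = (c00, c10, c01, c11)
        if step > 1 and max(corners) - min(corners) > threshold:
            half = step // 2                  # corners disagree: split into 4 blocks
            for dy in (0, half):
                for dx in (0, half):
                    fill(x0 + dx, y0 + dy, half)
            return
        # Corners agree (or the block is 1 pixel wide): interpolate bilinearly.
        for y in range(y0, y1 + 1):
            v = (y - y0) / step
            for x in range(x0, x1 + 1):
                u = (x - x0) / step
                top = c00 * (1 - u) + c10 * u
                bottom = c01 * (1 - u) + c11 * u
                image[y][x] = top * (1 - v) + bottom * v

    for y0 in range(0, size - 1, block):
        for x0 in range(0, size - 1, block):
            fill(x0, y0, block)
    return image, len(cache)
```

The usual caveat applies: a detail smaller than a block that misses all four corners is never traced. Here the `cache` dictionary plays the role of the arrays that hold results until everything is plotted.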
~Dreglor
Hades
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Post by Hades »

Ok, I have done a bit of thinking.
If you want to render a nature scene with trees, flowers and such, a triangle-only raytracer should be faster than a raytracer with multiple primitives.
But if you want to render buildings there is a much faster option. Look around in your house. The walls, the floor, the ceiling and even a lot of the furniture are made of rectangles. Furthermore, most of these rectangles are plane-aligned (parallel to the xy, xz or yz plane). Special-casing this will give a huge speed boost, because the intersection code is a lot faster than for a generic triangle, and you need fewer primitives.
For an industrial plant it is even more dramatic. Building that with triangles only would lead to a gigantic number of triangles. But if you special-cased all the walls, tubes, wires... I bet the memory footprint would be less than 10% of the triangle-only version. So you would have far fewer cache misses and the kd-tree would be much shallower, so traversing it would be faster too.
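
To show why the plane-aligned case is so cheap, here is a Python sketch of a ray test against a rectangle parallel to the xy plane. The function name and parameters are made up for this example and are not part of the existing code.

```python
def hit_xy_rect(origin, direction, z, x_min, x_max, y_min, y_max):
    """Ray vs. rectangle lying in the plane z = const. Returns the distance t or None."""
    if direction[2] == 0.0:
        return None                        # ray runs parallel to the plane
    t = (z - origin[2]) / direction[2]     # one subtraction and one division
    if t <= 0.0:
        return None                        # plane is behind the ray origin
    x = origin[0] + t * direction[0]
    y = origin[1] + t * direction[1]
    if x_min <= x <= x_max and y_min <= y <= y_max:
        return t                           # four comparisons instead of edge tests
    return None
```

No dot or cross products are needed, and one such rectangle replaces the two triangles a triangle-only tracer would have to store and test.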

And there is always the possibility of having both: a triangle-only part in your engine for rendering organic stuff, and a part that is able to handle all those special cases of technical objects. Then you could jump to the appropriate part based on the object type.

By the way, you should integrate object handling. Objects (in my definition) are groups of primitives that stick together and maybe move together.
So a building would be one object with its own kd-tree, and a person or a movable chair inside that building would be another object with its own kd-tree, enclosed by a bounding box or bounding sphere.
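
A rough Python sketch of that object idea, with two simplifications that are assumptions of this example: every primitive is a sphere, and each object keeps a plain list of primitives where the proposal above would use a kd-tree. The dictionary layout and the names `sphere_t` and `first_hit` are made up.

```python
import math

def sphere_t(origin, direction, center, radius):
    """Nearest positive distance along a normalized ray to a sphere, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(a * d for a, d in zip(oc, direction))
    c = sum(a * a for a in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:
            return t
    return None

def first_hit(origin, direction, objects):
    """Return (nearest distance or None, number of primitive tests performed)."""
    best, tests = None, 0
    for obj in objects:
        bound_center, bound_radius = obj["bound"]
        if sphere_t(origin, direction, bound_center, bound_radius) is None:
            continue        # ray misses the bounding sphere: skip all its primitives
        for center, radius in obj["primitives"]:
            tests += 1
            t = sphere_t(origin, direction, center, radius)
            if t is not None and (best is None or t < best):
                best = t
    return best, tests
```

One cheap test against the bounding sphere rejects a whole object, so a ray that passes far away from the chair never touches the chair's primitives.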

Oh, and are you interested in this triangle intersection code? It should be faster for small numbers (thousands) of triangles, but slower for huge numbers (millions) of triangles.
I have even seen some ways to improve its speed further.
Dreglor
Enthusiast
Enthusiast
Joined: Sat Aug 02, 2003 11:22 pm
Location: OR, USA

Post by Dreglor »

Since the idea is still open:
I think the cases you bring up are a good reason to leave the multi-type support in the code and not go for a triangle-only system.
I also think that, with a bit of recoding, removing everything but the triangles wouldn't be much faster. I've already gotten rid of a huge bottleneck in the selection of the intersection code.

And in reality I don't think I've seen a single triangle-only raytracer, which worries me.
Maybe that's because it is slower, since surfaces have to be represented as several triangles instead of a single primitive.

Also, I'm wondering if it would be faster if the vector math used the FPU equivalent and wasn't inlined (kept in its own procedures).

I guess it is a question of readability vs. speed.

We could do all of the vector math using FPU ASM and inline it, but it would be pretty nasty looking and bugs would be nasty to find.
But if it worked, it sure would fly.

Then again, inside procedures bug fixing would be easier and the code would be easier to read, but the trade-off is speed.
~Dreglor
Hades
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Post by Hades »

Ok, if you agree on multi-type I can start coding. :D

With vector math... if it's something like a dot product, then there's not much I can do.
ASM only helps for complex multi-line PB code. :(
But I will check other ways to improve the speed.

By the way, are you thinking about tracing 4 rays together with SSE?
It would be a pain in the a** to implement, but someone reported a speed boost factor of up to 3.7!
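
The idea behind tracing 4 rays together can be shown without SSE. In the Python sketch below the four ray directions are stored as three lists of four values (all x, all y, all z), which is the layout an SSE version would keep in three registers, and every step is done for all four lanes at once. It is an illustration of the data layout under the assumption that the four rays share one origin outside the sphere; `hit_sphere_packet` is a made-up name, and Python itself gains no speed from this.

```python
import math

def hit_sphere_packet(origin, dir_x, dir_y, dir_z, center, radius):
    """Intersect 4 normalized rays that share one origin with one sphere.

    Returns 4 distances, with math.inf marking a miss."""
    ocx, ocy, ocz = (origin[i] - center[i] for i in range(3))
    c = ocx * ocx + ocy * ocy + ocz * ocz - radius * radius   # identical for all lanes
    # Each list operation below stands for a few 4-wide SSE instructions.
    b = [ocx * dx + ocy * dy + ocz * dz for dx, dy, dz in zip(dir_x, dir_y, dir_z)]
    disc = [bi * bi - c for bi in b]
    hit_mask = [d >= 0.0 for d in disc]       # per-lane mask instead of an early exit
    t = [-bi - math.sqrt(d) if m else math.inf for bi, d, m in zip(b, disc, hit_mask)]
    # Lanes whose hit lies behind the origin count as misses too.
    return [ti if ti > 0.0 else math.inf for ti in t]
```

The awkward part in a real implementation is the mask: the four rays cannot branch independently, so every lane does the full work and the results of missing lanes are discarded afterwards.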
Dreglor
Enthusiast
Enthusiast
Joined: Sat Aug 02, 2003 11:22 pm
Location: OR, USA

Post by Dreglor »

Ah, well.

Anyway, I've heard about the SSE 4x4 raytracing, but I have no clue how it would work or how it would be done.
~Dreglor
Hades
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Post by Hades »

Hmm...

I'm not sure I could do it. And it would require a lot of changes in the whole code.
So let's forget about that for now. Maybe in PBRay 2.0 :D

You didn't answer my question about that other triangle intersection test, so I'll do what I want. :wink:
I've started to implement it, but slightly differently. It will be significantly faster than his version, especially inside a kd-tree, where every intersection test results in a hit or a close miss.
Dreglor
Enthusiast
Enthusiast
Joined: Sat Aug 02, 2003 11:22 pm
Location: OR, USA

Post by Dreglor »

Oh sorry, I didn't see that part.

If you think it will be faster than the triangle intersection code we have now, then all power to you.
~Dreglor
Hades
Enthusiast
Posts: 188
Joined: Tue May 17, 2005 8:39 pm

Post by Hades »

Ok, I have done a first test version. In my test setup I have added 20 non-textured triangles to your scene.
Old time: 2433ms
New time: 2000ms

But I had to add 28 bytes to your triangle structure for precomputed values.
Should I follow that road and convert it to ASM, or rather go for the old version with lower memory requirements?
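
Which values were precomputed is not stated in the thread. As an illustration of the general trade-off only, the Python sketch below stores the two edge vectors of each triangle once (six extra floats) so that the intersection test, here the well-known Möller-Trumbore test, does not have to rebuild them for every ray. This is a swapped-in example, not Hades' method, and `make_triangle` and `hit_triangle` are made-up names.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def make_triangle(a, b, c):
    """Done once per triangle: the edges are the precomputed values."""
    return {"a": a, "e1": sub(b, a), "e2": sub(c, a)}

def hit_triangle(origin, direction, tri, eps=1e-9):
    """Moeller-Trumbore test using the stored edges. Returns the distance t or None."""
    p = cross(direction, tri["e2"])
    det = dot(tri["e1"], p)
    if abs(det) < eps:
        return None                     # ray is parallel to the triangle
    inv_det = 1.0 / det
    s = sub(origin, tri["a"])
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, tri["e1"])
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(tri["e2"], q) * inv_det
    return t if t > eps else None
```

The memory cost is paid once per triangle, while the saved work is paid back on every ray that reaches the triangle.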
Dreglor
Enthusiast
Enthusiast
Joined: Sat Aug 02, 2003 11:22 pm
Location: OR, USA

Post by Dreglor »

Not bad, I think you should follow it and optimize it :wink:
Personally, I'm not too concerned about the memory footprint at the moment.
I think 28 bytes per triangle is an acceptable trade-off for 400 ms of speed.
~Dreglor