Speed access for DIM and AllocatedMemory - Crazy Results

Post by **Ralf** » Mon Mar 28, 2005 5:33 pm

I want to read and write some data (three bytes per loop iteration) as fast as possible! I have done some speed tests to see whether DIM or AllocateMemory is the faster way to manipulate it. (I had thought DIM was nearly the same as AllocateMemory, just with different handling for the user.)

In my speed example I used a DIM with a structure containing three byte values! The DIM and the allocated memory are the same size in both examples!

So I use PeekB and PokeB for the allocated memory, and getval = table(x,y) or table(x,y) = setval for the DIM!

I am very puzzled by the results! The following three lines run inside the "write to mem" and "write to DIM" loops:

Code:

   Calculations A:

   PokeB(*bitmapdatapos    , 255 * (i / (width - 1.0)))
   PokeB(*bitmapdatapos+1, 2) 
   PokeB(*bitmapdatapos+2, 255 * (j / (height - 1.0)))

Code:

4781 - write to mem
  641 - read from mem
---------------------------
5422 - total time


3328 - write to DIM
2500 - read from DIM
---------------------------
5828 - total time

All in all, in this example (Calculations A) direct memory access has a lower total time than DIM access! But please compare the write-to-mem and write-to-DIM timings: writing to a DIM is faster than writing directly to memory! On the other hand, reading from memory is about four times faster than reading from the DIM!? That can't really be correct, can it? I am very puzzled by these timing results... (I have run the test many times!)

Now be surprised: I use nearly the same example as above but change one of the three lines of Calculations A (adding a few more math ops, which should only slow it down a little because of the additional maths!?)

Code:

   Calculations B:

   PokeB(*bitmapdatapos    , 255 * (i / (width - 1.0)))
   PokeB(*bitmapdatapos + 1, Cos(Sin(i - width / 2)) + Sin(j - height / 2))
   PokeB(*bitmapdatapos+2, 255 * (j / (height - 1.0)))

Code:

14844 - write to mem
  625 - read from mem
---------------------------
15469 - total time


13765 - write to DIM
 2625 - read from DIM
---------------------------
16390 - total time

How is it possible that adding only three more math ops (Sin/Cos) slows the loop down so dramatically? Is there any way to get a faster PokeB, PeekB, Sin and Cos?

Finally, here is an example that just writes data without any calculations, along with its timings:

Code:

   Calculations C:

   PokeB(*bitmapdatapos    , 2)
   PokeB(*bitmapdatapos+1, 2)
   PokeB(*bitmapdatapos+2, 2)

Code:

2297 - write to mem
 593 - read from mem
---------------------------
2890 - total time


2422 - write to DIM
2485 - read from DIM
---------------------------
4907 - total time

As you can see, in the last example (no math ops) direct memory read/write is always faster than DIM access. (Why isn't that so in the Calculations A and B examples?)

Also, why is writing to the DIM faster than writing to memory in examples A and B?

I am looking for faster write access to memory for things like the Calculations A and B examples. (Is there an alternative to PeekB and PokeB in direct ASM that could be faster?)
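
The test loops themselves are not shown in the post. As a rough sketch only, a comparison of the two write paths for the Calculations C case might look like the following. The dimensions, the pass count, the structure name `MyPixel` and all variable names are assumptions, not values from the original test, and `ElapsedMilliseconds()` assumes a reasonably recent PureBasic version. Timings are only meaningful with the debugger switched off.

```purebasic
; Hypothetical sizes; the original post does not state them.
#Width  = 640
#Height = 480
#Passes = 100        ; repeat so the measured time is not close to zero

Structure MyPixel    ; three byte values, as described in the post
  r.b
  g.b
  b.b
EndStructure

Dim table.MyPixel(#Width - 1, #Height - 1)
*bitmap = AllocateMemory(#Width * #Height * 3)

; --- write to allocated memory with PokeB ---
start = ElapsedMilliseconds()
For pass = 1 To #Passes
  *pos = *bitmap
  For j = 0 To #Height - 1
    For i = 0 To #Width - 1
      PokeB(*pos    , 2)
      PokeB(*pos + 1, 2)
      PokeB(*pos + 2, 2)
      *pos + 3
    Next
  Next
Next
timeMem = ElapsedMilliseconds() - start

; --- write the same values to the DIM ---
start = ElapsedMilliseconds()
For pass = 1 To #Passes
  For j = 0 To #Height - 1
    For i = 0 To #Width - 1
      table(i, j)\r = 2
      table(i, j)\g = 2
      table(i, j)\b = 2
    Next
  Next
Next
timeDim = ElapsedMilliseconds() - start

FreeMemory(*bitmap)
MessageRequester("Timing", "mem: " + Str(timeMem) + " ms, DIM: " + Str(timeDim) + " ms")
```

Keeping the work inside the two loops identical is the point of the sketch: any per-pixel calculation should be the same on both sides so that only the storage access differs.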

Post by **Blade** » Mon Mar 28, 2005 6:54 pm

Perhaps writing a longword is faster than writing three bytes? (Pay attention to the overlapping bytes; in consecutive ops this should not be a problem...)
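
A minimal sketch of this idea, assuming a little-endian x86 target: pack the three bytes into one long, write it with `PokeL`, and advance by only three bytes so the next write overwrites the unused fourth byte. The pixel count, the sample values and the variable names are hypothetical. Note the one spare byte in the allocation, since the final `PokeL` still writes four bytes.

```purebasic
; Hypothetical pixel count; the thread does not give the real size.
#Pixels = 4

; One spare byte: the last PokeL writes 4 bytes starting at the final 3-byte slot.
*buffer = AllocateMemory(#Pixels * 3 + 1)
*pos = *buffer

For n = 0 To #Pixels - 1
  r = 10 : g = 20 : b = 30   ; placeholder values for the three bytes

  ; Little-endian: the lowest byte lands at *pos, the next at *pos + 1, and so on.
  ; The top byte is zero and is overwritten by the next pixel's first byte.
  PokeL(*pos, r | (g << 8) | (b << 16))
  *pos + 3
Next

FreeMemory(*buffer)
```

Three out of every four of these writes are unaligned, so whether this is actually faster than three `PokeB` calls has to be measured.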

Post by **Ralf** » Mon Mar 28, 2005 7:00 pm

Blade wrote: Perhaps writing a longword is faster than writing three bytes? (Pay attention to the overlapping bytes; in consecutive ops this should not be a problem...)

Yes, I know longs are generally faster! But I always have to write three bytes, followed by the next three bytes (otherwise I would use longs!). I have tried using one word and one byte... it is slower than writing three bytes! I have tested a lot, but using some maths (as in the second line of the three-line calculation) slows it down a lot.

Post by **GedB** » Sat Apr 02, 2005 3:43 am

Ralf,

Use the /commented option on the command line compiler to look at the ASM generated.

http://www.purebasic.com/documentation/ ... piler.html

What I am sure you will find is that the DIM version produces some optimised code that deals with the 4-byte boundaries for greater speed.

Once you have a grip on what the compiler is doing, you can always drop down to inline ASM and optimise further. Obviously, don't use ASM unless you really can improve on the compiler.

I'd give a more complete reply, but I'm on my honeymoon in Thailand right now and away from my compiler.

Post by **Fred** » Sat Apr 02, 2005 1:11 pm

Basically, adding 3 ops like Cos()/Sin() can slow down the code a lot, as these ops really take time. Maybe you should use a Cos()/Sin() lookup table with precalculated values, so it will be faster to read.
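
One possible way to apply the table idea to Calculations B, offered as a sketch rather than as the method Fred had in mind: in `Cos(Sin(i - width / 2)) + Sin(j - height / 2)` the first term depends only on the column `i` and the second only on the row `j`, so each can be computed once per column or row instead of once per pixel. The dimensions and the names `colTerm` and `rowTerm` are assumptions.

```purebasic
; Hypothetical dimensions; the thread does not state the real ones.
#Width  = 640
#Height = 480

Dim colTerm.f(#Width - 1)    ; Cos(Sin(i - width / 2)) for every column
Dim rowTerm.f(#Height - 1)   ; Sin(j - height / 2) for every row

; Fill the tables once: #Width + #Height trig evaluations in total.
For i = 0 To #Width - 1
  colTerm(i) = Cos(Sin(i - #Width / 2))
Next
For j = 0 To #Height - 1
  rowTerm(j) = Sin(j - #Height / 2)
Next

*bitmap = AllocateMemory(#Width * #Height * 3)
*pos = *bitmap

; The inner loop now only reads the tables; no Sin/Cos call per pixel.
For j = 0 To #Height - 1
  For i = 0 To #Width - 1
    PokeB(*pos    , 255 * (i / (#Width - 1.0)))
    PokeB(*pos + 1, colTerm(i) + rowTerm(j))
    PokeB(*pos + 2, 255 * (j / (#Height - 1.0)))
    *pos + 3
  Next
Next

FreeMemory(*bitmap)
```

The same trick would apply to the first and third lines, whose values also depend on only one loop variable each.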

Post by **Ralf** » Sun Apr 03, 2005 4:01 pm

Fred wrote: Basically, adding 3 ops like Cos()/Sin() can slow down the code a lot, as these ops really take time. Maybe you should use a Cos()/Sin() lookup table with precalculated values, so it will be faster to read.

Thanks Fred! I will think about the table idea and how I can implement it in my code. BTW, why is reading from memory a lot faster than reading the same data from a DIM, while writing to memory is sometimes slower than writing the data to a DIM?

Is there any way to improve the write to memory (poke) using inline ASM, or by copying its routine to the cache?

Is there any way to copy a special routine (if only a few KB in size) directly into the cache and execute it there, like C2P on the Amiga, and how would I implement it? Thanks

Post by **Fred** » Sun Apr 03, 2005 5:17 pm

The PokeB can be easily replaced with a pointer, like this (it's the fastest way to deal with memory in PB):

Code:

  *Buffer.Byte = @"Test"
  While *Buffer\b <> 0
    Debug *Buffer\b
    *Buffer+1
  Wend

You can use the pointer to read or write, seamlessly. You could change it to inline ASM, but it probably won't give any speed boost as the PB code is good enough.

About the cache: on an x86 you have way more cache than on the 680x0 used by the Amiga, which means that such tricks, which were relevant for the Amiga, are not really doable on a PC.
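
Applied to the three-bytes-per-pixel case from the first post, the pointer approach might look like the sketch below. The structure name `MyPixel`, the pixel count and the written values are hypothetical; the sketch assumes the structure occupies exactly three bytes, which is why `SizeOf()` is used for both the allocation and the pointer step.

```purebasic
Structure MyPixel   ; hypothetical name, three consecutive bytes
  r.b
  g.b
  b.b
EndStructure

#Pixels = 4         ; hypothetical pixel count

*buffer = AllocateMemory(#Pixels * SizeOf(MyPixel))
*px.MyPixel = *buffer            ; keep *buffer so the block can be freed later

For n = 0 To #Pixels - 1
  *px\r = 1                      ; replaces PokeB(*pos    , ...)
  *px\g = 2                      ; replaces PokeB(*pos + 1, ...)
  *px\b = 3                      ; replaces PokeB(*pos + 2, ...)
  *px + SizeOf(MyPixel)          ; step to the next pixel
Next

FreeMemory(*buffer)
```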

Post by **Ralf** » Sun Apr 03, 2005 8:30 pm

Fred wrote: The PokeB can be easily replaced with a pointer, like this (it's the fastest way to deal with memory in PB):
Code:
  *Buffer.Byte = @"Test"
  While *Buffer\b <> 0
    Debug *Buffer\b
    *Buffer+1
  Wend
You can use the pointer to read or write, seamlessly. You could change it to inline ASM, but it probably won't give any speed boost as the PB code is good enough.

About the cache: on an x86 you have way more cache than on the 680x0 used by the Amiga, which means that such tricks, which were relevant for the Amiga, are not really doable on a PC.

Thanks Fred! I will take a look at the pointer method. About the different cache sizes on the PC and the Amiga, I know; it was only an idea to copy time-critical routines directly into the cache and execute them there!

Post by **Polo** » Tue Apr 05, 2005 5:33 pm

BTW, how could I use a pointer for this:

*mem=AllocateMemory(12)

*mem.LONG=12
*mem+4
*mem.FLOAT=1.26
*mem+4
*mem.BYTE=5
*mem+1
[...]
?
It doesn't work...

Post by **El_Choni** » Tue Apr 05, 2005 5:48 pm

Code:

Structure MyMonsterVariable
 StructureUnion
   f.f
   l.l
   b.b
 EndStructureUnion
EndStructure

*mem.MyMonsterVariable = AllocateMemory(12)

*mem\l = 12
*mem+4
*mem\f = 1.26
*mem+4
*mem\b = 5
*mem+1

Post by **Polo** » Tue Apr 05, 2005 7:12 pm

I need to use a structure?
Well, I think I'll stick with Peek/Poke

Post by **El_Choni** » Tue Apr 05, 2005 7:24 pm

Yes, structures are very dangerous and complex. Stay away from them!

Post by **Polo** » Tue Apr 05, 2005 7:30 pm

I'm going to try to use them anyway, but it'll require a lot of work from me

Post by **blueznl** » Tue Apr 05, 2005 8:52 pm

fred, are you sure *pointer\b is faster than pokeb()? i remember we were all collectively messing around with 'propercase' (as below) and it turned out that pokeb() was faster at that time (puzzled look)... you really want me to benchmark that, do you?

Code:

Procedure.s x_propercase(s.s) 
  Protected *p.l, f.l, b.l
  ;
  ; *** make all lowercase except for the first chars of each word
  ;
  *p = @s 
  f = 1 
  b = PeekB(*p) 
  While b <> 0 
    If b = 32 
      f = 1 
    ElseIf f = 1 And b >= 97 And b<=122 
      PokeB(*p,b & $DF) 
      f = 0 
    ElseIf f = 0 And b >= 65 And b <= 90 
      PokeB(*p,b | $20) 
    Else 
      f = 0 
    EndIf 
    *p = *p+1
    b = PeekB(*p) 
  Wend 
  ProcedureReturn s 
EndProcedure

Post by **blueznl** » Tue Apr 05, 2005 8:54 pm

now, as for speeding up, gfabasic (no, not that one again!) did have a nice feature / function called SINQ() and COSQ()

they were, basically, sin and cos functions that used a lookup table (from ints (degrees) to floats)

hmmm... can we get those in pure? <innocent look>
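
A degree-indexed table of this kind can be sketched in plain PureBasic. This is only an illustration of the idea, not the GFA-BASIC implementation and not a built-in PureBasic feature: the names `SinQ`, `CosQ`, `SinTable` and `InitSinTable` are made up for the example, and `#PI` assumes a PureBasic version that predefines that constant.

```purebasic
; One float per whole degree, 0..359.
Global Dim SinTable.f(359)

Procedure InitSinTable()
  Protected d
  For d = 0 To 359
    SinTable(d) = Sin(d * #PI / 180.0)   ; degrees to radians
  Next
EndProcedure

Procedure.f SinQ(degrees)
  ; Wrap any integer angle into 0..359 (% can return a negative remainder).
  degrees = degrees % 360
  If degrees < 0
    degrees + 360
  EndIf
  ProcedureReturn SinTable(degrees)
EndProcedure

Procedure.f CosQ(degrees)
  ; cos(x) = sin(x + 90 degrees), so one table serves both functions.
  ProcedureReturn SinQ(degrees + 90)
EndProcedure

InitSinTable()
Debug SinQ(30)
Debug CosQ(60)
```

The procedure call and the wrap-around check cost time as well, so in a tight loop it may be better to index `SinTable()` directly with an angle that is already known to be in range.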