Speed access for DIM and AllocatedMemory - Crazy Results

Ralf
Enthusiast
Posts: 203
Joined: Fri May 30, 2003 1:29 pm
Location: Germany

Post by Ralf »

I want to read and write some data (three bytes per loop iteration) as fast as possible. I ran some speed tests to see whether DIM or AllocateMemory is the faster way to manipulate it. (I had assumed DIM is nearly the same as AllocateMemory, just with different handling for the user.)

In my speed test I used a DIM with a structure holding three byte values. The DIM and the allocated memory are the same size in both tests.

So I use PeekB and PokeB for the allocated memory, and getval = table(x,y) or table(x,y) = setval for the DIM.

The results puzzle me. The following three lines run inside the "write to mem" and "write to DIM" loops:

Code: Select all

   ; Calculations A:

   PokeB(*bitmapdatapos    , 255 * (i / (width - 1.0)))
   PokeB(*bitmapdatapos + 1, 2)
   PokeB(*bitmapdatapos + 2, 255 * (j / (height - 1.0)))

Code: Select all

4781 - write to mem
  641 - read from mem
---------------------------
5422 - total time


3328 - write to DIM
2500 - read from DIM
---------------------------
5828 - total time

Overall, in this example (Calculations A), direct memory access is faster than DIM access in total time. But compare the write timings: writing to a DIM is faster than writing directly to memory! On the other hand, reading from memory is roughly four times faster than reading from the DIM. That can't really be correct, can it? The timing results puzzle me (I have run the test many times).

Now it gets stranger. I use nearly the same example, but change one of the three lines of Calculations A by adding a few more math ops, which I expected to slow it down only a little:

Code: Select all

   ; Calculations B:

   PokeB(*bitmapdatapos    , 255 * (i / (width - 1.0)))
   PokeB(*bitmapdatapos + 1, Cos(Sin(i - width / 2)) + Sin(j - height / 2))
   PokeB(*bitmapdatapos + 2, 255 * (j / (height - 1.0)))

Code: Select all

14844 - write to mem
  625 - read from mem
---------------------------
15469 - total time


13765 - write to DIM
 2625 - read from DIM
---------------------------
16390 - total time
How is it possible that adding only three more math ops (Sin/Cos) slows the loop down so dramatically? Is there any way to get a faster PokeB, PeekB, Sin and Cos?

Finally, here is an example that writes data without any calculations, timed the same way:

Code: Select all

   ; Calculations C:

   PokeB(*bitmapdatapos    , 2)
   PokeB(*bitmapdatapos + 1, 2)
   PokeB(*bitmapdatapos + 2, 2)

Code: Select all

2297 - write to mem
 593 - read from mem
---------------------------
2890 - total time


2422 - write to DIM
2485 - read from DIM
---------------------------
4907 - total time
As you can see, in the last example (no math ops), direct memory read and write are both faster than DIM access. (Why isn't that so in the Calculations A and B examples?)

Also, why is writing to the DIM faster than writing to memory in examples A and B?

I am looking for a faster way to write to memory for things like the Calculations A and B examples. Is there an alternative to PeekB and PokeB in direct ASM that could be faster? :?:
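
The full test loop is not shown above, so here is only a rough sketch of how the two write tests could be set up for Calculations C. The image size, the Pixel24 structure and all variable names other than *bitmapdatapos are assumptions made for illustration. Timings like these are only meaningful when the program runs with the debugger disabled.

Code: Select all

   ; Hypothetical reconstruction of the write tests (Calculations C).
   ; #Width, #Height, Pixel24 and the variable names are assumptions.
   #Width  = 1024
   #Height = 1024

   Structure Pixel24
     r.b
     g.b
     b.b
   EndStructure

   Dim table.Pixel24(#Width - 1, #Height - 1)
   *bitmapdata = AllocateMemory(#Width * #Height * 3)

   ; write to mem
   start = ElapsedMilliseconds()
   *bitmapdatapos = *bitmapdata
   For j = 0 To #Height - 1
     For i = 0 To #Width - 1
       PokeB(*bitmapdatapos    , 2)
       PokeB(*bitmapdatapos + 1, 2)
       PokeB(*bitmapdatapos + 2, 2)
       *bitmapdatapos + 3          ; step to the next three-byte group
     Next
   Next
   writeMem = ElapsedMilliseconds() - start

   ; write to DIM
   start = ElapsedMilliseconds()
   For j = 0 To #Height - 1
     For i = 0 To #Width - 1
       table(i, j)\r = 2
       table(i, j)\g = 2
       table(i, j)\b = 2
     Next
   Next
   writeDim = ElapsedMilliseconds() - start

   FreeMemory(*bitmapdata)
   MessageRequester("Timing", "write to mem: " + Str(writeMem) + #LF$ + "write to DIM: " + Str(writeDim))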
Blade
Enthusiast
Posts: 362
Joined: Wed Aug 06, 2003 2:49 pm
Location: Venice - Italy, Japan when possible.

Post by Blade »

Perhaps writing a longword is faster than writing three bytes? (Pay attention to the overlapping byte; with consecutive ops it should not be a problem...)
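
A minimal sketch of that idea, assuming a little-endian x86 target and a buffer with one spare byte at the end so the last longword write stays inside the allocation. The buffer size and variable names are made up for illustration, and whether this beats three PokeB calls has to be measured, since most of the writes are unaligned.

Code: Select all

   ; Sketch only: names and sizes are assumptions, not from the thread.
   #Pixels = 4
   *buf = AllocateMemory(#Pixels * 3 + 1)   ; +1 so the last PokeL stays in bounds
   *pos = *buf
   For n = 0 To #Pixels - 1
     r = n : g = 2 : b = 255 - n
     ; Little-endian: r lands at *pos, g at *pos + 1, b at *pos + 2.
     ; The fourth byte (0) spills into the next group and is
     ; overwritten by the next iteration.
     PokeL(*pos, r | (g << 8) | (b << 16))
     *pos + 3
   Next
   Debug PeekA(*buf + 3)   ; first byte of the second group
   FreeMemory(*buf)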
Ralf
Enthusiast
Posts: 203
Joined: Fri May 30, 2003 1:29 pm
Location: Germany

Post by Ralf »

Blade wrote:Perhaps writing a longword is faster than writing three bytes? (Pay attention to the overlapping byte; with consecutive ops it should not be a problem...)
Yes, I know longs are generally faster! But I have to write three bytes each time, followed by the next three bytes (otherwise I would use longs). I have tried writing 1 word and 1 byte; it is slower than writing 3 bytes. I have tested a lot, but as soon as I use some maths (as in the second line of the three-line calculation), it slows down a lot.
GedB
Addict
Posts: 1313
Joined: Fri May 16, 2003 3:47 pm
Location: England

Post by GedB »

Ralf,

Use the /commented option on the command line compiler to look at the ASM generated.

http://www.purebasic.com/documentation/ ... piler.html

I am sure you will find that the DIM version produces some optimised code that deals with the 4-byte boundaries for greater speed.

Once you have a grip on what the compiler is doing, you can always drop down to inline ASM and optimise further. Obviously, don't use ASM unless you really can improve on the compiler.

I'd give a more complete reply, but I'm on my honeymoon in Thailand right now and away from my compiler.
Fred
Administrator
Posts: 18351
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

Basically, adding 3 ops like Cos()/Sin() can slow down the code a lot, as these ops really take time. Maybe you should use a Cos()/Sin() lookup table with precalculated values, so it will be faster to read.
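
One possible way to apply this to Calculations B, as a sketch: in that line the Cos(Sin(...)) term depends only on i and the Sin(...) term only on j, so each can be precalculated once per column and once per row. The array names colTerm and rowTerm and the image size are assumptions for illustration.

Code: Select all

   ; Sketch: colTerm, rowTerm and the sizes are assumptions.
   width = 640 : height = 480

   Dim colTerm.d(width - 1)    ; Cos(Sin(i - width / 2)) for every column
   Dim rowTerm.d(height - 1)   ; Sin(j - height / 2) for every row
   For i = 0 To width - 1
     colTerm(i) = Cos(Sin(i - width / 2))
   Next
   For j = 0 To height - 1
     rowTerm(j) = Sin(j - height / 2)
   Next

   *bitmapdata = AllocateMemory(width * height * 3)
   *bitmapdatapos = *bitmapdata
   For j = 0 To height - 1
     For i = 0 To width - 1
       PokeB(*bitmapdatapos    , 255 * (i / (width - 1.0)))
       PokeB(*bitmapdatapos + 1, colTerm(i) + rowTerm(j))   ; table reads instead of Sin/Cos
       PokeB(*bitmapdatapos + 2, 255 * (j / (height - 1.0)))
       *bitmapdatapos + 3
     Next
   Next
   FreeMemory(*bitmapdata)

This replaces three trigonometric calls per inner-loop iteration with two array reads, at the cost of width + height table entries. The other two lines could be tabulated the same way, since each also depends on only one loop variable.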
Ralf
Enthusiast
Posts: 203
Joined: Fri May 30, 2003 1:29 pm
Location: Germany

Post by Ralf »

Fred wrote:Basically, adding 3 ops like Cos()/Sin() can slow down the code a lot, as these ops really take time. Maybe you should use a Cos()/Sin() lookup table with precalculated values, so it will be faster to read.
Thanks Fred! I will think about the table idea and how I can implement it in my code. By the way, why is reading from memory a lot faster than reading the same data from the DIM, while writing to memory is sometimes slower than writing to the DIM?

Is there any way to improve the memory write (Poke) using inline ASM, or by copying its routine to the cache? ;)

Is there any way to copy a special routine (only a few KB in size) directly into the cache and execute it there, like C2P on the Amiga, and how would I implement it? Thanks.
Fred
Administrator
Posts: 18351
Joined: Fri May 17, 2002 4:39 pm
Location: France

Post by Fred »

The PokeB can be easily replaced with a pointer, like this (it's the fastest way to deal with memory in PB):

Code: Select all

  *Buffer.Byte = @"Test"
  While *Buffer\b <> 0
    Debug *Buffer\b
    *Buffer+1
  Wend
You can use the pointer to read or write, seamlessly. You could change it to inline ASM, but it probably won't give any speed boost as the PB code is good enough.

About the cache: on x86 you have way more cache than on the 680x0 used by the Amiga, which means that such tricks, which were relevant for the Amiga, are not really doable on PC.
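
Applied to the three-byte writes from the first post, the pointer approach could look roughly like this. It is only a sketch: the Pixel24 structure, the buffer size and the variable names are assumptions for illustration.

Code: Select all

   ; Sketch: Pixel24, #Pixels and the names are assumptions.
   Structure Pixel24
     r.b
     g.b
     b.b
   EndStructure

   #Pixels = 16
   *bitmapdata = AllocateMemory(#Pixels * SizeOf(Pixel24))
   *px.Pixel24 = *bitmapdata
   For n = 0 To #Pixels - 1
     *px\r = 2
     *px\g = 2
     *px\b = 2
     *px + SizeOf(Pixel24)   ; advance by 3 bytes to the next group
   Next
   FreeMemory(*bitmapdata)

The base address stays in *bitmapdata so it can still be passed to FreeMemory() after *px has moved.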
Ralf
Enthusiast
Posts: 203
Joined: Fri May 30, 2003 1:29 pm
Location: Germany

Post by Ralf »

Fred wrote:The PokeB can be easily replaced with a pointer, like this (it's the fastest way to deal with memory in PB):

Code: Select all

  *Buffer.Byte = @"Test"
  While *Buffer\b <> 0
    Debug *Buffer\b
    *Buffer+1
  Wend
You can use the pointer to read or write, seamlessly. You could change it to inline ASM, but it probably won't give any speed boost as the PB code is good enough.

About the cache: on x86 you have way more cache than on the 680x0 used by the Amiga, which means that such tricks, which were relevant for the Amiga, are not really doable on PC.
Thanks Fred! I will take a look at the pointer method. About the different cache sizes on PC and Amiga, I know ;) It was just an idea to copy time-critical routines directly into the cache and execute them there. Just an idea ;)
Polo
Addict
Posts: 2422
Joined: Tue May 06, 2003 5:07 pm
Location: UK

Post by Polo »

BTW, how could I use a pointer for this:

*mem=AllocateMemory(12)

*mem.LONG=12
*mem+4
*mem.FLOAT=1.26
*mem+4
*mem.BYTE=5
*mem+1
[...]
?
It doesn't work...
El_Choni
TailBite Expert
Posts: 1007
Joined: Fri Apr 25, 2003 6:09 pm
Location: Spain

Post by El_Choni »

Code: Select all

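Structure MyMonsterVariable
 StructureUnion   ; all three fields share the same address
   f.f
   l.l
   b.b
 EndStructureUnion
EndStructure

*mem.MyMonsterVariable = AllocateMemory(12)

*mem\l = 12     ; write a long at offset 0
*mem+4          ; advance past the long
*mem\f = 1.26   ; write a float at offset 4
*mem+4          ; advance past the float
*mem\b = 5      ; write a byte at offset 8
*mem+1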
El_Choni
Polo
Addict
Posts: 2422
Joined: Tue May 06, 2003 5:07 pm
Location: UK

Post by Polo »

I need to use a structure?
Well, I think I'll stick with Peek/Poke :)
El_Choni
TailBite Expert
Posts: 1007
Joined: Fri Apr 25, 2003 6:09 pm
Location: Spain

Post by El_Choni »

Yes, structures are very dangerous and complex. Stay away from them! :twisted:
El_Choni
Polo
Addict
Posts: 2422
Joined: Tue May 06, 2003 5:07 pm
Location: UK

Post by Polo »

I'm going to try to use them anyway, but it'll require a lot of work from me :)
blueznl
PureBasic Expert
Posts: 6172
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

fred, are you sure *pointer\b is faster than pokeb()? i remember we were all collectively messing around with 'propercase' (as below) and it turned out that pokeb() was faster at that time (puzzled look)... you really want me to benchmark that, do you? :-)

Code: Select all

Procedure.s x_propercase(s.s)
  Protected *p.l, f.l, b.l
  ;
  ; *** make all lowercase except for the first chars of each word
  ;
  *p = @s
  f = 1                          ; 1 = next letter starts a word
  b = PeekB(*p)
  While b <> 0
    If b = 32                    ; space: a new word follows
      f = 1
    ElseIf f = 1 And b >= 97 And b <= 122
      PokeB(*p, b & $DF)         ; first letter a-z: clear bit 5 to get A-Z
      f = 0
    ElseIf f = 0 And b >= 65 And b <= 90
      PokeB(*p, b | $20)         ; later letter A-Z: set bit 5 to get a-z
    Else
      f = 0
    EndIf
    *p = *p + 1
    b = PeekB(*p)
  Wend
  ProcedureReturn s
EndProcedure
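
For a benchmark against the PeekB/PokeB version, a pointer-based variant could look roughly like the sketch below. The name x_propercase_ptr is made up, and it uses the Character structure rather than Byte so that it also steps correctly through Unicode strings; that choice is an assumption about the target compiler version, not something from the original routine.

Code: Select all

Procedure.s x_propercase_ptr(s.s)
  ; Hypothetical pointer-based variant of x_propercase, for comparison only
  Protected *p.Character = @s
  Protected f = 1                ; 1 = next letter starts a word
  If *p = 0                      ; guard in case an empty string yields a null address
    ProcedureReturn s
  EndIf
  While *p\c <> 0
    If *p\c = 32
      f = 1
    ElseIf f = 1 And *p\c >= 97 And *p\c <= 122
      *p\c = *p\c & $DF          ; a-z to A-Z
      f = 0
    ElseIf f = 0 And *p\c >= 65 And *p\c <= 90
      *p\c = *p\c | $20          ; A-Z to a-z
    Else
      f = 0
    EndIf
    *p + SizeOf(Character)       ; step one character, whatever its size
  Wend
  ProcedureReturn s
EndProcedure

Debug x_propercase_ptr("hello WORLD")   ; Hello World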
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB - upgrade incoming...)
( The path to enlightenment and the PureBasic Survival Guide right here... )
blueznl
PureBasic Expert
Posts: 6172
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

now, as for speeding up, gfabasic (no, not that one again! :-)) did have a nice feature / function called SINQ() and COSQ()

they were, basically, sin and cos functions that used a lookup table (from ints (degrees) to floats)

hmmm... can we get those in pure? <innocent look> :-)