Circular buffers

kinglestat · Post by **kinglestat** » Thu Sep 18, 2008 11:59 am

I've been battling my conscience to make this code better

;- arrCommon
Macro          arrCommon( uvar, max, arrname, gvar )

   If Not arrname( gvar )\bActive
      uvar = gvar
      arrname( gvar )\bActive + 1
   Else
      Repeat
         gvar + 1
         If gvar >= max
            gvar = 1
            WriteLog( ">Waiting next message" )
            Delay( 64 )
         EndIf

         If Not arrname( gvar )\bActive
            uvar = gvar
            arrname( gvar )\bActive + 1
            Break
         EndIf
      ForEver
   EndIf

EndMacro

Macro          arrNextQ( var )
   arrCommon( var, gdefMaxQueues, gnetUDPQ, gnetNextQ )
EndMacro

But let me start at the beginning. I am using several of these circular arrays (all relatively small) between 32 and 512. What bugs me is that to find an element I have to cycle sequentally through all the elements...which jars my semi-methodical mind. Still for the life of me I cant figure out a more efficient manner....incidentally linked lists I cant use...too much hassles with threads

Michael Vogel · Post by **Michael Vogel** » Thu Sep 18, 2008 2:28 pm

A small example how to use would help, but maybe a simple module or and could help

if you define a fixed buffer, like...
#buffersize=64
#bufferand=#buffersize-1

you can access an element without checking the borders...

repeat
debug buffer(field%#buffersize); ...or... debug buffer(field%&#bufferand)
field+1
forever

kinglestat · Post by **kinglestat** » Thu Sep 18, 2008 3:16 pm

I am using like this

Code: Select all

#NET_MAXQUEUES         = 384

Structure stQ
   *Buffer.l
   Sequence.l
   Socket.l
   bActive.w
EndStructure

Global               gnetNextQ.l             = 1
Global Dim        gnetUDPQ.stQ( #TNET_MAXQUEUES  )

arrNextQ( k )
gnetUDPQ\Buffer = AllocateMemory()

Trond · Post by **Trond** » Thu Sep 18, 2008 5:39 pm

kinglestat wrote:I am using like this

What do you mean by "finding" an element?

kinglestat · Post by **kinglestat** » Thu Sep 18, 2008 9:58 pm

finding...for example in the case of the example I posted "Socket" is my comparand. So I loop through all the elements until I find the matching socket

Froggerprogger · Post by **Froggerprogger** » Fri Sep 19, 2008 8:08 am

So your question has nearly nothing to do with the circularity of your arrays?
Just two things for that:
1. Accessing via "index % arraySize" is faster than manually checking the boundaries and remap the index by divisions
2. For arraySize a power of 2 and arraySizeM1 := arraySize - 1 it is even much faster than 1. to use "index & arraySizeM1"

But it seems that you just look for an alternative to sequential search for finding an element in the array. OK: If the array is ordered you can use "Binary Search" which is much faster, otherwise there is no chance to speed it up, with lists neither. The datastructure you are intuitively searching for might be a hashtable (or alternatively any search-tree). But for only 32 elements a simple array with sequential search will be much faster than a hashtable. With 500 elements there could be a little speedup with hashtables. Somewhere in the forums there are some implementations of hashtables from users, but I don't know the fastet/best of them.

kinglestat · Post by **kinglestat** » Fri Sep 19, 2008 1:11 pm

well, I do have my own hash arrays...posted in the forum, but for small amounts CPU+Memory it is very expensive. Same goes for btrees considering with macros I dont even have the function overhead; I do like the "index % arraySize" approach, though I have an issue with this..ie it will only work when I only need to compare only 1 element.

To clarify my question is 2 fold

1) Is using a circular array the proper approach for such coding?
2a) If it is, how can I improve my circular tables?
2b) If it is not, what i sthe forum's suggestion?

Froggerprogger · Post by **Froggerprogger** » Fri Sep 19, 2008 6:59 pm

I had the time today to code a very slim and fast hasharray. I understood the following scenario:
a) There is a couple of servers, from which maximal #N at a time are active
b) When they are active, some data for them has to be found very fast for some modifcations
c) An active server might become unactivated, and then its slot in the underlying datastructure is freed for a new server

If all active servers are only accessed by their unique id, then a hashtable would be asymptotically a good solution, but it is usually too slow for small problemsizes.
An alternative would be to use a linked list of active servers. Another would be to use a simple linear array. This idea could be enhanced to an array with more or less intelligent indexing, such as the circular approach.
This idea could be further enhanced to a very slim hashtable by direct mapping with dynamic addressing, that means:
A server is mapped by its ID to the array-position CURPOS = ID % ArraySize.
If this slot is in use by another server, then try another slot, e.g. by simply incrementing CURPOS or by adding a high prime to the current position: CURPOS = (CURPOS + HIGHPRIME) % ArraySize. Repeat this until a free slot (or the slot we searched for by its ID) is found. Here even more complex (e.g. quadratic, or on the ID itself depending) algorithms can be used to find a free slot with high probability

I implemented this approach and tested it in some scenarios and it seems (as there suspected) to be always faster for random accesses than using the circular array. Of course some further tests should be made, especially for very often deleted and new added server, etc., but I do not suspect there any bad surprise. So this approach is more like the intuitive one of a hashtable, and seems to be very fast. But I don't know if it fits your needs.

Here's the code. CA_* for CycleArray-functions, HA_* for HashArray

Code: Select all

EnableExplicit

#N = 256 ; the maximal number of elements, must be a power of 2
#Nmask = #N-1

Structure Element
  id.l ; e.g. the socket-id
  active.l ; if 0, then this element might be overwritten
  elemData.l ; some arbitrary data
EndStructure

Define *tempElement.Element
Define i.l

;
; CycleArray
;

Dim CycleArray.Element(#N) ; elements at indices 0..#N-1 are used
Define ca_curAddIndex.l = 0
Define ca_curFindIndex.l = 0

Macro IncreaseCircular(var)
  var = (var + 1) & #Nmask
EndMacro

Macro DecreaseCircular(var)
  var = (var - 1) & #Nmask
EndMacro

Macro CA_AddElement(m_id, m_elemData)
  ; ASSUMPTION: there will never be #N active elements at a time, otherwise danger of endless loop
  While CycleArray(ca_curAddIndex)\active
    IncreaseCircular(ca_curAddIndex)
  Wend
  CycleArray(ca_curAddIndex)\id = m_id
  CycleArray(ca_curAddIndex)\active = 1
  CycleArray(ca_curAddIndex)\elemData = m_elemData
EndMacro

Macro CA_FindElement(m_id, m_p_elem)
  ; ASSUMPTION: the m_id to find IS somewhere in the array, otherwise danger of endless loop
  ca_curFindIndex  = ca_curAddIndex
  While CycleArray(ca_curFindIndex)\id <> m_id
    ; lets search backwards, because higher probability to find in most recent entries
    DecreaseCircular(ca_curFindIndex)
  Wend
  ; let _p_elem point to the found element
  m_p_elem = @CycleArray(ca_curFindIndex)
EndMacro

Macro CA_RemoveElement(m_id)
  ; ASSUMPTION: the m_id to remove IS somewhere in the array, otherwise danger of endless loop
  CA_FindElement(m_id, *tempElement)
  *tempElement\active = 0
EndMacro

Macro CA_DebugElements()
  For i=0 To #N-1
    Debug "--------------------"
    Debug i
    Debug CycleArray(i)\id
    Debug CycleArray(i)\active
    Debug CycleArray(i)\elemData
  Next
EndMacro

;
; HashArray
;

Dim HashArray.Element(#N) ; elements at indices 0..#N-1 are used
Define ha_tempIndex.l

Macro StartHashIndex(var, key)
  var = (key) & #Nmask
EndMacro

Macro IncreaseHashIndex(var)
  var = (var + 123454339) & #Nmask ; 123454339 might be any big prime
EndMacro

Macro HA_AddElement(m_id, m_elemData)
  ; ASSUMPTION: there will never be #N active elements at a time, otherwise danger of endless loop
  StartHashIndex(ha_tempIndex, m_id)
  While HashArray(ha_tempIndex)\active
    IncreaseHashIndex(ha_tempIndex)
  Wend
  HashArray(ha_tempIndex)\id = m_id
  HashArray(ha_tempIndex)\active = 1
  HashArray(ha_tempIndex)\elemData = m_elemData
EndMacro

Macro HA_FindElement(m_id, m_p_elem)
  ; ASSUMPTION: the m_id to find IS somewhere in the array, otherwise danger of endless loop
  StartHashIndex(ha_tempIndex, m_id)
  While HashArray(ha_tempIndex)\id <> m_id
    IncreaseHashIndex(ha_tempIndex)
  Wend
  ; let _p_elem point to the found element
  m_p_elem = @HashArray(ha_tempIndex)
EndMacro

Macro HA_RemoveElement(m_id)
  ; ASSUMPTION: the m_id to remove IS somewhere in the array, otherwise danger of endless loop
  HA_FindElement(m_id, *tempElement)
  *tempElement\active = 0
EndMacro

Macro HA_DebugElements()
  For i=0 To #N-1
    Debug "--------------------"
    Debug i
    Debug HashArray(i)\id
    Debug HashArray(i)\active
    Debug HashArray(i)\elemData
  Next
EndMacro



;########################################
; SPEEDING-TEST
;########################################

;
; testsuite: creates some actions and works through them once by CA_* and once by HA_*
;
#RandomSeed = 123456789
RandomSeed(#RandomSeed)
  
Structure Action
  type.l ; 1=add, 2=modify data, 3=remove
  id.l
  newData.l ; only for type 1 and 2
EndStructure

Macro CA_HandleActions()
  Define ca_time.l = ElapsedMilliseconds()
  Define id.l
  Define elemData.l
  Define *elem.Element
  For i=0 To #ActionsArraySize-1
    Debug "CA_HandleActions: " + Str(Actions(i)\type) + " " +  Str(Actions(i)\id)
    Select Actions(i)\type
      Case 1 : id = Actions(i)\id
                elemData = Actions(i)\newData
                CA_AddElement(id, elemData)

      Case 2 : id = Actions(i)\id
                CA_FindElement(id, *elem)
                *elem\elemData = Actions(i)\newData

      Case 3 : id = Actions(i)\id
                CA_RemoveElement(id)

    EndSelect
  Next
  ca_time = ElapsedMilliseconds() - ca_time
EndMacro

Macro HA_HandleActions()
  Define ha_time.l = ElapsedMilliseconds()
  Define id.l
  Define elemData.l
  Define *elem.Element
  For i=0 To #ActionsArraySize-1
    Debug "HA_HandleActions: " + Str(Actions(i)\type) + " " +  Str(Actions(i)\id)
    Select Actions(i)\type
      Case 1 : id = Actions(i)\id
                elemData = Actions(i)\newData
                HA_AddElement(id, elemData)

      Case 2 : id = Actions(i)\id
                HA_FindElement(id, *elem)
                *elem\elemData = Actions(i)\newData

      Case 3 : id = Actions(i)\id
                HA_RemoveElement(id)

    EndSelect
  Next
  ha_time = ElapsedMilliseconds() - ha_time
EndMacro


;
; EXAMPLE: Access-speed to #N-2 servers
;
#NServer = (#N-2)
#NAccesses = 1000000
#ActionsArraySize = 2*#NServer+#NAccesses

Macro ServerID(m_i)
  m_i
  ;1234567*(m_i)+7654321 ; use this more pseudo-random instead of m_i shows no big difference in speedings
  ;m_i*#N ; this is the worst-case in theory. All servers are initially mapped to the same entry
  ;m_i*#N/2 ; still a very bad case
  ;100000+123*m_i  ; in this case the offset (here 123) should be no power of 2
EndMacro

;
; create the actions
;
Dim Actions.Action(#ActionsArraySize)

; add server
For i=0 To #NServer-1
  Actions(i)\type = 1
  Actions(i)\id = ServerID(i) ; a unique id
  Actions(i)\newData = Random(9999)
Next

; access data
For i=#NServer To #NServer + #NAccesses - 1
  Actions(i)\type = 2
  Actions(i)\id = ServerID(Random(#NServer-1))
  Actions(i)\newData = Random(9999)
Next

; remove server
Define count.l = 0
For i=#NServer+#NAccesses To 2*#NServer + #NAccesses - 1
  Actions(i)\type = 3
  Actions(i)\id = ServerID(count) ; the previously added ids
  count + 1
Next

;
; start test
;

DisableDebugger
CA_HandleActions()
EnableDebugger
;CA_DebugElements() ; show the internals of CycleArray

DisableDebugger
HA_HandleActions()
EnableDebugger
;HA_DebugElements() ; show the internals of HashArray

Debug "Timing for CA: " + Str(ca_time)
Debug "Timing for HA: " + Str(ha_time)

[edit]
I have removed PB 4.3's function ArraySize(...) and added some alternative server-id-algorithms
[/edit]

kinglestat · Post by **kinglestat** » Sat Sep 20, 2008 7:16 am

thanks froggerprogger
nice, readable code
I was toying with a single hash for ALL my lists...but my efforts where not succesfull, was trying to adapt my C Hash stuff to be used by PB...but you have forestalled me. I like your approach...verey readable code and much cleaner. Much appreciated

EDIT
Does not compile for me, says something about ArraySize()

Froggerprogger · Post by **Froggerprogger** » Sat Sep 20, 2008 9:04 am

Thanks for the feedback.
Keep in mind, that the speed correlates to a good distribution of the keys inside the HashArray. E.g. when using:

Code: Select all

Macro ServerID(m_i)
  m_i*#N
EndMacro

then this is the worst case, as all keys are initially mapped to the same arrayelement. To handle this, you would have to adapt StartHashIndex(var, key) and IncreaseHashIndex(var), perhaps enriched to IncreaseHashIndex(var, key), so that e.g. a different prime is selected dependend on the key.
With m_i*#N the hasharray even becomes a bit slower than the cycled approach, with m_i*#N/2 or m_i*#N/3, ... it's faster again, though there are still many collisions in mapping.

So you should have a look which system the server-ids do follow. If it is e.g. newServerId = oldServerId + offset then it would be good iff offset is small or no power of two. In that case you might change, that #N can be arbitrary (not only a power of two). Then you would have to replace each occur of <something> & #Nmask by <something> % #N, though the modulo-operator is much slower than &. Using any prime for #N would be a good choice then.

Does not compile for me, says something about ArraySize()

I used PB 4.3 Beta 1. Don't know when ArraySize() came to PB. You might define:

Code: Select all

#ActionsArraySize = 2*#NServer + #NAccesses

and replace ArraySize(ActionsArray()) by #ActionsArraySize at each occur.

(I updated the above code with that)

kinglestat · Post by **kinglestat** » Sat Sep 20, 2008 9:56 am

have to say, froggerprogger, I am shocked
First of all I hadnt even thought of using macros for hashes...and using a simple hashing algo for small values....makes definite sense. I can sacrifice more memory to make hash a bit larger...but probability for collissions will not be so big any way. Truly well done and thanks..exactly what I wished for and my conscience is appeased. I would suggest ths topic would go to the Tips & tricks..

Froggerprogger · Post by **Froggerprogger** » Sat Sep 20, 2008 12:18 pm

I was surprised, too. Some posts above I wrote:

But for only 32 elements a simple array with sequential search will be much faster than a hashtable. With 500 elements there could be a little speedup with hashtables

But this example shows, that the opposite is the case, as long as the hashfunction can be kept simple, which depends on the distribution of the IDs used as keys.
And already a simple incrementing ID-generator works very well, as well as a random generator and most others. Only some special (though not very exotic) ID-combinations will slow down the implementation and would need a more complex (and slower) hashfunction.

EDIT
I started a thread in the tips&tricks-forum here http://www.purebasic.fr/english/viewtopic.php?t=34284 with cross-reference