Function to extract data blocks from input data stream


Post by Psychophanta

Imagine you need to extract packets from a continuous data stream, for example from a serial port.
The stream reaches this function in chunks, for example whatever ReadSerialPortData() delivered on each read.
The chunks can be of any size, and their boundaries do not have to coincide with the boundaries of the expected packets.
Each packet is delimited in the stream by a known "key": a fixed sequence of bytes, which can be alphanumeric text or any other byte sequence. In other words, the packets to obtain are simply separated from one another by this key.
So the data between two consecutive keys is one packet, and each packet can therefore be of any size.
I know I have probably reinvented the wheel again, and it may even already exist somewhere in this forum; in that case, here is one more version of it :)
This function (called here ObtenerPaquetesdesdeRistra()) does just that. It reads a chunk ('ristra') of incoming data and extracts from it the packets of bytes separated by a key sequence, called 'separador()'.
INPUTS:
'Array separador.b()' is the sequence of bytes that marks the separation between packets.
'Array ristra.b()' is the input chunk: a sequence of bytes containing the packets separated by the separator.
OUTPUTS:
'Array bloque.b()' is the base buffer for the packet output, where the first packet starts.
'Array pos.q()' is the list of positions of each packet, given as offsets into 'bloque()'.
Each call to the function outputs between 0 and several packets.
It emits them through the single base buffer 'bloque.b()' plus the list 'pos()' of offsets from that base: in the test code, packet i is read from bloque(pos(i)) up to bloque(pos(i+1)-1).
The function returns the number of complete separators found in one available read, that is, in one call to it.
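
For text-only streams, the same idea can be sketched more briefly with PureBasic strings. This is only an illustration under stated assumptions, not the routine posted here: SplitStream(), 'pending' and the sample chunks are made-up names, it emits empty packets too, and unlike the byte-array version it cannot carry raw binary data (a zero byte would end the string).

```purebasic
; Hypothetical string-based sketch; SplitStream() is not part of the original code.
; It keeps the unfinished tail of the stream between calls and emits every
; piece of text that is closed by a complete separator.
Procedure.i SplitStream(chunk.s, separator.s, List packets.s())
  Static pending.s                 ; text received so far that no separator has closed yet
  Protected p.i, count.i
  pending + chunk                  ; a separator split across two chunks is rejoined here
  p = FindString(pending, separator)
  While p
    AddElement(packets())
    packets() = Left(pending, p - 1)            ; the text before the separator is one packet
    pending = Mid(pending, p + Len(separator))  ; continue after the separator
    count + 1
    p = FindString(pending, separator)
  Wend
  ProcedureReturn count            ; number of complete separators found in this call
EndProcedure

NewList found.s()
Debug SplitStream("oneemp", "empieza", found())            ; 0: the separator is still incomplete
Debug SplitStream("iezatwoempiezathr", "empieza", found()) ; 2: "one" and "two"
Debug SplitStream("eeempieza", "empieza", found())         ; 1: "three"
ForEach found()
  Debug found()
Next
```

Because the unfinished tail lives in a Static variable, one such procedure serves a single stream at a time; the byte-array routine in this post makes the same choice with its Static 'almacen'.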

Code:

Macro AsignarCadenaAscii(variable,cadena,terminacion=|#PB_String_NoZero)
  ;Loads a string of extended-ASCII characters into a variable
  variable#=Space(StringByteLength(cadena#,#PB_Ascii))
  PokeS(@variable#,cadena#,StringByteLength(cadena#,#PB_Ascii),#PB_Ascii#terminacion#)
EndMacro
Procedure.q ObtenerPaquetesdesdeRistra(Array bloque.b(1),Array pos.q(1),Array separador.b(1),Array ristra.b(1))
  ; Extracts packets from a continuous data stream (for example a serial port) that arrives in chunks of any size.
  ; Packets are separated in the stream by a known byte sequence, 'separador()'; chunk boundaries need not match packet boundaries.
  ; INPUTS:
  ;   'Array separador.b()' is the sequence of bytes that marks the separation between packets.
  ;   'Array ristra.b()' is the input chunk: a sequence of bytes containing the packets separated by the separator.
  ; OUTPUTS:
  ;   'Array bloque.b()' is the base buffer for the packet output, where the first packet starts.
  ;   'Array pos.q()' is the list of positions of each packet, given as offsets into 'bloque()'.
  ; Each call emits between 0 and several packets.
  ; The function returns the number of complete separators found in one available read, that is, in one call to it.
  Static Dim almacen.b(0),la.q,ps.q; 'almacen' is the store where received data is kept for the next call to this function
  Protected lr.q=ArraySize(ristra()),psti.q,lectura.q,pr.q,pr0.q,f.a,lb.q,ls.q=ArraySize(separador())
  If lr; <- if the chunk (ristra) is not empty
    ReDim bloque(lb)
    ReDim pos(lectura)
    ;count the complete separators found in the chunk:
    While pr<lr; <- while the chunk position 'pr' has NOT reached the end of the chunk:
      While ristra(pr)=separador(ps); <- as long as the chunk bytes keep matching the separator bytes
        f+1
        pr+1:ps+1; <- advance both indexes to keep comparing
        If ps=ls; <- check whether the end of the complete separator sequence has been reached
          psti=pr-ls; <- position of the separator just found
          If psti<0; => a separator truncated in the previous call ends at the start of this chunk (ristra)
            la+psti; <- drop the last '-psti' bytes from the store (almacen)
            ReDim bloque(lb+la)
            CopyMemory(almacen(),@bloque(lb),la):lb+la
            lectura+1:ReDim pos(lectura):pos(lectura)=lb
            la=0:ReDim almacen(la)
          Else
            psti-pr0
            If lb+la+psti; <- complete match read within this chunk (ristra)
              ReDim bloque(lb+la+psti)
              CopyMemory(almacen(),@bloque(lb),la):lb+la
              CopyMemory(@ristra(pr0),@bloque(lb),psti):lb+psti
              lectura+1:ReDim pos(lectura):pos(lectura)=lb
              la=0:ReDim almacen(la)
            EndIf
          EndIf
          pr0=pr:ps=0:f=0
        ElseIf pr=lr; => there is a truncated separator at the end of the chunk, specifically at 'lr-ps'
          Break 2   ; <- note that the separator counter 'ps' is a Static variable, so it is not reset on leaving the loop and carries over to the next call to the function
        EndIf
      Wend
      pr+1-f:f=0; <- 'f' is for when the beginning of the separator was found but not the complete separator: step back to just after where that partial match started
      ps=0
    Wend
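    ; keep the bytes after the last complete separator (a partial separator included) in the store for the next call
    pr-pr0
    ReDim almacen(la+pr)
    CopyMemory(@ristra(pr0),@almacen(la),pr)
    la+pr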
  EndIf
  ProcedureReturn lectura
EndProcedure
; Test it using only alphanumeric extended-ASCII strings:
AsignarCadenaAscii(b$,"empieza"):longsep=7
Dim sep.b(longsep):CopyMemory(@b$,sep(),longsep)
Dim paquete.b(0)
Dim posiciones.q(0)
Macro availabledatastream(stream)
  AsignarCadenaAscii(b$,stream#)
  longdat=StringByteLength(b$,#PB_Ascii):Dim datosentrada.b(longdat):CopyMemory(@b$,datosentrada(),longdat)
  Debug ObtenerPaquetesdesdeRistra(paquete(),posiciones(),sep(),datosentrada())
  Debug "------"
  For i=0 To ArraySize(posiciones())-1
    va$=""
    For j=posiciones(i) To posiciones(i+1)-1
      va$+Chr(paquete(j)&$FF); <- mask to 0-255, because '.b' is a signed byte
    Next
    Debug va$
  Next
  Debug "**********"
EndMacro
availabledatastream("empieza1em pieza primeroeeeemppiezasegundoempiezaterceroempiezacuartoempiezaquintoemp")
availabledatastream("iezabloque6empiezaseptimoempiezapaquete numero '8', es decir, el octavo paquete empiezanovenopaqueteempiezadeeeeeecimoempiezaoncepaquete11e")
availabledatastream("mpieza el numero doce 12 yaempiezaeste ya es el numero trece 13em")
availabledatastream("p")
availabledatastream("iez")
availabledatastream("a ... and more garbage: 345yre ")
availabledatastream("398 jwb hiuerhiusdf gsjkebgkñteh89llkdn emp")
availabledatastream("iezeiaqu no hd jdrdt")
availabledatastream("empieza35987yt ib r... until here, just hereempiezay este es el ultimoempieza")