
Optimizing findstring

Posted: Wed Dec 15, 2010 4:02 pm
by buzzqw
Hi all!

First of all, I must admit that I am not a programmer in any way! But I enjoy using PureBasic to automate simple tasks.

I have to compare two CSV files of several hundred thousand lines each: for each line in file A, I must check in file B whether certain parameters are found or not.

This is a snippet of the code:

Code:

; Output file #890 is assumed to have been opened earlier with CreateFile()
ReadFile(888,firstfile.s)
While Eof(888)=0
  line.s=ReadString(888)
  ; Only lines whose 2nd field is "S" are checked
  If StringField(line.s,2,";")="S"
    a.s=StringField(line.s,3,";")
    b.s=StringField(line.s,4,";")
    c.s=StringField(line.s,6,";")
    ; File B is reopened and read from the top for every matching line of file A
    ReadFile(889,secondfile.s)
    While Eof(889)=0
      secondline.s=ReadString(889)
      ; FindString() positions start at 1
      If FindString(secondline.s,a.s,1) And FindString(secondline.s,b.s,1) And FindString(secondline.s,c.s,1)
        WriteStringN(890,line.s+" >>> "+secondline.s)
      EndIf
    Wend
    CloseFile(889)
  EndIf
Wend
CloseFile(888)
CloseFile(890)
    
I have a very strong feeling that I wrote a very bad routine (it works, but it is slow).

Any help from professional developers? :oops:

Thanks!

BHH

Re: Optimizing findstring

Posted: Wed Dec 15, 2010 4:05 pm
by IdeasVacuum

Re: Optimizing findstring

Posted: Wed Dec 15, 2010 4:31 pm
by buzzqw
Very fascinating... but beyond my understanding.

Code:

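; preBmBc() and Max() are not defined in this snippet; they belong to the full code on the linked page
Procedure TUNEDBM(Start.l, Array Source.a(1), Array Pattern.a(1))
  Protected.i i, j, m, n, k, match, shift
  Protected Dim bmBc.i(255)          ; bad-character shift table, one entry per byte value
  m = ArraySize(Pattern())
  n = ArraySize(Source())
  ; Preprocessing: fill the bad-character table
  preBmBc(Pattern(), bmBc())
  shift = Max(bmBc(Pattern(m - 1)), 1)
  bmBc(Pattern(m - 1)) = 0
  ; Searching
  j = Max(0, Start - 1)
  While j < n
    ; Skip loop (unrolled three times): jump ahead while the last pattern byte cannot match
    While j + m < n And bmBc(Source(j + m - 1)) <> 0
      j + bmBc(Source(j + m - 1))
      If j + m >= n : Break : EndIf
      j + bmBc(Source(j + m - 1))
      If j + m >= n : Break : EndIf
      j + bmBc(Source(j + m - 1))
      If j + m >= n : Break : EndIf
    Wend
    ; Candidate position: compare the whole pattern
    If CompareMemory(@Source(j), @Pattern(), m)
      ProcedureReturn j
    EndIf
    j + shift
  Wend
  ProcedureReturn 0   ; note: a match at offset 0 also returns 0
EndProcedure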

How do I use this code in my situation? :|

BHH

Re: Optimizing findstring

Posted: Wed Dec 15, 2010 4:49 pm
by IdeasVacuum
...that is the search algorithm. The same page has some code to load a whole file into an array (Array Source). Write your own code that calls TUNEDBM with the pattern (string) to search for (Array Pattern). You can use the same process for both files.
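As a rough illustration of the two preparation steps described above, here is a minimal sketch that loads a whole file into a byte array and copies a search string into a second byte array. It assumes single-byte (ASCII/ANSI) text; the file name and search text are hypothetical. TUNEDBM takes its lengths from ArraySize(), which returns the highest index given to Dim, so each array is dimensioned with its content length (leaving one spare element). The call itself is left commented out, because TUNEDBM also needs preBmBc() and Max() from the linked page.

```purebasic
; Sketch: prepare the two arrays TUNEDBM expects.
; Assumes ASCII/ANSI text; fileName and searchFor are hypothetical values.
Define fileName.s  = "secondfile.csv"   ; hypothetical path of the file to search
Define searchFor.s = "ABC123"           ; hypothetical text to look for
Define file.i, length.i, i.i

; Array Source: the whole file, one byte per element
Dim Source.a(0)
file = ReadFile(#PB_Any, fileName)
If file
  length = Lof(file)                    ; file size in bytes
  Dim Source.a(length)                  ; indices 0..length, so ArraySize(Source()) = length
  If length > 0
    ReadData(file, @Source(0), length)  ; copy the whole file into the array in one call
  EndIf
  CloseFile(file)
EndIf

; Array Pattern: the search text, one byte per element
Dim Pattern.a(Len(searchFor))           ; ArraySize(Pattern()) = Len(searchFor)
For i = 1 To Len(searchFor)
  Pattern(i - 1) = Asc(Mid(searchFor, i, 1))
Next

; With TUNEDBM, preBmBc() and Max() from the linked page included:
; pos = TUNEDBM(1, Source(), Pattern())
```

Reading the file with a single ReadData() call avoids the per-line overhead of ReadString(); the trade-off is that the whole file must fit in memory.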

Re: Optimizing findstring

Posted: Wed Dec 15, 2010 4:50 pm
by Trond
You will get more help if you post working code. Code snippets that don't work as posted are annoying to work with.

Re: Optimizing findstring

Posted: Wed Dec 15, 2010 5:02 pm
by buzzqw
Here is the full code: http://www.64k.it/andres/data/Varie/findstring.rar (3 KB)

Sorry, pal, but the code is in Italian.

In the first string field you put the first CSV;
in the second string field, the second CSV.

"salva log" means "save log" (the file where the analysis is saved).
"avvia analisi" means "start analysis".

BHH

Re: Optimizing findstring

Posted: Wed Dec 15, 2010 5:36 pm
by Trond
First problem: Procedures that are run as a thread must have exactly one integer parameter:

Code:

Procedure start(nothing.i)

Otherwise, your program may crash after a while.
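For reference, a minimal sketch of starting such a procedure as a thread with CreateThread(), which passes its second argument as the procedure's single parameter. The body is a placeholder; the real comparison loop would go there. Since thread code that uses strings should be compiled with the thread-safe executable option enabled, that option is assumed to be on.

```purebasic
; Thread procedures receive exactly one integer-sized parameter.
; Assumes the "thread-safe executable" compiler option is enabled.
Procedure Start(nothing.i)
  ; placeholder: the long-running comparison would go here
  Debug "thread running, parameter = " + Str(nothing)
EndProcedure

Define thread.i = CreateThread(@Start(), 0)   ; 0 is passed as 'nothing'
If thread
  WaitThread(thread)                          ; block until the thread finishes
EndIf
```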

Edit:
The next thing I notice is that you read in your file twice. Performance will be better if you read the files to be compared into two arrays once and then work with the arrays instead of reading the file again.
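A minimal sketch of this suggestion applied to the snippet from the first post: file B is read into memory once, then file A is scanned once and each qualifying line is compared against the in-memory copy of B. The file names are hypothetical, and a linked list (NewList) stands in for the array because the number of lines is not known in advance.

```purebasic
; Sketch: read file B once, then compare every qualifying line of file A
; against the in-memory lines. File names are hypothetical.
EnableExplicit

Define firstfile.s  = "fileA.csv"    ; hypothetical
Define secondfile.s = "fileB.csv"    ; hypothetical
Define logfile.s    = "result.log"   ; hypothetical
Define file.i, out.i, line.s, a.s, b.s, c.s
NewList linesB.s()

; 1) Load file B into memory once
file = ReadFile(#PB_Any, secondfile)
If file
  While Eof(file) = 0
    AddElement(linesB())
    linesB() = ReadString(file)
  Wend
  CloseFile(file)
EndIf

; 2) Scan file A once, comparing against the in-memory lines of B
out  = CreateFile(#PB_Any, logfile)
file = ReadFile(#PB_Any, firstfile)
If file And out
  While Eof(file) = 0
    line = ReadString(file)
    If StringField(line, 2, ";") = "S"
      a = StringField(line, 3, ";")
      b = StringField(line, 4, ";")
      c = StringField(line, 6, ";")
      ForEach linesB()
        If FindString(linesB(), a, 1) And FindString(linesB(), b, 1) And FindString(linesB(), c, 1)
          WriteStringN(out, line + " >>> " + linesB())
        EndIf
      Next
    EndIf
  Wend
EndIf
If file : CloseFile(file) : EndIf
If out  : CloseFile(out)  : EndIf
```

This removes the repeated disk reads of file B; the number of FindString() calls stays the same, so very large inputs may still call for a faster search such as the one linked earlier.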