Optimizing findstring

buzzqw
Enthusiast
Posts: 116
Joined: Sat Aug 27, 2005 10:13 pm
Location: Italy

Post by buzzqw »

Hi all!

First of all, I must admit that I am not a programmer in any way, but I enjoy using PureBasic to automate simple tasks.

I have to compare two CSV files of several hundred thousand lines each: I must read a line in file A and check in file B whether certain parameters are found or not.

This is a snippet of the code:

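; Snippet: firstfile.s, secondfile.s and output file 890 are defined elsewhere in the program
ReadFile(888, firstfile.s)
While Eof(888) = 0
  line.s = ReadString(888)
  ; Only rows whose 2nd field is "S" are checked
  If StringField(line.s, 2, ";") = "S"
    a.s = StringField(line.s, 3, ";")
    b.s = StringField(line.s, 4, ";")
    c.s = StringField(line.s, 6, ";")
    ; The second file is reopened and rescanned for every such row
    ReadFile(889, secondfile.s)
    While Eof(889) = 0
      secondline.s = ReadString(889)
      ; FindString positions are 1-based, so the search starts at 1
      If FindString(secondline.s, a.s, 1) And FindString(secondline.s, b.s, 1) And FindString(secondline.s, c.s, 1)
        WriteStringN(890, line.s + " >>> " + secondline.s)
      EndIf
    Wend
    CloseFile(889)
  EndIf
Wend
CloseFile(888)
CloseFile(890)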
    
I have a very strong feeling that I wrote a very bad routine (it works, but slowly).

Any help from professional developers? :oops:

Thanks!

BHH
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Optimizing findstring

Post by IdeasVacuum »

IdeasVacuum
If it sounds simple, you have not grasped the complexity.

Post by buzzqw »

Very fascinating... but beyond my understanding.

Procedure TUNEDBM(Start.l,Array Source.a(1), Array Pattern.a(1))
   Protected.i i, j, m, n, k, match, shift
   Protected.i Dim bmBc(255)   
   m = ArraySize(Pattern())
   n = ArraySize(Source()) 
   ; Preprocessing 
   preBmBc(Pattern(), bmBc())
   shift = Max(bmBc(Pattern(m - 1)),1)
   bmBc(Pattern(m - 1)) = 0
   ; Searching
   j = Max(0,Start-1)
   While j < n
     While j + m < n And bmBc(Source(j + m -1)) <> 0
       j + bmBc(Source(j + m -1))
       If j + m >= n : Break : EndIf
       j + bmBc(Source(j + m -1))
       If j + m >= n : Break : EndIf
       j + bmBc(Source(j + m -1))
       If j + m >= n : Break : EndIf
     Wend
     If CompareMemory(@Source(j), @Pattern(), ArraySize(Pattern()))
       ProcedureReturn j
     EndIf      
     j + shift
   Wend
   ProcedureReturn 0
EndProcedure

How do I use this code in my situation? :|

BHH

Post by IdeasVacuum »

...that is the search algorithm. On the same page is some code to load a whole file into an array (Array Source). Write your own code that calls TUNEDBM with a pattern (string) to search for (Array Pattern). You can use the same process for both files.
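
As a rough illustration of the "load a whole file into an array" step, here is a minimal sketch. It is not the code from the linked page: LoadFileBytes(), the file name and the search text are hypothetical, and the array sizes assume that TUNEDBM treats ArraySize() as the data length, as the quoted procedure appears to do. TUNEDBM itself and its helpers (preBmBc, Max) are not reproduced here.

```purebasic
; Sketch only: LoadFileBytes() and "second.csv" are hypothetical names.
; Reads a whole file into an unsigned byte array and returns the byte count.
Procedure.i LoadFileBytes(FileName.s, Array Bytes.a(1))
  Protected file.i, size.i = 0
  file = ReadFile(#PB_Any, FileName)
  If file
    size = Lof(file)                 ; file length in bytes
    If size > 0
      ReDim Bytes(size)              ; size data bytes plus one trailing zero byte
      ReadData(file, @Bytes(0), size)
    EndIf
    CloseFile(file)
  EndIf
  ProcedureReturn size
EndProcedure

Dim source.a(0)
Define loaded.i = LoadFileBytes("second.csv", source())

; Turn the search text into a byte array as well (ASCII assumed).
Define text.s = "ABC"                ; hypothetical search text
Dim pattern.a(Len(text))             ; room for the text and its terminating zero
PokeS(@pattern(0), text, -1, #PB_Ascii)

Debug loaded                         ; number of bytes read, 0 if the file was missing
```

With both arrays filled, a call along the lines of TUNEDBM(1, source(), pattern()) would follow, once the helper procedures from that page are included.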
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

You will get more help if you post working code. Code snippets that don't work as posted are annoying to work with.

Post by buzzqw »

Here is the full code: http://www.64k.it/andres/data/Varie/findstring.rar (3 KB)

Sorry pal, but the code is in Italian.

In the first string field you put the first CSV;
in the second string field, the second CSV.

"salva log" means "save log", which is where the analysis file is written.
"avvia analisi" means "start analysis".

BHH

Post by Trond »

First problem: Procedures that are run as a thread must have exactly one integer parameter:

Procedure start(nothing.i)

Otherwise your program may crash after a while.
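
To illustrate the point about thread procedures, a minimal sketch follows. The procedure name Analyse() and its body are hypothetical stand-ins, not taken from the posted program; only the one-parameter signature is the point.

```purebasic
; Sketch only: Analyse() is a hypothetical name for the worker procedure.
Procedure Analyse(nothing.i)       ; exactly one integer parameter, even if unused
  Delay(100)                       ; stands in for the real analysis work
EndProcedure

; The second argument of CreateThread() is passed to the procedure's parameter.
Define thread.i = CreateThread(@Analyse(), 0)
If thread
  WaitThread(thread)               ; block until the procedure has returned
EndIf
```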

Edit:
The next thing I notice is that you read in your file twice. Performance will be better if you read the files to be compared into two arrays and then use the arrays twice instead of reading the file twice.
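
A minimal sketch of that idea, assuming the same field layout as the snippet in the first post. LoadLines() and the file names are hypothetical; the matching logic is unchanged, only the repeated file reading is replaced by in-memory arrays.

```purebasic
; Sketch only: LoadLines(), "first.csv", "second.csv" and "log.txt" are hypothetical names.
EnableExplicit

; Reads every line of a text file into Lines() and returns the number of lines read.
Procedure.i LoadLines(FileName.s, Array Lines.s(1))
  Protected file.i, count.i = 0
  file = ReadFile(#PB_Any, FileName)
  If file
    While Eof(file) = 0
      If count > ArraySize(Lines())
        ReDim Lines(count * 2 + 1023)   ; grow in large steps to limit reallocations
      EndIf
      Lines(count) = ReadString(file)
      count + 1
    Wend
    CloseFile(file)
  EndIf
  ProcedureReturn count
EndProcedure

Dim first.s(0)
Dim second.s(0)
Define countA.i = LoadLines("first.csv", first())    ; each file is read from disk once
Define countB.i = LoadLines("second.csv", second())
Define i.i, j.i, a.s, b.s, c.s

If CreateFile(0, "log.txt")
  For i = 0 To countA - 1
    If StringField(first(i), 2, ";") = "S"
      a = StringField(first(i), 3, ";")
      b = StringField(first(i), 4, ";")
      c = StringField(first(i), 6, ";")
      For j = 0 To countB - 1                        ; scan the in-memory copy of the second file
        If FindString(second(j), a, 1) And FindString(second(j), b, 1) And FindString(second(j), c, 1)
          WriteStringN(0, first(i) + " >>> " + second(j))
        EndIf
      Next
    EndIf
  Next
  CloseFile(0)
EndIf
```

This still compares every selected row against every line of the second file, so it removes the disk reads but not the nested loop itself.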