Create/work with a database of words

Posted: Fri Apr 16, 2004 1:08 pm
by thyr0x1ne
Well, I've been working for a few days on a medical database for students like me, used through our intranet chat program ...

I want to create a little routine in PureBasic to check for a word in a list. The list is named definitions.dat, and each line is structured like this:

"word : definition"

Nothing more than a text file with one definition per line.

But as the file grows, computers are freezing while checking whether a word already has a definition. Put another way, I'm looking for a routine in an exe that could be called as "file.exe -search word" and return the matching line as quickly as possible, so I can get the definition or commentary.

I'm a really bad coder and I think I need some optimisation; the file is currently 12 MB and freezing computers like hell.

Does anyone have an idea of the best commands to use in the routine, or a way to make this simple search function very quick?

Sorry, I'm just so lost as I watch this little definitions/help/commentaries text file grow each day :/

Posted: Fri Apr 16, 2004 1:17 pm
by Kris_a
could you please show us what sort of system you're using so far?

Posted: Fri Apr 16, 2004 1:29 pm
by Dare2
Hi thyr0x1ne,

How many entries in the dictionary file? (Count of Word:Definition entries). How many are there likely to be in a mature file?

Also, is your dictionary organised alphabetically at the moment? Or are the elements ad hoc?

There are different solutions; a lot will depend on the size of the file. Maybe a database is needed, maybe you can get away with having 26 smaller files for A to Z words, maybe a sorted file and binary search, etc.

Posted: Fri Apr 16, 2004 1:54 pm
by GedB
Save yourself a lot of trouble and put your definitions into a database.

You have a number of options available. You could use PB's built-in ODBC capabilities.
http://www.purebasic.com/documentation/database/

You could also use SQLite
http://www.purearea.net/pb/english/index.htm

Posted: Fri Apr 16, 2004 2:19 pm
by dontmailme
GedB wrote:Save yourself a lot of trouble and put your definitions into a database.

You have a number of options available. You could use PB's built-in ODBC capabilities.
http://www.purebasic.com/documentation/database/

You could also use SQLite
http://www.purearea.net/pb/english/index.htm
Definitely the way to go ;)

Much easier to code for, and if you get one which supports concurrent connections then you won't have that headache either...
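
For anyone wanting to try the database route, here is a minimal sketch of what the lookup could look like with SQLite. It assumes a recent PureBasic version that ships the SQLite plugin (UseSQLiteDatabase()), which is newer than this thread. The table name defs, its columns and the sample entry are made up for illustration, and the in-memory database stands in for a real database file.

```purebasic
; Sketch only: assumes a recent PureBasic with the built-in SQLite plugin.
; Table name, column names and the sample entry are hypothetical.
UseSQLiteDatabase()

If OpenDatabase(0, ":memory:", "", "")
  ; PRIMARY KEY makes SQLite index the word column, so a lookup avoids a full scan
  DatabaseUpdate(0, "CREATE TABLE defs (word TEXT PRIMARY KEY, definition TEXT)")
  DatabaseUpdate(0, "INSERT INTO defs (word, definition) VALUES ('femur', 'thigh bone')")

  ; Bind the search word instead of pasting it into the SQL string
  SetDatabaseString(0, 0, "femur")
  If DatabaseQuery(0, "SELECT definition FROM defs WHERE word = ?")
    If NextDatabaseRow(0)
      Debug GetDatabaseString(0, 0)
    Else
      Debug "no definition yet"
    EndIf
    FinishDatabaseQuery(0)
  EndIf

  CloseDatabase(0)
EndIf
```

The existing definitions.dat would need a one-time import into such a table; after that, checking whether a word already has a definition is a single indexed query.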

Posted: Fri Apr 16, 2004 3:06 pm
by blueznl
Need a little more info:

1. Do you run the program on each search, or is it running all the time?

2. Do you read from the text file every time, or is it okay to load the file once and keep it until the program is started again?

3. how large do you expect the file to become?
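
If the answer to 1 and 2 is that the program can stay running, one option is to load the file once into a map and answer every search from memory. A rough sketch, assuming a recent PureBasic version (maps, and file commands that take the file number as first parameter, are newer than this thread) and the "word : definition" layout from the first post. LoadDefinitions() is a hypothetical helper name.

```purebasic
; Sketch only: assumes a recent PureBasic version (maps, file-number syntax).
NewMap defs.s()

; Hypothetical helper: reads "word : definition" lines into the map
; and returns the number of lines stored
Procedure.i LoadDefinitions(fileName.s, Map defs.s())
  Protected line.s, p.i, count.i
  If ReadFile(0, fileName)
    While Eof(0) = 0
      line = ReadString(0)
      p = FindString(line, ":", 1)
      If p > 1
        ; key = text before the first colon, lower-cased so searches ignore case
        defs(LCase(Trim(Left(line, p - 1)))) = Trim(Mid(line, p + 1))
        count + 1
      EndIf
    Wend
    CloseFile(0)
  EndIf
  ProcedureReturn count
EndProcedure

Debug "loaded " + Str(LoadDefinitions("definitions.dat", defs()))

; Each later search is a hash lookup with no file access
If FindMapElement(defs(), LCase("word"))
  Debug defs()
Else
  Debug "no definition yet"
EndIf
```

With this layout the cost of reading the file is paid once at startup, which only helps if the program is not restarted for every search.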

Posted: Fri Apr 16, 2004 3:31 pm
by GedB
If you do want to implement your own fix, the algorithms here will help:

http://www.cs.sunysb.edu/~algorith/file ... ries.shtml

Posted: Fri Apr 16, 2004 4:28 pm
by blueznl
read, sort, split, binary search

Code:

; DeleteFile("defs.dat")

; RandomSeed(1)

nr = 100000
spread = 10000

Debug "generating"
OpenFile(1,"defs.dat")
For n = 1 To nr
  r = Random(spread)
  WriteString("word"+Str(r)+" : "+"description"+Str(r)+Chr(13)+Chr(10))
Next n
CloseFile(1)

Debug "reading"
Dim x.s(nr)
Dim y.s(nr)
OpenFile(1,"defs.dat")
nr = 0
While Eof(1)=0
  x(nr) = ReadString()
  nr = nr+1
Wend
CloseFile(1)
Debug "read "+Str(nr)

; first sort

Debug "sorting"
SortArray(x(),0,0,nr-1)

; now split

Debug "splitting"
For n = 0 To nr-1   ; only entries 0 .. nr-1 were read from the file
  p = FindString(x(n),":",1)
  y(n) = Trim(Left(x(n),p-1))
  x(n) = Trim(Mid(x(n),p+1,4096))
Next n

; we're gonna look for?

r = Random(spread)
s.s = "word"+Str(r)

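; using binary search over y(0) .. y(nr-1)
; lo and hi bound the part of the array that can still contain s

lo = 0
hi = nr-1
f = 0

Debug "looking"
While lo<=hi And f=0
  p = (lo+hi)/2
  If y(p) > s
    hi = p-1
  ElseIf y(p) < s
    lo = p+1
  Else
    f = 1
  EndIf
Wend

If f = 1
  ; after the split, x(p) holds the definition that belongs to y(p)
  Debug "found: "+x(p)
EndIf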
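
To tie this back to the original request for "file.exe -search word", here is a sketch of a small command-line wrapper. It assumes a recent PureBasic version and a console executable, and it deliberately uses a plain line-by-line scan instead of the binary search above: a program that starts fresh for every search would have to re-read and re-sort the whole file each time, so stopping at the first match is likely the cheaper option there. The executable name and the messages are made up.

```purebasic
; Sketch only: assumes a recent PureBasic version, compiled as a console executable.
; Usage (executable name is hypothetical): file.exe -search word
If OpenConsole()
  If CountProgramParameters() = 2 And ProgramParameter(0) = "-search"
    wanted.s = LCase(ProgramParameter(1))
    found = 0
    If ReadFile(0, "definitions.dat")
      ; Read line by line and stop at the first match,
      ; so the whole file is never held in memory
      While Eof(0) = 0 And found = 0
        line.s = ReadString(0)
        p = FindString(line, ":", 1)
        If p > 1
          If LCase(Trim(Left(line, p - 1))) = wanted
            PrintN(line)
            found = 1
          EndIf
        EndIf
      Wend
      CloseFile(0)
    EndIf
    If found = 0
      PrintN("no definition found")
    EndIf
  Else
    PrintN("usage: file.exe -search word")
  EndIf
EndIf
```

The chat program can then capture the printed line; if searches turn out to be frequent, keeping the data loaded or moving it into a database avoids rescanning the file on every call.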