You are doing excellent work, GedB. But I'm not sure I can agree that this is the simplest solution. The thing is, this approach requires a preparatory step by the program to get the data ready for access. I'd like to avoid that if I can. I want to create a dll file with one function only that is usable by any program in any language. An indexed approach requires two functions: a prep function, called after the dll is initialized, that gets the index going, and one for the word search. Your approach would be ideal for a program that includes the data in itself, but for a dll distribution I think this might be better:

And finally, the simplest solution.
The main problem is that the file is too large because of the trailing spaces that pad the words. But what if we just took them all out and left only the 0s to delimit the words? The binary search wouldn't work as written, since it relies on every word being the same length. But it's lightning fast right now, capable of searching thousands of words in a few milliseconds. A little fat added to it wouldn't make a noticeable difference at all. So why not, instead of finding the middle word in the list, just find the middle of the list bytewise and then back up until we hit a null? Then do a PeekS(), call that the middle word and proceed as normal? The performance hit would be in finding the starts of the words and, because we aren't dividing the words exactly in half, an extra loop or two might be performed. Then it might take 15 or 20 milliseconds to search 1000 words instead of 10, but optimum size is achieved and there is still just the one single function. To me that seems simplest. What do you think?
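Here is a rough sketch of that idea, written in Python rather than PureBasic only to make the logic easy to check. It is not tested against the real word files. The name find_word and the buffer layout are assumptions for illustration: the words are sorted bytewise, each one is followed by a single 0 byte, and there are no empty words. The slice of the buffer stands in for the PeekS() call.

```python
def find_word(data: bytes, word: str) -> bool:
    """Binary search over a sorted, null-delimited word buffer (hypothetical layout)."""
    target = word.encode("ascii")
    # lo and hi are byte offsets that always sit on word boundaries.
    lo, hi = 0, len(data)
    while lo < hi:
        # Middle of the remaining range, measured in bytes, not words.
        mid = (lo + hi) // 2
        # Back up until the previous byte is a null (or we reach lo).
        start = mid
        while start > lo and data[start - 1] != 0:
            start -= 1
        # The word runs up to its terminating null; this is the PeekS() step.
        end = data.index(0, start)
        candidate = data[start:end]
        if candidate == target:
            return True
        if candidate < target:
            lo = end + 1   # continue after this word's null
        else:
            hi = start     # continue before this word
    return False


# Tiny made-up list, just to exercise the function.
words = [b"AA", b"AB", b"CAT", b"CATS", b"DOG", b"ZYZZYVA"]
data = b"".join(w + b"\0" for w in words)
print(find_word(data, "CATS"), find_word(data, "CA"))
```

Because the range shrinks on every pass (either past the end of the candidate word or back to its start), the loop always terminates, and the only extra cost over the fixed-width version is the short backward scan for the null.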
Btw, many people are downloading those text files from my site, at least a dozen a day on average. I have no idea where they are getting the link, because when I google TWL98 or SOWPODS my site doesn't come up. It doesn't come up for SCRABBLE either. I'm mystified on that one.