Database implementation/organization

Everything else that doesn't fall into one of the other PB categories.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

But once you split it phrase search breaks. I can no more search for "vocal band" because the order is lost.
Foz
Addict
Addict
Posts: 1359
Joined: Tue Nov 13, 2007 12:42 pm
Location: Manchester, UK

Post by Foz »

when you do a search for vocal band, you do the same process and split the text, and then in the IN () you just have IN ("vocal", "band")

This would return then "Male Vocal Band" and "Vocal Politician Versus Band Of Rioters".

This is the price of searching on a many to many list. :)
As a post process you can then "rank" the results, so words that are next to each other in the correct order have a higher ranking.

Of course if you are focussing on windows, get SQL Server Developer Edition (the enterprise edition without restrictions) and then look up "Full Text Search" which does what I'm talking about within it's own database :)
User avatar
blueznl
PureBasic Expert
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Post by blueznl »

Funny. For a while the crowd was fooling around with optimization, and lately it's all databases :-)
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Foz wrote:when you do a search for vocal band, you do the same process and split the text, and then in the IN () you just have IN ("vocal", "band")

This would return then "Male Vocal Band" and "Vocal Politician Versus Band Of Rioters".
I'm not talking about a search for vocal band, I'm talking about searching for "vocal band" (in quotes). It should not return Vocal Politician Versus Band Of Rioters".
User avatar
Rook Zimbabwe
Addict
Addict
Posts: 4322
Joined: Tue Jan 02, 2007 8:16 pm
Location: Cypress TX
Contact:

Post by Rook Zimbabwe »

I didn't know it was music... I could rewrite that DB for that... but... I just had an idea.

Why don't you use XML to organize the listings?

That way (allegedly) that data would be useable anywhere!
Binarily speaking... it takes 10 to Tango!!!

Image
http://www.bluemesapc.com/
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Why don't you use XML to organize the listings?
Because I want something fast for several thousand files queried with different queries all the time. XML will be slow.
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Yes, never really been convinced by xml myself. Sure I can see the obvious advantages; especially when dealing with small amounts of data, but when dealing with large amounts..... Oh well, maybe it's just me? :)
I may look like a mule, but I'm not a complete ass.
User avatar
Rook Zimbabwe
Addict
Addict
Posts: 4322
Joined: Tue Jan 02, 2007 8:16 pm
Location: Cypress TX
Contact:

Post by Rook Zimbabwe »

Houston Independant School District uses XML in ALL of its schools ((497+)) to manage ALL student data for about 315,000+ students.

It works fast enough... it is simple ASCII data! :D
Binarily speaking... it takes 10 to Tango!!!

Image
http://www.bluemesapc.com/
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Rook Zimbabwe wrote:Houston Independant School District uses XML in ALL of its schools ((497+)) to manage ALL student data for about 315,000+ students.

It works fast enough... it is simple ASCII data! :D
I bet they don't do full text search on that data.
User avatar
Rook Zimbabwe
Addict
Addict
Posts: 4322
Joined: Tue Jan 02, 2007 8:16 pm
Location: Cypress TX
Contact:

Post by Rook Zimbabwe »

nope but I do know that before I left there were over 114 pieces of data for each child or varying lengths...

And yes... Data Migration was handled on a MASSIVE scale several times a year.

Took less time than you would think!

But you will probably use unicode because of the fonts (you know, special letter characters)... I have no idea what XML does in unicode.

As well... There are many things I DON'T like about XML... it is sloppy to my eyes and always has been, but it IS easy to deal with in a webpage! :D
Binarily speaking... it takes 10 to Tango!!!

Image
http://www.bluemesapc.com/
Foz
Addict
Addict
Posts: 1359
Joined: Tue Nov 13, 2007 12:42 pm
Location: Manchester, UK

Post by Foz »

Trond wrote:
Foz wrote:when you do a search for vocal band, you do the same process and split the text, and then in the IN () you just have IN ("vocal", "band")

This would return then "Male Vocal Band" and "Vocal Politician Versus Band Of Rioters".
I'm not talking about a search for vocal band, I'm talking about searching for "vocal band" (in quotes). It should not return Vocal Politician Versus Band Of Rioters".
Well, to take things to the next level, a string could be set up as tags in multiple passes, i.e.
"boy vocal band" is 6 tags:
boy
vocal
band
boy vocal
vocal band
boy vocal band

and when you do a search for "vocal band" you can add a processor to the search builder that looks for quoted strings as is, rather than splitting it down.

so a search for "vocal band" would exclude "Vocal Politician Versus Band Of Rioters", but doing a search for "band" or "vocal" would include it.
DTecMeister
User
User
Posts: 42
Joined: Sat Mar 04, 2006 5:19 pm
Location: Rome, NY

Post by DTecMeister »

Trond,

You may want to try MySQL. It has a free unlimited version. Use the MyISAM table type and use full text index to index the column.
It should be fast enough for you.

The whole key to leveraging database technology is design and index usage.

If you have overlapping inserts/deletes, you will need to use an InnoDB table type and use transactions.

If it's at all possible to build in some delay, you could store all the adds/deletes and run them periodically so indexes can be disabled or removed/added around them.

Jeff
Where are the masses?
User avatar
the.weavster
Addict
Addict
Posts: 1576
Joined: Thu Jul 03, 2003 6:53 pm
Location: England

Post by the.weavster »

srod wrote:Thomas this is a standard trick when using many-many relationships. The important thing is that the LINK table does not have a key-field, or at the very least the two link fields shown are not defined as being key fields or even unique fields etc. :)
Shouldn't the combination of the two fields be the LINK tables primary key?
Trond wrote:Basically I want a searchable index of this, and since like 90% of all songs are about love I figured I'd save some space by not saving the "love" tag so many times. Not to mention artist names and album names will be saved tons of times.

I want to search both the filename as raw text and extra attributes, also as raw text. I thought I could get around the raw text requirement by splitting both the extra attributes and the filename into words and treating them all like short attributes, but then phrase search would break (didn't think of that).
A Firebird VARCHAR field can be up to 32k in size and can be queried using '=', '>', BETWEEN, IN (), case sensitive LIKE and STARTING and case insensitive CONTAINING. They are also RLE compressed (at maximum 128 bytes compresses to 2 bytes).
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

the.weavster wrote:
Trond wrote:Basically I want a searchable index of this, and since like 90% of all songs are about love I figured I'd save some space by not saving the "love" tag so many times. Not to mention artist names and album names will be saved tons of times.

I want to search both the filename as raw text and extra attributes, also as raw text. I thought I could get around the raw text requirement by splitting both the extra attributes and the filename into words and treating them all like short attributes, but then phrase search would break (didn't think of that).
A Firebird VARCHAR field can be up to 32k in size and can be queried using '=', '>', BETWEEN, IN (), case sensitive LIKE and STARTING and case insensitive CONTAINING. They are also RLE compressed (at maximum 128 bytes compresses to 2 bytes).
But will it be fast for thousands of entries?
User avatar
the.weavster
Addict
Addict
Posts: 1576
Joined: Thu Jul 03, 2003 6:53 pm
Location: England

Post by the.weavster »

In my experience queries that search, sort or join on unindexed fields or queries that return tens of thousands of records in a recordset are what slow databases down rather than a query that selects a few records from a very large table.

It's one of those 'how long is a piece of string' type questions, it really depends what you're going to ask the server to fetch for you.
Post Reply