Posted: Wed Jun 08, 2005 4:17 am
Ok, I have a dual processor machine as my main computer for school, extremely high end imaging, data analysis, and some gaming, and the problem I often run into is that an application is NOT optimized for dual processors. While some apps may create somewhat separate threads, the main process often has to "wait" until the separate calculating or manipulating thread finishes before returning the final result.
The best, fastest, most lean and mean code for DNA sequence analysis makes full use of my dual Xeon processors. There are some DNA sequences of about 8000 base pairs in length that I must search with regular expressions for enzyme sites (string sequences) anywhere from 4-15 characters in length that can occur at any and every place in the string.
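To make the search step concrete, here is a minimal sketch in Python. It is not the original source: the function name `find_sites` and the three recognition sites are only illustrative assumptions, since the post does not list which enzymes are searched. It shows one way to report a site at every position where it occurs, including overlapping hits.

```python
import re

# Illustrative recognition sites only; the post does not list its enzymes.
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

def find_sites(sequence, sites=ENZYME_SITES):
    """Return (position, enzyme) pairs for every occurrence, overlaps included."""
    hits = []
    for enzyme, pattern in sites.items():
        # A zero-width lookahead can match at every start position, so
        # overlapping occurrences are not skipped.
        for match in re.finditer("(?=" + pattern + ")", sequence):
            hits.append((match.start(), enzyme))
    return hits
```

The lookahead is a design choice for this sketch: a plain `re.finditer(pattern, sequence)` resumes after the end of each match, so it would miss a site that starts inside the previous one.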
So what I do is pass the 8000 character string to a thread procedure, search for the locations where each reg exp occurs, store the results in a structured linked list, and start a new thread (a thread created by a thread) that sorts the results and calls postmessage_() again with the final result. The subclassed listbox I use to display the results processes (peekmessage) the last postmessage_() or sendmessage_() (a message I call #DNA_sequence_done) and adds a listbox entry in real time for each of the linked list elements as they are analyzed, with the appropriate information stored in the structure of the list.
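The hand-off described above can be sketched roughly as follows, again in Python and again not the original source. Everything here is an assumption made for illustration: a `queue.Queue` stands in for the window message queue (postmessage_() / peekmessage), the `DNA_SEQUENCE_DONE` object stands in for the #DNA_sequence_done message, a plain list stands in for the listbox, and the names `searcher`, `sorter`, and `run_analysis` are hypothetical.

```python
import queue
import re
import threading

DNA_SEQUENCE_DONE = object()  # stand-in for the #DNA_sequence_done message

def sorter(hits, outbox):
    """Second-stage thread: sort the hits, then post them one by one."""
    for hit in sorted(hits):
        outbox.put(hit)
    outbox.put(DNA_SEQUENCE_DONE)

def searcher(sequence, patterns, outbox):
    """First-stage thread: find every match, then hand off to a sorter thread."""
    hits = []
    for pattern in patterns:
        for match in re.finditer("(?=" + pattern + ")", sequence):
            hits.append((match.start(), pattern))
    # A thread created by a thread, as described in the post.
    threading.Thread(target=sorter, args=(hits, outbox)).start()

def run_analysis(sequence, patterns):
    """Stand-in for the GUI side: poll for results without blocking for long."""
    outbox = queue.Queue()
    threading.Thread(target=searcher, args=(sequence, patterns, outbox)).start()
    rows = []  # plays the role of the listbox entries
    while True:
        try:
            item = outbox.get(timeout=0.05)
        except queue.Empty:
            continue  # a real GUI would service other messages here
        if item is DNA_SEQUENCE_DONE:
            return rows
        rows.append(item)
```

Calling `run_analysis(sequence, ["GAATTC", "GGATCC"])` returns the sorted `(position, pattern)` rows once the done marker arrives. One caveat for this sketch: in standard CPython the threads keep the caller responsive, but the interpreter lock stops them from running this kind of work on two processors at once, so a true dual-CPU speedup would need processes or a language with native threads.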
The key here is the subclassing and getting the heavy worker thread running in tandem with the rest of the program so that your program execution does not halt or slow.
The commercial program SEQUENCER takes anywhere from 1-2 min to do what my program does in 15-20 sec. I've never tested mine on a single processor machine, so don't think I wrote groundbreaking code. I just know that dual processors (really managed by the OS) can perform this type of task, and I wrote my code accordingly.
Because I wrote the program in my office at school and made it available to my university department, and scientists are using it in competitively government funded research, I need to ask FSU legal if I am allowed to post the source. If I can, I'll completely show how to really optimize for dual, hyperthreaded, and dual core processing.