Parallel Programming
Posted: Sun Feb 10, 2008 11:38 pm
by hellhound66
Closed.
Posted: Tue Feb 12, 2008 9:49 am
by dell_jockey
Moin HellHound,
thanks a lot for this code. Apparently this method doesn't scale very efficiently beyond two cores; see the results I got on my quad-core machine:
1 thread: 1566 iterations
2 threads: 3249 iterations
3 threads: 3295 iterations
4 threads: 3357 iterations
I've repeated this multiple times (compiled as a console app on WinXP-Pro, no debugger, no other app running), each time getting very similar results.
Posted: Tue Feb 12, 2008 5:43 pm
by hellhound66
Thank you for testing.
Strange indeed. I'll check the code again. I will assign each thread to one core exclusively.
Are you sure you have a quadcore?
/edit:
Seems to be the worker tasks I used. Please check it out in 5 minutes again ^__^.
Posted: Tue Feb 12, 2008 9:53 pm
by blueznl
On my quad core WITH DEBUGGER OFF:
Code: Select all
1 thread 849
2 threads 1444
3 threads 1666
4 threads 3066
Funny behaviour... Speed increase from two to three threads was minimal, however the jump to four threads was outright impressive.
Hellhound, what kind of hardware do you use?
Posted: Tue Feb 12, 2008 10:27 pm
by hellhound66
I use a dual core (Core2Duo 2.0GHz).
So it's no wonder that I get results like these:
1 thread 701
2 threads 1307
3 threads 1378
4 threads 1358
The behaviour with three and four threads is easy to explain.
I push all tasks into a queue, then I start all threads. The first thread takes the first entry in the queue, the second thread takes the second entry, and so on. When a thread finishes its task, it takes the next task from the queue.
In the example I put 4 tasks in the queue. The tasks are identical, so they all take (nearly) the same time. With 3 threads, three tasks run at once; when the first thread finishes, it takes the last task, and the other two threads sit idle while it runs. So it is logical that on a quad-core there is no big difference between 2 and 3 threads: both need two passes over the queue. Only 4 threads can do all the work in one pass.
The key to this kind of programming is to split the work into many subtasks that can be processed in parallel. The more tasks you have in the queue, the better a multicore CPU can be used.
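The effect can be reproduced with a small stand-alone sketch. This is not the code from the first post: Worker, RunBatch, TasksLeft and #TaskMs are made-up names, a mutex-protected counter stands in for the queue, and each "task" only sleeps for a fixed time. It assumes the program is compiled as a threadsafe console executable.

```purebasic
; Sketch only: Worker, RunBatch, TasksLeft and #TaskMs are hypothetical
; names, not the procedures from the original post.

#TaskMs = 200                      ; every dummy task just sleeps this long

Global QueueMutex = CreateMutex()
Global TasksLeft.i                 ; tasks still waiting in the "queue"

Procedure Worker(*Unused)
  Protected HaveTask.i
  Repeat
    ; Take one pending task, if any, while holding the mutex
    LockMutex(QueueMutex)
    If TasksLeft > 0
      TasksLeft = TasksLeft - 1
      HaveTask = #True
    Else
      HaveTask = #False
    EndIf
    UnlockMutex(QueueMutex)
    If HaveTask
      Delay(#TaskMs)               ; stand-in for real work
    EndIf
  Until HaveTask = #False          ; queue empty: this worker is done
EndProcedure

; Runs Tasks identical tasks on Threads workers, returns elapsed milliseconds
Procedure.i RunBatch(Tasks.i, Threads.i)
  Protected i.i, Start.i
  Protected Dim Handles.i(Threads - 1)
  TasksLeft = Tasks                ; fill the queue before any worker starts
  Start = ElapsedMilliseconds()
  For i = 0 To Threads - 1
    Handles(i) = CreateThread(@Worker(), 0)
  Next
  For i = 0 To Threads - 1
    WaitThread(Handles(i))
  Next
  ProcedureReturn ElapsedMilliseconds() - Start
EndProcedure

OpenConsole()
Define n.i
For n = 1 To 4
  PrintN(Str(n) + " thread(s): " + Str(RunBatch(4, n)) + " ms for 4 tasks")
Next
CloseConsole()
```

With 4 equal tasks, the runs with 2 and 3 threads should both take roughly two task lengths, and only the run with 4 threads should drop to roughly one. Because the tasks sleep instead of computing, this shows the scheduling effect only, not real CPU scaling.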
Posted: Tue Feb 12, 2008 10:32 pm
by dell_jockey
hellhound66 wrote:
Are you sure you have a quadcore?
yes, Dell Precision T3400 quadcore...
with the current code I get the following:
1 thread 911 iterations
2 threads 1630 iterations
3 threads 1940 iterations
4 threads 3322 iterations
compare that to my prior test - odd, isn't it? Especially the fact that a single thread is now so much slower makes me wonder...
Posted: Tue Feb 12, 2008 10:43 pm
by hellhound66
Seems to be the worker tasks I used. Please check it out in 5 minutes again ^__^.
I changed the worker tasks. That's why it is "slower" now (the tasks are different).
/edit:
Feel free to change the task queue yourself, try it out. If there are some bugs, this is the best way to find out.
You can use different tasks as well, and there you can see that the order of tasks is important, too.
Posted: Tue Feb 19, 2008 1:17 am
by DoubleDutch
I think there may be some kind of bug when you have more than 4 cores (in a dual quad-core system = 8 cores)...
Starting process with 1 thread(s)
Iterations : 966
Starting process with 2 thread(s)
Iterations : 1873
Starting process with 3 thread(s)
Iterations : 1941
Starting process with 4 thread(s)
Iterations : 3497
Autodetecting cores..
Starting process with 8 thread(s)
Iterations : 3478
?
Looking at the resource usage overview, only 50% is used during the 8-thread run, so it isn't using the cores on the 2nd CPU for some reason?
Posted: Tue Feb 19, 2008 1:34 am
by DoubleDutch
If I change the "Game" procedure to this:
Code: Select all
Procedure Game(Cores)
  CreateThreads(Cores)
  PrintN("Starting process with "+Str(ThreadCount)+" thread(s)")
  time.l = ElapsedMilliseconds()
  Repeat
    ; Queue the first batch of 8 tasks
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    ; Process the whole batch
    ProcessAll()
    ; Queue the second batch of 8 tasks
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    AddWorkerQueueElement(@Dummy())
    ; Process the second batch
    ProcessAll()
    Counter+1
  Until ElapsedMilliseconds()-time>4000 ; run for 4 seconds
  PrintN("Iterations : "+Str(Counter))
  Counter = 0
  EndThreads()
EndProcedure
I get:
Starting process with 1 thread(s)
Iterations : 489
Starting process with 2 thread(s)
Iterations : 960
Starting process with 3 thread(s)
Iterations : 1295
Starting process with 4 thread(s)
Iterations : 1858
Autodetecting cores..
Starting process with 8 thread(s)
Iterations : 3277
Which looks right. A quick check with the resource monitor shows that with 4 threads it's 48% used and with 8 threads it's 90% used.
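As a rough cross-check, the ideal speedup for 8 equal tasks per batch can be worked out by counting how many passes over the queue each thread count needs. The sketch below is only that arithmetic; RoundsNeeded is a made-up helper, and it assumes every task takes the same time and each thread has a core to itself.

```purebasic
; Hypothetical helper: passes needed when Tasks equal tasks are shared
; among Threads workers (each worker takes one task per pass).
Procedure.i RoundsNeeded(Tasks.i, Threads.i)
  Protected Count.i = 0, Remaining.i = Tasks
  While Remaining > 0
    Remaining = Remaining - Threads
    Count = Count + 1
  Wend
  ProcedureReturn Count
EndProcedure

OpenConsole()
Define n.i, Base.f, Cur.f
Base = RoundsNeeded(8, 1)          ; one thread needs 8 passes
For n = 1 To 8
  Cur = RoundsNeeded(8, n)
  PrintN(Str(n) + " thread(s): " + Str(RoundsNeeded(8, n)) + " pass(es), ideal speedup " + StrF(Base / Cur, 2))
Next
CloseConsole()
```

For 2, 3, 4 and 8 threads this gives ideal speedups of 2, 2.67, 4 and 8. The measured iteration counts above work out to about 1.96, 2.65, 3.80 and 6.70 times the single-thread figure, so they follow the same pattern, with the 8-thread run falling somewhat short of the ideal.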
