Readspeed Test v2.10 Test HD/OS and chunksize speeds.
Heh! That's ok, we know how exhausted you must be working on 4.40 
v2.4 source is up, new charts in first post as well.
I know I said that 2.4 would not have any dramatic changes.
Well, it seems it does, dramatic in that the results are less dramatic.
Let me explain.
Previous versions, in sequential mode, read through the file using one chunksize, then read through the file again with the next chunksize; when all sizes had been tested it moved on to the next file (if there was more than one file).
In random mode it read randomly from a file with one chunksize, then read randomly from the same file using the next chunksize; when all sizes had been tested it moved on to the next file (if there was more than one file).
Now v2.4 does things just slightly differently.
In sequential mode it reads through the file with one chunksize, then changes to the next file and reads through that; when all files are done (if there is more than one file) it moves back to the first file and tests with the next chunksize.
In random mode it randomly reads 500 times from the first file with one chunksize, then goes to the next file and does the same; when done with all files it goes back to the first file and starts testing with the next chunksize.
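In pseudo-PureBasic the new ordering looks roughly like this (just a sketch to illustrate the loop order, not the actual tool source; the file names and chunksizes are placeholders, and the sequential and random passes are condensed into one loop here):
Code: Select all
; Sketch of the v2.4 ordering: chunksize is the OUTER loop, the files are the INNER loop,
; so the same file is never read twice in a row and the OS file cache gets far less chance to help.
Dim size(2)
size(0) = 16384 : size(1) = 65536 : size(2) = 1048576   ; example chunksizes only
Dim file$(1)
file$(0) = "test1.bin" : file$(1) = "test2.bin"          ; placeholder file names

For c = 0 To ArraySize(size())
  chunksize = size(c)
  *buffer = AllocateMemory(chunksize)
  For f = 0 To ArraySize(file$())                        ; all files for this chunksize...
    If ReadFile(0, file$(f))
      FileBuffersSize(0, 0)                              ; PB's own buffering off, as in the tool
      While Eof(0) = 0                                   ; sequential pass
        ReadData(0, *buffer, chunksize)
      Wend
      For n = 1 To 500                                   ; random pass: 500 seek+read pairs
        FileSeek(0, Random(Lof(0) - chunksize))          ; assumes the file is larger than the chunk
        ReadData(0, *buffer, chunksize)
      Next
      CloseFile(0)
    EndIf
  Next
  FreeMemory(*buffer)                                    ; ...then move on to the next chunksize
Next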
Apparently this little change in behavior has nuked my Vista filesystem's ability to cache the files.
So the test tool now basically gives a worst-case scenario, especially when it comes to random access, and the sequential results show a much less dramatic difference between chunksizes than previously.
Neither big nor small chunksizes have an advantage any longer, as the filesystem cache is now almost useless; it is a level playing field, in other words.
Hmm, wasn't PureBasic's file buffer default 4096 bytes?
*looks at the charts* Well what do you know, seems like Fred's default is pretty clever after all.
I'm still curious as to why I got a spike at 128K in sequential,
but considering that it's not that fast in random, it could just be a behaviour of my HD.
I now consider this finished.
It now does exactly what I wanted: find out which chunksize (blocksize/buffersize) is the fastest.
However, if anyone wants to modify this code, go ahead (please name the tool something else and change the name in the chart URL so people can see the charts are about different things):
To test how much overhead FileBuffersSize() has, simply change the code FileBuffersSize(#File1,0) to FileBuffersSize(#File1,chunksize).
To test buffering, set it to FileBuffersSize(#File1,XXXXX) where XXXXX is a fixed buffer size. Curious how PureBasic's file buffer behaves if it's 1024 bytes and you try to randomly feed 512B to 16MB chunks of data through it? Well, now you know how to test that.
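For instance, a minimal, self-contained sketch of those variants (the #File1 number, file name and 64KB chunksize here are only placeholders):
Code: Select all
#File1 = 1
chunksize = 65536
*buf = AllocateMemory(chunksize)
If ReadFile(#File1, "test1.bin")             ; placeholder file name
  ; pick ONE of the following before the read loop:
  FileBuffersSize(#File1, 0)                 ; tool default: PB's buffering disabled
  ; FileBuffersSize(#File1, chunksize)       ; buffer = chunksize -> measures FileBuffersSize() overhead
  ; FileBuffersSize(#File1, 1024)            ; fixed 1KB buffer -> see how PB copes with 512B..16MB chunks
  While Eof(#File1) = 0
    ReadData(#File1, *buf, chunksize)
  Wend
  CloseFile(#File1)
EndIf
FreeMemory(*buf)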
And to test datarate (in random mode, that is, as sequential currently doubles as a datarate test too), well, that needs some extra coding from you folks.
The tool was designed to find the fastest chunksize; to find the fastest datarate size you need to redo some of the code so that each chunksize reads the same total amount of data.
This means that each chunksize needs a different number of loops so that, for example, each chunksize test reads 16MB of data in total.
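Something along these lines, assuming a 16MB total per chunksize (the sizes are picked only to show the idea):
Code: Select all
; equal total data per chunksize: smaller chunks simply get more loops
#TotalBytes = 16 * 1024 * 1024                ; 16MB per chunksize test
Dim size(3)
size(0) = 16384 : size(1) = 65536 : size(2) = 262144 : size(3) = 1048576
For c = 0 To ArraySize(size())
  loops = #TotalBytes / size(c)               ; 1024, 256, 64 and 16 reads respectively
  Debug Str(size(c)) + " bytes x " + Str(loops) + " loops = 16MB"
Next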
If I were to guess, I'd say such a chart would not look that different from the sequential chart right now.
Have fun folks.

I haven't tried this yet, sorry, but while writing a duplicate file finder some time ago (ie. needing to read thousands of files, potentially gigabytes in size), I found 64K to be the optimum chunk size to read, though I didn't know about PB's own file buffers at the time!
Coincidentally, 64K is the value returned by Windows XP on my 32-bit system for this code, which basically tells you the chunk size for any memory allocation (e.g. if the 'dwAllocationGranularity' size below is 64K and you request 1 byte, 64K is allocated; if you request 65K, 128K is allocated, etc.)...
Code: Select all
GetSystemInfo_(info.SYSTEM_INFO)
Debug info\dwAllocationGranularity
... so I'm wondering if perhaps the result from this call matches what people are seeing individually, e.g. perhaps 64-bit Windows returns 128K, hence that's the fastest for some people?
More here: http://msdn.microsoft.com/en-us/library/aa450921.aspx
dwAllocationGranularity
The granularity with which virtual memory is allocated.
For example, a VirtualAlloc request to allocate 1 byte will reserve an address space of dwAllocationGranularity bytes.
This value was hard coded as 64 KB in the past, but other hardware architectures may require different values.
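Not from the post above, but if you want to see that rounding in action, here is a small sketch building on the same GetSystemInfo_() call (the 65K request is just the example figure used above):
Code: Select all
GetSystemInfo_(info.SYSTEM_INFO)
granularity = info\dwAllocationGranularity
requested   = 65 * 1024                       ; the 65K example from above
aligned     = ((requested + granularity - 1) / granularity) * granularity
Debug Str(requested) + " bytes requested -> " + Str(aligned) + " bytes reserved"
; on a system with 64K granularity this prints: 66560 bytes requested -> 131072 bytes reserved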
Oh yeah, I forgot to post. It was 64KB here too.
It's probably a "future feature", there are quite a few of them in Windows, that might seem useless or redundant today but will be very useful later or in certain situations.
WinCE and Win Mobile probably use a different value for example. (just a wild guess).
Reason memory stuff may be faster on x64 is because the system (or at least the CPU) can handle twice as much memory/data at the same speed as x86, provided that memory bottlenecks aren't slowing things down that is. (another assumption)
It's probably a "future feature", there are quite a few of them in Windows, that might seem useless or redundant today but will be very useful later or in certain situations.
WinCE and Win Mobile probably use a different value for example. (just a wild guess).
Reason memory stuff may be faster on x64 is because the system (or at least the CPU) can handle twice as much memory/data at the same speed as x86, provided that memory bottlenecks aren't slowing things down that is. (another assumption)
Been a while, but still interesting:
Roy Longbottom has been doing performance tests for years; by the looks of his site, calling him an expert might be an understatement.
http://www.roylongbottom.org.uk/diskgraf%20results.htm
I found the disk results interesting, in particular the large bunch of read/write/DMA tests further down that page.
I normalized all that data and split out the CPU info.
And I ended up with:
Code: Select all
Speed MB/s (higher is better)
1KB=20.97%
2KB=35.34%
4KB=52.02%
8KB=63.78%
16KB=77.99%
32KB=91.66%
64KB=100.00%
128KB=107.54%
256KB=112.53%
512KB=115.84%
1024KB=119.00%
CPU load (lower is better)
1KB=100.00%
2KB=99.39%
4KB=82.55%
8KB=63.57%
16KB=47.14%
32KB=36.03%
64KB=29.15%
No idea why it stops here; maybe he didn't do CPU data for higher sizes, or maybe higher wasn't that much different from 64KB?
Now, it's obvious that 64KB is twice as much memory as 32KB. So if we imagine the following memory score, where the smallest chunksize (needing the least buffer memory) scores highest:
Code: Select all
Memory score (higher is better)
1KB=100.0%
2KB=50.0%
4KB=25.0%
8KB=12.5%
16KB=6.25%
32KB=3.125%
64KB=1.5625%
And flip the CPU load on its head (by recalculating the whole thing using 64KB as the base):
Code: Select all
CPU load, flipped (higher is better)
1KB=29.15%
2KB=29.35%
4KB=35.31%
8KB=45.85%
16KB=61.83%
32KB=80.89%
64KB=100.00%
Sum each of the speed, CPU and memory percentages and divide by 3, and thus we get the following final result:
Code: Select all
1KB=49.67%
2KB=38.00%
4KB=37.33%
8KB=40.17%
16KB=48.08%
32KB=58.04%
64KB=67.19%
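(In case it isn't obvious how a row above is produced, here is the 64KB row recalculated from the three tables, just as a sanity check:)
Code: Select all
; 64KB: speed 100.00%, memory score 1.5625%, flipped CPU 100.00%
Debug StrF((100.00 + 1.5625 + 100.00) / 3, 2)   ; -> 67.19, matching the final table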
Yeah, the memory use caused the whole thing to kinda even out, but it is mathematically correct.
I'm kinda surprised that 64KB turns out to be such a solid all-rounder.
It's unfortunate there was no CPU data for 128KB and above, as it's impossible to extrapolate from the final result; for all we know the CPU use is identical, in which case any speed benefits would be killed by the memory drawback.
Especially as we see that from 32KB the MB/s really starts to flatten out, while the memory % would keep doubling, so 128KB could even end up worse off in the final result than 16KB?!
Anyway, if you look at all his other various tests, including random memory and random file reads and even CD/DVD reading and writing, and network and USB disks and memory cards, you will see that 64KB turns out to be a good CPU/MB/size compromise, and its results in the various tests are among the more consistent (fewer highs and lows than others).
His tests include Win 9x up to Vista and old 486 systems up to 64-bit CPUs, so it's surprisingly empirical, much more so than I could achieve.
* goes to set his filebuffer default to 64KB using the new FileBuffersSize(#PB_Default,65536)

EDIT: Used the wrong CPU sums, corrected.
I wonder if some modifier should be applied, as giving memory a 1/3 weight skews the results too much; after all, speed and CPU are more important than memory, right? In fact, CPU should have the higher priority.
In that case 64KB would come out even better, and 1KB a lot worse than in the final result above.
EDIT:
If CPU is given 100% priority, speed 50% priority and memory 25% priority, then the results would be approximately this, I think (didn't calculate too hard on this particular one):
1KB=36.57%
2KB=33.71%
4KB=38.43%
8KB=45.50%
16KB=57.75%
32KB=72.16%
64KB=85.94%
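My reading of that weighting, as a sketch (I'm assuming a plain weighted average with weights CPU=1.0, speed=0.5 and memory=0.25; the 64KB row then comes out at exactly 85.94%, while the smallest sizes drift a little from the figures above, which fits the "didn't calculate too hard" remark):
Code: Select all
Procedure.f WeightedScore(cpu.f, speed.f, memory.f)
  ; weights: CPU 100%, speed 50%, memory 25%, normalized by their sum (1.75)
  ProcedureReturn (1.0 * cpu + 0.5 * speed + 0.25 * memory) / 1.75
EndProcedure

Debug StrF(WeightedScore(100.0, 100.0, 1.5625), 2)   ; 64KB -> 85.94 %
Debug StrF(WeightedScore(29.15, 20.97, 100.0), 2)    ; 1KB  -> 36.93 % (vs. 36.57 above)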
Re: Readspeed Test v2.6 Test HD/OS and chunksize speeds.
Redid the code: the random seek and sequential read tests are now both done (no more selecting which to run), and the random seek will now also read the same amount of data as the sequential read.
This does mean tests will take longer, but I also removed the smallest and largest chunksizes (the v2.5 I was working on had 512B to 256MB chunksizes), so now only sizes from 16KB to 4MB are tested, a total of 9 different tests, times two (random seek+read and sequential read with no seek).
Do note that a lot can influence the results, and you may find that testing with the same files again causes a particular chunksize to be very slow or fast in one test; if you see such anomalies you may want to re-run your test until you get consistent results.
Anomalies are unfortunate and do happen, but are impossible to optimize against. If you want to know a good size to use, pick the fastest one from multiple tests on multiple drives and different file collections.
Myself, I've found my system seems to favor 128KB and 256KB, with 128KB being the more consistent one.
Re: Readspeed Test v2.7 Test HD/OS and chunksize speeds.
Improved the text in the message requester a little, example:
There is a reason why I show you both of these.
The first is the first run on the test files; the second is the same files, but there you can obviously see something odd is going on.
Windows file caching is really aggressive here: it cached 40 files totaling 1.9GB, and the second run never fetched the files from the USB stick I was testing. The only reason I noticed was the suddenly faster test time and the light on the USB stick not flashing at all (as it should during access).
Now I don't really mind this, actually, as under normal circumstances one would want this active (though if I were testing the speed of a device I'd want no caching except whatever buffer might be in the device itself),
but the Readspeed Test program here is testing to see which chunksize is best to use.
The issue, though, is that you have to unplug the USB stick to get an uncached test if you intend to run the test multiple times.
This also means the results of multiple passes are probably only correct for the very first pass.
I'll see about making a later version that can test both with and without Windows file caching and take the average of both (to get uncached and cached reads of the same file), similar to some webspeed tests that test uncached and cached reads from websites.
I'm also surprised how large the Windows file cache is (is it as large on Mac and Linux as well?); I wonder if it's even larger on systems with a lot of "unused" memory?
Code: Select all
Readspeed Test v2.7
The baseline is the slowest chunksize, and shown as 100.00 %
Based on 40 files and 1971.88 MB of data (processed 35493.81).
Test took 4.167 minutes, speed was 141.98 MB/s (1135.80 mbit/s).
16384 chunksize speed = 100.00 %
32768 chunksize speed = 8212.85 %
65536 chunksize speed = 9774.52 %
131072 chunksize speed = 10796.36 %
262144 chunksize speed = 10857.99 %
524288 chunksize speed = 10705.21 %
1048576 chunksize speed = 10904.68 %
2097152 chunksize speed = 10695.17 %
4194304 chunksize speed = 8453.15 %
Code: Select all
Readspeed Test v2.7
---------------------------
The baseline is the slowest chunksize, and shown as 100.00 %
Based on 40 files and 1971.88 MB of data (processed 35493.81).
Test took 0.400 minutes, speed was 1478.91 MB/s (11831.27 mbit/s).
16384 chunksize speed = 100.00 %
32768 chunksize speed = 124.15 %
65536 chunksize speed = 146.89 %
131072 chunksize speed = 163.44 %
262144 chunksize speed = 165.81 %
524288 chunksize speed = 161.43 %
1048576 chunksize speed = 165.41 %
2097152 chunksize speed = 163.36 %
4194304 chunksize speed = 126.58 %
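As for testing without the Windows file cache (mentioned above): one way that should work, though it's not something the current tool does, is to open the file through the Windows API with #FILE_FLAG_NO_BUFFERING, which bypasses the cache entirely. The catch is that reads then have to be sector-aligned, so the chunksize must be a multiple of the sector size and the buffer must be aligned too. A rough sketch (the file name is a placeholder):
Code: Select all
chunksize = 65536                               ; must be a multiple of the drive's sector size
bytesRead.l = 0
*buf = VirtualAlloc_(0, chunksize, #MEM_COMMIT | #MEM_RESERVE, #PAGE_READWRITE)  ; page-aligned buffer
hFile.i = CreateFile_("test1.bin", #GENERIC_READ, #FILE_SHARE_READ, 0, #OPEN_EXISTING, #FILE_FLAG_NO_BUFFERING, 0)
If hFile <> #INVALID_HANDLE_VALUE And *buf
  Repeat                                        ; uncached sequential pass
    ok = ReadFile_(hFile, *buf, chunksize, @bytesRead, 0)
  Until ok = 0 Or bytesRead = 0                 ; note: a file tail that isn't sector-aligned needs extra handling
  CloseHandle_(hFile)
EndIf
If *buf : VirtualFree_(*buf, 0, #MEM_RELEASE) : EndIf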
Re: Readspeed Test v2.10 Test HD/OS and chunksize speeds.
IIRC Windows uses its whole free memory as disk cache. It's shadow memory, which means it's not visible to Task Manager or such. It gets automatically released when an app claims memory.
Re: Readspeed Test v2.10 Test HD/OS and chunksize speeds.
Yep, it's reported by Task Manager as "Free".
When more RAM is required by a program, the data in the cache is simply thrown away, unless it is write-back cache.
In that case it is written back to disk, but usually this happens earlier, since the cache manager runs a process called the "lazy writer" which once every second (or so) writes back a certain percentage of the data, dynamically fine-tuning the percentage to be flushed in time to keep the system running smoothly.
Interestingly enough, temporary files are not included in lazy-writing operations, AFAIK.
"Have you tried turning it off and on again ?"
A little PureBasic review
Re: Readspeed Test v2.10 Test HD/OS and chunksize speeds.
Systems have become very fast and somewhat unpredictable, so your experience may vary. I haven't experimented much with block size, but I was looking into the impact of (free) memory on performance for an application I worked on some time ago and this might be of interest for some.
Temp folder
On Windows XP and 7 anything in the temp paths seems to be directly written, yep. They are still cached, just seem to be written out ASAP. I'm clueless why. I can also confirm that on XP and 7 NOT all memory is used as cache. At some point, I guess, managing the cache data becomes more trouble than it is worth.
I've noticed some disk / memory related issues over time myself...
Impact of memory size
I looked into this a few years ago, using a borrowed i7 930 (?) with 12 GB and Windows 7 x64 as well as Windows 7 x86, with some disk-intensive proprietary software. The machine had 'triple memory banks' IIRC, so results might be a little skewed.
On x64: going down from 12 to 8 made almost no difference. 6 GB was faster than 8 (probably the memory configuration). 4 and 3 GB were a bit slower but better than expected. At 2 GB it showed serious impact.
x64 vs. x86
On x86: 3 GB mem was faster than 4 (probably the memory controller) but not by much. The 2 GB x86 config was a little faster than the 2 GB x64 config (but only very little).
Most of the other machines I fooled around with are older dual-channel designs (Core 2 Duo). When running Windows 7 x32 I got better performance at 3 GB as opposed to 2 GB. There was a small gain at 4 GB, but as everyone knows Windows 7 x32 won't use part of that extra memory.
From practical experience I'd conclude: put a minimum of 6 GB in a triple-channel machine, or 8 GB in a dual-channel machine, and don't worry about it. When running x32, put in all you have up to 4 GB. Even if you lose dual channel the extra GB helps, so 3 is better than 2.
Nothing new in all of this I guess
Databases
In general, however, systems have become that fast that I'm not too much bothered anymore about block sizes or memory and the like when it comes to desktop use.
Databases are a whole different beast though... Running a MySQL database on that I7 using an (older) WD Caviar Black (64 MB Cache) vs. a (more recent) Seagate 2 TB drive made a scary difference: the WD was 1.5 to 10 times faster depending on type and amount of transactions, and the SSD was 10 to 50 times faster (probably hitting a CPU limit there).
Taking memory out of the machine decreased performance steadily. 12 > 8 > 6 > 4 > 3 > 2. MySQL was more memory sensitive than my own application.
But... in desktop use I couldn't spot any difference between the faster Caviar Black and the supposedly slower Barracuda! (The Caviar booted faster, but once up and running the Barracuda seemed a little faster loading larger files.)
More (slower) memory
Laptop chipsets are even weirder. I have an i5 HP laptop with 8 GB mem (4 GB internal, 4 GB on a DIMM) with an extra spare memory slot. I put in a SLOWER DIMM, expanding memory from 8 GB to 12 GB, and memory speed went UP?!?! Doubly weird, as I think the i5 has dual channel and not triple channel.
Triggered by your post, I just tried the same application (which I tested on that i7 back then) on this laptop...
On that i7 I could see little difference between 12, 8 and 6 GB, and only a little when going down to 4. On this laptop there is again little difference between 12 and 8, but at 4 GB it suddenly became dog slow. (Poor drivers? Too many memory-consuming applications in the background?) Pretty much the same Windows 7 x64 setup, but on totally different hardware.
Windows update and a second harddisk
Side note: from real-world experience I know Windows Update is suddenly a lot faster if a. there is a second hard disk, and b. the temp folders are on that second hard disk. If you don't believe me, just do a clean pre-SP1 Windows 7 install and start updating. I've done that too many times... When your first drive is an SSD, don't bother.

( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )