I counted the calls (b). Initially b = 1500; when I increased #BUFFER_SIZE tenfold, the number of calls dropped to 150. Increasing the size further did not reduce the number of calls.
I guess the standard buffer size is 8192 bytes (8 kB), and because of that you'll see no difference in the number of ReadProgramData() calls if you define #BUFFER_SIZE bigger than that.
I read the output both in binary mode and line by line, then compared the results. They differ considerably: 52653 vs. 54748 files. I compared them in Meld. In the binary version, it looks as if part of the data is cut off at each iteration step.
Well, that's bad. I just tried it out with a bigger directory, consisting of 406390 files and directories, but PureBasic is so fucking slow at string concatenation that I have to wait minutes before I see any result. I expect the string to be roughly 49 MB in size:
```
nicolas@tp-w530:~$ find /home/nicolas/Install | wc -c
nicolas@tp-w530:~$ find /home/nicolas/Install | wc -l
```
I modified my version a bit and now it takes only 1.3 seconds to get the files (with Debugger on!) and the result is correct.
```purebasic
#BUFFER_SIZE = 8192

Procedure.s getOutput(command.s, args.s)
  Protected stdout.s, stderr.s
  Protected avail.i, bytesRead.i, bufferSize.i, hProg.i
  Protected bufferCapacity.i = #BUFFER_SIZE, *buffer = AllocateMemory(bufferCapacity, #PB_Memory_NoClear), *tBuffer

  hProg = RunProgram(command, args, "", #PB_Program_Open | #PB_Program_Read | #PB_Program_Error | #PB_Program_UTF8)
  If hProg
    ; Poll for output for as long as the program is running
    While ProgramRunning(hProg)
      avail = AvailableProgramOutput(hProg)
      If avail
        If avail > bufferCapacity - bufferSize
          While avail > bufferCapacity - bufferSize
            ; Double up the buffer size each time it is too small for new data
            bufferCapacity * 2
          Wend
          *tBuffer = ReAllocateMemory(*buffer, bufferCapacity, #PB_Memory_NoClear)
          If Not *tBuffer
            DebuggerError("Out of memory. Can not allocate " + bufferCapacity + " bytes.")
          EndIf
          *buffer = *tBuffer
        EndIf
        ; Append the raw bytes behind the data collected so far
        Repeat
          bytesRead = ReadProgramData(hProg, *buffer + bufferSize, avail)
          avail - bytesRead
          bufferSize + bytesRead
        Until avail = 0
      EndIf
      stderr + ReadProgramError(hProg)
    Wend
    CloseProgram(hProg)
  EndIf

  ; Convert the collected UTF-8 bytes to a string in one step
  stdout = PeekS(*buffer, bufferSize, #PB_ByteLength | #PB_UTF8)
  FreeMemory(*buffer)
  ProcedureReturn stdout
EndProcedure

Define time.i = ElapsedMilliseconds()
Define files.s = getOutput("/bin/bash", ~"-c \"find /home/nicolas/Install\"")
Debug "Time: " + Str(ElapsedMilliseconds() - time) + " ms"
Debug "Count: " + Str(CountString(files, #LF$))
Debug "Bytes: " + StringByteLength(files, #PB_UTF8)
```
And the result is:
Debugger wrote: Time: 1371 ms
The byte length differs by only 2 bytes. I guess `StringByteLength()` gets something wrong after the string was converted to Unicode in between. If you debug the variable `bufferSize`, you will get exactly the same number of bytes as `wc -c` shows you.
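One plausible reason the buffer version is so much faster than string concatenation is that doubling the capacity keeps reallocations rare. Here is a minimal sketch that counts the doublings needed to hold the expected output. It assumes "49 MB" means 49 * 1024 * 1024 bytes, which is only the estimate from above, and the constant name `#START_CAPACITY` is made up for this illustration.

```purebasic
; Assumption: the expected output size of about 49 MB is taken as 49 * 1024 * 1024 bytes
#START_CAPACITY = 8192 ; hypothetical name, same value as #BUFFER_SIZE

Define target.q    = 49 * 1024 * 1024
Define capacity.q  = #START_CAPACITY
Define doublings.i = 0

; Same growth step as in getOutput(): double until the data fits
While capacity < target
  capacity * 2
  doublings + 1
Wend

Debug "Doublings: " + Str(doublings)               ; 13
Debug "Final capacity: " + Str(capacity) + " bytes" ; 67108864
```

So even for a result of this size, the buffer has to grow at most 13 times, whereas appending to a string chunk by chunk may end up copying the string over and over as it grows.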