
ReadString slowness...

Posted: Wed Sep 29, 2021 3:22 am
by Rinzwind
This can't be intended behaviour?

Code:

EnableExplicit

Procedure.s LoadFile1(Filename.s) ;SLOW!
  Protected r.s
  Protected f = ReadFile(#PB_Any, Filename, #PB_UTF8)
  If f
    r = ReadString(f, #PB_File_IgnoreEOL)
    CloseFile(f)
    ProcedureReturn r
  EndIf
EndProcedure

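Procedure.s LoadFile2(Filename.s)
  Protected *p, r.s, size
  Protected f = ReadFile(#PB_Any, Filename)
  If f
    size = Lof(f)
    If size > 0
      *p = AllocateMemory(size + 1, #PB_Memory_NoClear) ; one extra byte for the null terminator
    EndIf
    If *p
      size = ReadData(f, *p, size)
      PokeA(*p + size, 0) ; NoClear leaves the buffer uninitialized, so terminate it before PeekS(-1)
      r = PeekS(*p, -1, #PB_UTF8)
      FreeMemory(*p)
    EndIf
    CloseFile(f)
  EndIf
  ProcedureReturn r
EndProcedure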

Define t1, r1, r2, r.s
t1 = ElapsedMilliseconds()
r = LoadFile1("c:\test\test1.html")
r1 = ElapsedMilliseconds() - t1


r = ""
t1 = ElapsedMilliseconds()
r = LoadFile2("c:\test\test1.html")
r2 = ElapsedMilliseconds() - t1

MessageRequester("", Str(r1) + #TAB$ + Str(r2))


The file is the HTML front page of some site.
---------------------------

---------------------------
295 17
---------------------------
OK
---------------------------

If anything, you would expect ReadString to be faster, since no PeekS or freeing is needed, but it's about 17 times slower...

Seems #PB_File_IgnoreEOL doesn't make it any faster than reading lines one by one.

Also weird, seems that specifying the size with PeekS is slower than just passing -1.
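
A side note on the PeekS length, offered as a hedged sketch rather than an explanation of the timing: as far as I know, when a length is passed together with #PB_UTF8, PeekS counts it in characters unless #PB_ByteLength is also set (a flag available in recent PureBasic versions), so Lof() is only the right unit with that flag. LoadFileByteLength is a hypothetical name used here for illustration.

```purebasic
EnableExplicit

; Hypothetical helper: reads the whole file and decodes it with an explicit byte length.
Procedure.s LoadFileByteLength(Filename.s)
  Protected *p, r.s, size
  Protected f = ReadFile(#PB_Any, Filename)
  If f
    size = Lof(f)
    If size > 0
      *p = AllocateMemory(size)
      If *p
        size = ReadData(f, *p, size) ; number of bytes actually read
        ; #PB_ByteLength makes PeekS treat "size" as bytes instead of characters
        r = PeekS(*p, size, #PB_UTF8 | #PB_ByteLength)
        FreeMemory(*p)
      EndIf
    EndIf
    CloseFile(f)
  EndIf
  ProcedureReturn r
EndProcedure
```

Because the length is explicit, the buffer needs no terminating null byte in this variant.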

// Moved from "Bugs - Windows". Slowness is not a bug. (Kiffi)

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 3:28 am
by BarryG
We can't confirm it's a bug without having access to the HTML file to see. For the record, I use ReadString() with #PB_File_IgnoreEOL on large text files and it's pretty much instant.

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 3:43 am
by Rinzwind
Just go to any website and save the page as an HTML file. To make good use of UTF-8, let's say https://www.thairath.co.th/home
(around 2.6 MB file)

277ms vs 18ms

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 4:14 am
by BarryG
Okay, you're right... but is slow speed a bug, or a feature request? ReadString() and ReadData() probably work very differently behind the scenes.

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 9:16 am
by Fred
Of course, you can't compare a raw read without doing anything, against a small read which creates a new string buffer, parses every byte to detect the end of line etc. ReadString() uses an internal cache which makes it much faster than it was before (pre-4.00 IIRC).

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 9:38 am
by #NULL
@Fred, can you clarify what you mean by that?

Fred wrote (Wed Sep 29, 2021 9:16 am): "Of course, you can't compare a raw read without doing anything,"
That sounds like it should be faster.

"against a small read"
What do you mean by a small read? Both read the whole file, don't they?

"which creates a new string buffer, parses every byte to detect the end of line etc."
So why is doing all that stuff faster in the end?

"ReadString() uses an internal cache which makes it much faster than it was before (pre-4.00 IIRC)."
But what good stuff is it doing that makes it slower than the manual way? Maybe some additional handling of BOM and null bytes etc. for correctness?

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 10:21 am
by Fred
Forget what I said, I misread the original post. This particular case is slower, because we don't read the whole file at once, but chunk by chunk. That way we don't have to reserve a massive memory area if the file is very big.
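
To illustrate what chunk-by-chunk reading looks like in user code, here is a rough sketch. It is not PureBasic's internal ReadString() implementation; the 4096-byte chunk size and the name LoadFileChunked are assumptions made for the example. The bytes are collected first and decoded once at the end, so a multi-byte UTF-8 sequence split across two chunks is not corrupted.

```purebasic
EnableExplicit

#ChunkSize = 4096 ; assumed value for illustration only

; Hypothetical helper: reads the file in fixed-size chunks into a growing buffer.
Procedure.s LoadFileChunked(Filename.s)
  Protected *buf, *new, r.s, used, n
  Protected f = ReadFile(#PB_Any, Filename)
  If f
    *buf = AllocateMemory(#ChunkSize + 1) ; +1 keeps room for the null terminator
    If *buf
      While Eof(f) = 0
        n = ReadData(f, *buf + used, #ChunkSize)
        If n <= 0 : Break : EndIf
        used + n
        ; grow so that the next chunk plus the terminator still fits
        *new = ReAllocateMemory(*buf, used + #ChunkSize + 1)
        If *new = 0 : Break : EndIf ; on failure the old block is still valid
        *buf = *new
      Wend
      PokeA(*buf + used, 0) ; terminate before decoding
      r = PeekS(*buf, -1, #PB_UTF8)
      FreeMemory(*buf)
    EndIf
    CloseFile(f)
  EndIf
  ProcedureReturn r
EndProcedure
```

Each time the buffer grows it may have to be moved, which is extra work that a single Lof()-sized allocation avoids; in exchange, the total size does not have to be reserved up front.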

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 10:27 am
by #NULL
That explains it, thanks for clarifying.

Re: ReadString slowness...

Posted: Wed Sep 29, 2021 11:57 am
by Rinzwind
The #PB_File_IgnoreEOL flag also means the whole file is read into memory at once. It's not just a little bit slower, but 17 times slower. I would expect similar behavior in that case. I expected it to be the fastest way, because no extra steps are needed from the programmer, but it's the slowest. Counterintuitive, for me at least. So behind the scenes it still reads line by line with #PB_File_IgnoreEOL? That's unnecessary overhead. Anyway, I thought it was worth mentioning since I found out by chance.