HttpRequest with gzip

Posted: Fri Jan 10, 2025 11:48 am
by Rinzwind
So I can speed things up by using
headers("Accept-Encoding") = "gzip"

However, how do I uncompress the received data from HttpRequest?

Re: HttpRequest with gzip

Posted: Fri Jan 10, 2025 11:59 am
by NicTheQuick
Isn't `HTTPRequestMemory()` doing that for you already?
Do you have example code that shows the issue?

Re: HttpRequest with gzip

Posted: Fri Jan 10, 2025 12:08 pm
by Rinzwind
No, it doesn't. You get gzip binary data back. By default, HTTPRequest()/HTTPRequestMemory() does not ask for gzip-compressed data. I can't seem to handle the gzip'ed HTTP result with PB, which is a shame. It is quite common by now.

"gzip

A format using the Lempel-Ziv coding (LZ77), with a 32-bit CRC. This is the original format of the UNIX gzip program. The HTTP/1.1 standard also recommends that the servers supporting this content-encoding should recognize x-gzip as an alias, for compatibility purposes.
"

BriefLZ: "BriefLZ - small fast Lempel-Ziv"

I hoped that would be compatible... it doesn't seem to be however.

Re: HttpRequest with gzip

Posted: Fri Jan 10, 2025 12:23 pm
by Fred
Did you try to uncompress the memory with the zip plugin?

Re: HttpRequest with gzip

Posted: Fri Jan 10, 2025 2:28 pm
by Rinzwind
Doesn't work.

Raw test code:

Code:

EnableExplicit

UseBriefLZPacker()
UseZipPacker()
UseLZMAPacker()


Define t1 = ElapsedMilliseconds()
Define url.s = "https://raw.githubusercontent.com/json-iterator/test-data/refs/heads/master/large-file.json"
Define NewMap headers.s()
headers("Accept-Encoding") = "gzip"
Debug url
Define req = HTTPRequestMemory(#PB_HTTP_Get, url, 0, 0, 0, headers())
Debug "Timer: " + Str(ElapsedMilliseconds() - t1)


If req
  Define stat.s = HTTPInfo(req, #PB_HTTP_StatusCode)
  Define res.s = HTTPInfo(req, #PB_HTTP_Response)
  Debug Left(res, 100) + "..."
  Define *buf = HTTPMemory(req)
  
;   CreateFile(0, "C:\temp\json.gzip")
;   WriteData(0, *buf, MemorySize(*buf))
;   CloseFile(0)


  Define res2.s = PeekS(*buf, MemorySize(*buf), #PB_UTF8 | #PB_ByteLength)
  Debug Left(res2, 100) + "..."
  Define *buf2 = AllocateMemory(MemorySize(*buf) * 4)

  ;ShowMemoryViewer(*buf, MemorySize(*buf))
  Debug UncompressMemory(*buf, MemorySize(*buf), *buf2, MemorySize(*buf2), #PB_PackerPlugin_Zip)
  res2.s = PeekS(*buf, MemorySize(*buf), #PB_UTF8 | #PB_ByteLength)
  Debug Left(res2, 100) + "..."
  
  FreeMemory(*buf)
  FreeMemory(*buf2)
  
  FinishHTTP(req)
EndIf

P.S. Try commenting out the gzip line to see how much slower it can be.

Re: HttpRequest with gzip

Posted: Fri Jan 10, 2025 4:28 pm
by Fred
Your code is wrong: you used *buf again instead of *buf2 for the second PeekS(). Anyway, it doesn't work because of missing headers for zip, but libcurl supports this natively through a flag, so I guess it could be added:

https://curl.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
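
The header mismatch mentioned above can be reproduced outside PB. The following is a minimal sketch in Python (not PureBasic), using only the standard `gzip` and `zlib` modules. It assumes the decoder in question expects a zlib-style wrapper, which this thread suggests but does not confirm for PB's zip plugin. The `payload` bytes are made up for illustration.

```python
import gzip
import zlib

payload = b'{"hello": "world"}' * 100   # made-up test data
gz = gzip.compress(payload)             # same container a server sends for gzip

# A gzip member starts with the magic bytes 1f 8b, then method 8 (deflate).
assert gz[:3] == b"\x1f\x8b\x08"

# A decoder that expects the zlib wrapper rejects the gzip header.
try:
    zlib.decompress(gz)                 # default wbits=15: zlib wrapper only
    zlib_wrapper_ok = True
except zlib.error:
    zlib_wrapper_ok = False

# Adding 16 to the window bits asks zlib for the gzip wrapper instead.
restored = zlib.decompress(gz, wbits=16 + zlib.MAX_WBITS)
```

The compressed payload is the same deflate data in both cases; only the wrapper around it differs, which is why the same library can read it once told which wrapper to expect.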

Re: HttpRequest with gzip

Posted: Fri Jan 10, 2025 4:55 pm
by Rinzwind
Fred wrote: Fri Jan 10, 2025 4:28 pm Your code is wrong: you used *buf again instead of *buf2 for the second PeekS(). Anyway, it doesn't work because of missing headers for zip, but libcurl supports this natively through a flag, so I guess it could be added:

https://curl.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
I was already afraid the quick example code would contain an error. But yes, the issue stands. So please make it so; it seems to be a relatively easy improvement. 👍🙏

P.S. 7-Zip can open the binary file when it is saved to disk. It is shown as an archive with one file. I could not read it from the file with PB, though. I tried just for fun.

Re: HttpRequest with gzip

Posted: Fri Jan 10, 2025 10:40 pm
by infratec
Example with libcurl:

Code:

EnableExplicit

#LibCurl_ExternalDLL = #True
XIncludeFile "libcurl.pbi"


Define.i curl, res
Define UserData.libcurl_userdata_structure

curl_global_init(#CURL_GLOBAL_DEFAULT)

curl = curl_easy_init()
If curl
  ;curl_easy_setopt_str(curl, #CURLOPT_URL, "https://raw.githubusercontent.com/json-iterator/test-data/refs/heads/master/large-file.json")
  ;curl_easy_setopt(curl, #CURLOPT_SSL_VERIFYPEER, 0)
  ;curl_easy_setopt(curl, #CURLOPT_SSL_VERIFYHOST, 0)
  
  curl_easy_setopt_str(curl, #CURLOPT_URL, "http://httpbin.org/gzip")
  
  curl_easy_setopt_str(curl, #CURLOPT_ACCEPT_ENCODING, "gzip")
  
  curl_easy_setopt(curl, #CURLOPT_WRITEFUNCTION, @LibCurl_WriteFunction())
  curl_easy_setopt(curl, #CURLOPT_WRITEDATA, @UserData)
  
  ; to see that the content was sent with gzip (compare Content-Length and the real size)
  curl_easy_setopt(curl, #CURLOPT_HEADER, 1)
  
  res = curl_easy_perform(curl)
  If res = #CURLE_OK
    If UserData\Memory
      Debug PeekS(UserData\Memory, MemorySize(UserData\Memory), #PB_UTF8|#PB_ByteLength)
      Debug ""
      Debug "unzipped length: " + Str(MemorySize(UserData\Memory))
    EndIf
  Else
    Debug "Error: " + curl_easy_strerror(res)
  EndIf
  
  curl_easy_cleanup(curl)
EndIf

curl_global_cleanup()
You need the external DLL, because the internal libcurl does not include the zip stuff.

I enabled the header, so that you can see that the content was sent as gzip.

Re: HttpRequest with gzip

Posted: Sat Jan 11, 2025 5:14 am
by idle
gzip is just deflate with a header and checksum, why not use deflate?
CompressMemory(*input,len,*output,outputlen,#PB_PackerPlugin_Zip)

Re: HttpRequest with gzip

Posted: Sat Jan 11, 2025 5:24 am
by Rinzwind
idle wrote: Sat Jan 11, 2025 5:14 am gzip is just deflate with a header and checksum, why not use deflate?
CompressMemory(*input,len,*output,outputlen,#PB_PackerPlugin_Zip)
Because it is about handling multiple quite large web server REST results, and GitHub only supports gzip (a request for deflate is not honored; the response stays uncompressed), and gzip seems to be the common standard anyway.

https://developer.mozilla.org/en-US/doc ... t-Encoding

Re: HttpRequest with gzip

Posted: Sat Jan 11, 2025 6:02 am
by idle
Rinzwind wrote: Sat Jan 11, 2025 5:24 am
idle wrote: Sat Jan 11, 2025 5:14 am gzip is just deflate with a header and checksum, why not use deflate?
CompressMemory(*input,len,*output,outputlen,#PB_PackerPlugin_Zip)
Because it is about handling a bigger webserver REST result, and this one only supports gzip (requesting deflate will not be honored, stays uncompressed) and gzip seems to be the common standard used.
It would be good to have it added natively.

If it's not setting extra data like a file name or comment, you can just skip the first 10 bytes and decompress the data minus the last 8 bytes, which are the CRC-32 and the uncompressed length modulo 2^32 (len & $FFFFFFFF).
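
The skip-10/drop-8 idea can be sketched as follows. This is Python rather than PureBasic, and it is only an illustration: the function name `ungzip_simple` is hypothetical, and it relies on a decoder that accepts raw deflate data (Python's `zlib` with negative window bits). Whether PB's packer plugins accept raw deflate is not established in this thread.

```python
import gzip
import struct
import zlib

def ungzip_simple(gz: bytes) -> bytes:
    """Decode one gzip member that has no optional header fields."""
    if len(gz) < 18 or gz[:2] != b"\x1f\x8b" or gz[2] != 8:
        raise ValueError("not a gzip/deflate stream")
    if gz[3] != 0:
        # FLG byte set: file name, comment or extra data follow the header,
        # so a fixed 10-byte skip would land in the wrong place.
        raise ValueError("optional header fields present")
    # Raw deflate data sits between the 10-byte header and the 8-byte trailer.
    raw = zlib.decompress(gz[10:-8], wbits=-zlib.MAX_WBITS)
    # Trailer: CRC-32 of the uncompressed data, then its length mod 2^32,
    # both little-endian.
    crc, isize = struct.unpack("<II", gz[-8:])
    if zlib.crc32(raw) != crc or (len(raw) & 0xFFFFFFFF) != isize:
        raise ValueError("trailer mismatch")
    return raw

original = b"abc" * 1000                      # made-up test data
assert ungzip_simple(gzip.compress(original)) == original
```

Checking the FLG byte first keeps the shortcut honest: when a server does add a file name or comment, the function refuses instead of feeding header bytes to the decompressor.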

Re: HttpRequest with gzip

Posted: Mon Jan 13, 2025 10:37 pm
by Sergey
Hi, Rinzwind
I adapted Windows code from this forum to your needs;
just test it and adjust the lines as you want.
Please wait for it to finish: on my PC it took about 18 sec. with a buffer of 1024 bytes.
A buffer of 1024 * 1024 bytes (1 MB) took 0.1 sec. 8)
No PB packer code like ZIP or LZMA is needed.

Code:

EnableExplicit

#Z_BUFFER_SIZE = 1024 * 1024 ;- buffer size

#ZLIB_VERSION = "1.2.8"

#Z_OK = 0
#Z_STREAM_END = 1
#Z_FULL_FLUSH = 3
#Z_FINISH = 4

#ENABLE_GZIP = 16

Structure z_stream Align #PB_Structure_AlignC
	*next_in.BYTE
	avail_in.l
	total_in.l ;uLong
	
	*next_out.BYTE
	avail_out.l
	total_out.l ;uLong
	
	*msg.BYTE
	*state
	
	zalloc.i
	zfree.i
	opaque.i
	
	data_type.l
	adler.l ;uLong
	reserved.l ;uLong
			   ;without this, the inflateInit2() fails with a version error
	CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
		alignment.l
	CompilerEndIf
EndStructure

ImportC "zlib.lib"
	inflateInit2_.l(*stream.z_stream, windowBits.l, *version, streamsize.l)
	inflate.l(*stream.z_stream, flush.l)
	inflateEnd.l(*stream.z_stream)
EndImport

Procedure ungzip(*buf)	
	Protected gzip_strm.z_stream, gzip_opaque.l, *gzip_buffer, *gzip_out, gzip_result.l, gzip_unpacked_size.l	
	Protected buf_memory_size = MemorySize(*buf)
	Protected *buf2
	
	If buf_memory_size > 0
		Debug "gzip_packed_size = " + Str(buf_memory_size)
		gzip_strm.z_stream
		gzip_strm\next_in = *buf
		gzip_strm\avail_in = buf_memory_size
		gzip_strm\opaque = @gzip_opaque
		
		inflateInit2_(gzip_strm, 15 | #ENABLE_GZIP, #ZLIB_VERSION, SizeOf(z_stream))
		
		*gzip_buffer = AllocateMemory(#Z_BUFFER_SIZE)
		*gzip_out    = AllocateMemory(#Z_BUFFER_SIZE)
		
		If *gzip_buffer And *gzip_out
			Repeat
				gzip_strm\next_out = *gzip_buffer
				gzip_strm\avail_out = #Z_BUFFER_SIZE
				gzip_result = inflate(gzip_strm, #Z_FULL_FLUSH)
				
				If gzip_result = #Z_OK Or gzip_result = #Z_STREAM_END Or gzip_strm\avail_in = 0
					CopyMemory(*gzip_buffer, *gzip_out + MemorySize(*gzip_out) - #Z_BUFFER_SIZE, #Z_BUFFER_SIZE)
					If gzip_result = #Z_STREAM_END Or gzip_strm\avail_in = 0
						Break
					Else
						*gzip_out = ReAllocateMemory(*gzip_out, MemorySize(*gzip_out) + #Z_BUFFER_SIZE)
					EndIf
				Else
					If gzip_strm\msg
					Debug PeekS(gzip_strm\msg, -1, #PB_UTF8) ; length -1 = read up to the terminating null
					Else
						Debug "gzip_result = " + Str(gzip_result)
					EndIf
					Break
				EndIf
			ForEver
			
			gzip_unpacked_size = gzip_strm\total_out
			If gzip_unpacked_size > 0
				Debug "gzip_unpacked_size = " + Str(gzip_unpacked_size)
				Debug "compression ratio: " + StrF(buf_memory_size * 100 / gzip_unpacked_size, 1) + "%"
				
				*buf2 = AllocateMemory(gzip_unpacked_size)
				CopyMemory(*gzip_out, *buf2, gzip_unpacked_size)
			EndIf
			
			buf_memory_size = gzip_unpacked_size
			
			FreeMemory(*gzip_out)
			FreeMemory(*gzip_buffer)
			
			inflateEnd(gzip_strm)
		EndIf
		
	Else
		Debug "buf_memory_size = 0"
	EndIf
	
	ProcedureReturn *buf2
EndProcedure

Define t1 = ElapsedMilliseconds()
Define url.s = "https://raw.githubusercontent.com/json-iterator/test-data/refs/heads/master/large-file.json"
Define NewMap headers.s()
headers("Accept-Encoding") = "gzip"
Debug url
Define req = HTTPRequestMemory(#PB_HTTP_Get, url, 0, 0, 0, headers())
Debug "HTTPRequest Timer: " + Str(ElapsedMilliseconds() - t1)
If req
	Define stat.s = HTTPInfo(req, #PB_HTTP_StatusCode)
	Define res.s = HTTPInfo(req, #PB_HTTP_Response)
	Debug Left(res, 100) + "..."
	Define *buf = HTTPMemory(req)
	FinishHTTP(req)	
	
	If *buf
		Define res2.s = PeekS(*buf, MemorySize(*buf), #PB_UTF8 | #PB_ByteLength)
		Debug Left(res2, 100) + "..."
		
		Define t2 = ElapsedMilliseconds()
		Define *buf2 = ungzip(*buf)
		Debug "UnGZIP Timer: " + Str(ElapsedMilliseconds() - t2)
		
		If *buf2
			res2.s = PeekS(*buf2, MemorySize(*buf2), #PB_UTF8 | #PB_ByteLength)
			Debug Left(res2, 100) + "..."
		
			FreeMemory(*buf2)
		EndIf
		
		FreeMemory(*buf)
	EndIf
EndIf

Re: HttpRequest with gzip

Posted: Tue Jan 14, 2025 6:58 am
by Rinzwind
Sergey wrote: Mon Jan 13, 2025 10:37 pm
Thanks for a workaround. Seems to work.

Re: HttpRequest with gzip

Posted: Tue Jan 14, 2025 10:08 am
by Fred
It's always amazing to see the alternatives you all come up with to work around PB limitations!

Re: HttpRequest with gzip

Posted: Mon Feb 24, 2025 11:07 pm
by matalog
idle wrote: Sat Jan 11, 2025 5:14 am gzip is just deflate with a header and checksum, why not use deflate?
CompressMemory(*input,len,*output,outputlen,#PB_PackerPlugin_Zip)
Idle, can I use what you are describing to ungzip a file using PB's own capabilities, without requiring zlib?