Page 1 of 1

[Implemented] More robust Base64

Posted: Thu Jun 30, 2016 2:05 pm
by kenmo
Sorry for long post and nitpicking!
But if the Base64 library is going to be updated for 5.50, here are some improvements to consider :)

1. The Encoder has a flag to remove the trailing '=' padding... but the Decoder cannot handle removed padding!

Code: Select all

Encoded$  = "SGVsbG8gV29ybGQ=" ; Base64("Hello World")
*Input    = Ascii(Encoded$)
InputSize = StringByteLength(Encoded$, #PB_Ascii)

*Output = AllocateMemory(1000)
Debug Base64Decoder(*Input, InputSize, *Output, 1000)
Debug PeekS(*Output, -1, #PB_Ascii)

Code: Select all

Encoded$  = "SGVsbG8gV29ybGQ" ; Base64("Hello World") without padding
*Input    = Ascii(Encoded$)
InputSize = StringByteLength(Encoded$, #PB_Ascii)

*Output = AllocateMemory(1000)
Debug Base64Decoder(*Input, InputSize, *Output, 1000)
Debug PeekS(*Output, -1, #PB_Ascii)
2. The Decoder seems to ignore/exceed the specified OutputSize! (Shouldn't it fail and return 0?)

Code: Select all

Encoded$  = "SGVsbG8gV29ybGQ=" ; Base64("Hello World")
*Input    = Ascii(Encoded$)
InputSize = StringByteLength(Encoded$, #PB_Ascii)

*Output = AllocateMemory(5) ; buffer too small!
Debug Base64Decoder(*Input, InputSize, *Output, 5)
Debug PeekS(*Output, -1, #PB_Ascii)

FreeMemory(*Output)
FreeMemory(*Input)
3. The Encoder also ignores/exceeds the specified OutputSize. (Shouldn't it fail and return 0?)

Code: Select all

Text$     = "Hello World"
*Input    = Ascii(Text$)
InputSize = StringByteLength(Text$, #PB_Ascii)

*Output = AllocateMemory(15) ; buffer too small! should be 17 (incl NUL)
Debug Base64Encoder(*Input, InputSize, *Output, 15)
Debug PeekS(*Output, -1, #PB_Ascii)

FreeMemory(*Output)
FreeMemory(*Input)

Re: [PB 5.50b1] More robust Base64

Posted: Thu Jun 30, 2016 2:17 pm
by kenmo
About the Base64 documentation...

4. The example uses this to calculate the Encoder buffer size:

Code: Select all

Size = (?TestEnd - ?Test) * 1.35
If Size < 64
   Size = 64
EndIf
But even this is not safe, and causes some crashing! Here is the Decoder example, with the InputSize grown from 16 to 64 bytes:

Code: Select all

DataSection
   Test:
   Data.a $00, $01, $02, $03, $04, $05, $06, $07
   Data.a $08, $09, $0A, $0B, $0C, $0D, $0E, $0F
   Data.a $00, $01, $02, $03, $04, $05, $06, $07 ; added
   Data.a $08, $09, $0A, $0B, $0C, $0D, $0E, $0F ;
   Data.a $00, $01, $02, $03, $04, $05, $06, $07 ;
   Data.a $08, $09, $0A, $0B, $0C, $0D, $0E, $0F ;
   Data.a $00, $01, $02, $03, $04, $05, $06, $07 ;
   Data.a $08, $09, $0A, $0B, $0C, $0D, $0E, $0F ;
   TestEnd:
EndDataSection

Size = (?TestEnd - ?Test) * 1.35
If Size < 64
   Size = 64
EndIf

*EncodeBuffer = AllocateMemory(Size)
Size = Base64Encoder(?Test, ?TestEnd - ?Test, *EncodeBuffer, MemorySize(*EncodeBuffer))
Encoded$ = PeekS(*EncodeBuffer, Size, #PB_Ascii)
Debug Encoded$

*DecodeBuffer = AllocateMemory(Size)
Size = PokeS(*EncodeBuffer, Encoded$, StringByteLength(Encoded$, #PB_Ascii), #PB_Ascii|#PB_String_NoZero)
Size = Base64Decoder(*EncodeBuffer, Size, *DecodeBuffer, MemorySize(*DecodeBuffer))
ShowMemoryViewer(*DecodeBuffer, Size)
Safer to use:

Code: Select all

InputSize = (?TestEnd - ?Test)
Size = Round(InputSize/3, #PB_Round_Up) * 4 + 1 ; incl NUL
; or
Size = Int((InputSize + 2)/3) * 4 + 1 ; incl NUL
If Size < 64
   Size = 64
EndIf

5. I think it should be written that the Encoder also appends a NULL terminating byte.
Example: If the function returns 4, it actually wrote 5 bytes (4 ASCII characters plus 1 NULL)

Code: Select all

Text$ = "ABC"
*Input = Ascii(Text$)
InputSize = StringByteLength(Text$, #PB_Ascii)

*Output = AllocateMemory(10+1)
FillMemory(*Output, 10, 'A')
Debug PeekS(*Output, -1, #PB_Ascii)
Debug Base64Encoder(*Input, InputSize, *Output, 11)
Debug PeekS(*Output, -1, #PB_Ascii)
6. The help for Encoder says:
#PB_Cipher_NoPadding: it will not insert additional '=' at the end of the encoded buffer to pad it to 3 bytes boundary.
It should say a 4 byte boundary.

7. Why does the Base64 Encoder/Decoder require a minimum buffer of 64 bytes?
The exact output size (minimum 4 Base64 characters, or even less with NoPadding flag) should be easy to calculate given the input size.
Also, it currently seems to work (maybe not safely?) with less than 64 byte buffers:

Code: Select all

Encoded$  = "SGVsbG8gV29ybGQ=" ; Base64("Hello World")
*Input    = Ascii(Encoded$)
InputSize = StringByteLength(Encoded$, #PB_Ascii)

*Output = AllocateMemory(12)
Debug Base64Decoder(*Input, InputSize, *Output, 12)
Debug PeekS(*Output, -1, #PB_Ascii)

That's all of my concerns :)
Thanks for reading and thanks for all the great PB improvements!

Re: [PB 5.50b1] More robust Base64

Posted: Mon Jul 04, 2016 5:03 am
by Blue
Brilliant -and patient- work, kenmo.

That will definitely help to reduce the stressful work that had to go into figuring out why our AES lines of codes behaved so erratically.

I really hope your very precise input and improvement pointers get the serious consideration they deserve in the next iteration of PB.

Thank you.

Re: [PB 5.50b1] More robust Base64

Posted: Tue Jan 03, 2017 6:02 pm
by IdeasVacuum
I only just found this post and it is a very important one, especially now that PB is Unicode only.

I think this should really be in the bugs section Kenmo. The documentation will potentially lead to unreliable software being created by unwitting devs and the inconsistency of the lib can catch anyone out too.

Disappointing that your comments did not result in an update of the Base64 lib for PB5.51, fingers crossed that it's on Fred's priority list :wink:

Related to this are the AES functions which have also become tricky to implement since PB went Unicode only.