RC4 lib

Keya
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Post by Keya »

;### RC4 lib from the WaterJuice CryptLib
;### License: None, public domain
;### Purebasic demo with libs for Windows, OSX, Linux, 32+64

DOWNLOAD (26kb) MIRRORS:
http://www.mediafire.com/download/0d615zdxy5zyurj/rc4lib.zip
https://www.sendspace.com/file/ejvimp
http://s000.tinyupload.com/index.php?file_id=18220965242140535050

Hello! I was looking at RC4 because it's small, and I find it easier to understand and use than something like AES, which I can spell but that's about it lol. Being a stream cipher rather than a block cipher, I think it's also easier to work with. I understand RC4 has known security weaknesses, so it shouldn't be used for serious security, but it's still useful!

My search showed there are already a couple of RC4 implementations for PureBasic; this is just one more option!
Just for completeness, I think this is the best PB implementation so far :) - http://www.purebasic.fr/english/viewtopic.php?t=6905

I found the WaterJuice CryptLib library. It is public domain and compiles to around 1-2 KB (one of the Windows builds is 920 bytes), and that's the full object file :)
https://github.com/WaterJuice/CryptLib
All we need are the librc4.c and librc4.h files, so it's one of the easier libs to compile! (phew!)

COMPILING:
I've already compiled the libs (see download above) for all 3 OSes, 32+64-bit, at all 4 optimization levels. To build the .o object file yourself, I used the GCC compiler on all 3 OSes (TDM-GCC on Windows):
gcc -c librc4.c
The "-m32" or "-m64" flag can be added to select the target architecture.
The "-On" flag (n = 0-3) sets the optimization level, with 3 the highest; I've included builds at all levels.
GCC provides hundreds of other flags for things like SSE, but that's out of scope and it's all new to me anyway.
I had no compile errors on Windows or OSX. On Linux I initially got the error "sys/cdefs.h: No such file or directory", which after googling I fixed simply with: sudo apt-get install gcc-multilib

DEMO (slightly more updated than the one in the zip due to compiler directives but still functionally equivalent):
Here is my demo attempt, i think i've got it right, hopefully one of the crypto experts here can verify please :)

Code: Select all

#RC4OPT$ = "3"  ;0-3
CompilerIf (#PB_Compiler_OS = #PB_OS_Linux) Or (#PB_Compiler_OS = #PB_OS_Windows And #PB_Compiler_Processor = #PB_Processor_x64)
  #APIPrefix = ""
CompilerElse
  #APIPrefix = "_"  ;some APIs are prefixed with "_", you know how it is!
CompilerEndIf
CompilerSelect #PB_Compiler_OS
  CompilerCase #PB_OS_Windows: #RC4OS$ = "Windows"
  CompilerCase #PB_OS_MacOS: #RC4OS$ = "OSX"
  CompilerCase #PB_OS_Linux: #RC4OS$ = "Linux"
CompilerEndSelect
CompilerSelect #PB_Compiler_Processor
  CompilerCase #PB_Processor_x64: #RC4CPU$ = "64"
  CompilerCase #PB_Processor_x86: #RC4CPU$ = "32"
CompilerEndSelect
#RC4LIB$ = #RC4OS$+"/rc4-"+#RC4CPU$+".O"+#RC4OPT$+".o"

;// Rc4Context - This must be initialised using Rc4Initialise. Do not modify the contents of this structure directly.
Structure RC4Context
  i.l
  j.l
  S.a[256]
EndStructure

ImportC #RC4LIB$    ;"Windows/rc4-32.O3.o" 
  
  ;///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  ;//  Rc4Initialise
  ;//  Initialises an RC4 cipher And discards the specified N number of first bytes.
  ;///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  Rc4Initialise(*Context.RC4Context, *Key, KeySize.l, DropN.l) As #APIPrefix+"Rc4Initialise"
  
  ;///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  ;//  Rc4Output  (Not called in this demo; doesn't seem needed unless you want to make your own version of Rc4Xor)
  ;//  Outputs the requested number of bytes from the RC4 stream
  ;///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  Rc4Output(*Context.Rc4Context, *Buffer, BufSize.l) As #APIPrefix+"Rc4Output"
  
  ;///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  ;//  Rc4Xor
  ;//  XORs the RC4 stream With an input buffer And puts the results in an output buffer. This is used For encrypting
  ;//  And decrypting Data. InBuffer And OutBuffer can point To the same location For inplace encrypting/decrypting
  ;///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  Rc4Xor(*Context.Rc4Context, *InBuffer, *OutBuffer, BufSize.l) As #APIPrefix+"Rc4Xor"  
EndImport


Define ctx.RC4Context
Define keystr.s = "secret key"
Define keylen.i = Len(keystr)
Define *keybuf = AllocateMemory(256)   ;can be any size
Define datstr.s = "my secret message"
Define datlen.i = Len(datstr)
Define *datbuf = AllocateMemory(65535) ;can be any size
PokeS(*keybuf, keystr, -1, #PB_Ascii)
PokeS(*datbuf, datstr, -1, #PB_Ascii)


;// ALICE
;1a. Initialise
Rc4Initialise(ctx,*keybuf,keylen,0)

;2a. Encrypt
Rc4Xor(ctx,*datbuf,*datbuf,datlen)  ;can use the same buffer for inplace encrypting/decrypting
ShowMemoryViewer(*datbuf, datlen)
MessageRequester("RC4","Encrypted")

;//... Alice sends Bob the message ...

;// BOB
;1b. Initialise (now in sync with Alice's stream)
Rc4Initialise(ctx,*keybuf,keylen,0)

;2b Decrypt
Rc4Xor(ctx,*datbuf,*datbuf,datlen)
ShowMemoryViewer(*datbuf, datlen)
MessageRequester("RC4","Decrypted")
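For anyone wanting to cross-check the demo against something independent, here is a minimal C sketch of textbook RC4 with the same i/j/S[256] context layout and a drop-N parameter. It is not the CryptLib source: the names rc4_ctx, rc4_init, rc4_xor and rc4_selftest are made up for illustration. The self-test uses the widely published vector (key "Key", plaintext "Plaintext", ciphertext BB F3 16 E8 D9 40 AF 0A D3) and then repeats the Alice/Bob round trip.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; same i/j/S[256] layout as the demo's structure. */
typedef struct {
    uint32_t i;
    uint32_t j;
    uint8_t  S[256];
} rc4_ctx;

/* Swap two state bytes (also safe when both pointers are equal). */
static void rc4_swap(uint8_t *a, uint8_t *b)
{
    uint8_t t = *a; *a = *b; *b = t;
}

/* Key schedule, then discard the first drop_n keystream bytes. */
static void rc4_init(rc4_ctx *c, const uint8_t *key, uint32_t keylen, uint32_t drop_n)
{
    uint32_t i, j = 0, n;
    assert(keylen > 0);                       /* avoids i % 0 below */
    for (i = 0; i < 256; i++) c->S[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        j = (j + c->S[i] + key[i % keylen]) & 0xFF;
        rc4_swap(&c->S[i], &c->S[j]);
    }
    i = 0; j = 0;
    for (n = 0; n < drop_n; n++) {            /* advance the stream, output nothing */
        i = (i + 1) & 0xFF;
        j = (j + c->S[i]) & 0xFF;
        rc4_swap(&c->S[i], &c->S[j]);
    }
    c->i = i; c->j = j;
}

/* XOR len keystream bytes into the data; in and out may be the same buffer. */
static void rc4_xor(rc4_ctx *c, const uint8_t *in, uint8_t *out, uint32_t len)
{
    for (uint32_t n = 0; n < len; n++) {
        c->i = (c->i + 1) & 0xFF;
        c->j = (c->j + c->S[c->i]) & 0xFF;
        rc4_swap(&c->S[c->i], &c->S[c->j]);
        out[n] = in[n] ^ c->S[(c->S[c->i] + c->S[c->j]) & 0xFF];
    }
}

/* Returns 0 if the test vector and the encrypt/decrypt round trip both check out. */
static int rc4_selftest(void)
{
    static const uint8_t expect[9] = {0xBB,0xF3,0x16,0xE8,0xD9,0x40,0xAF,0x0A,0xD3};
    uint8_t buf[9];
    rc4_ctx c;
    memcpy(buf, "Plaintext", 9);
    rc4_init(&c, (const uint8_t *)"Key", 3, 0);
    rc4_xor(&c, buf, buf, 9);                 /* Alice: encrypt in place */
    if (memcmp(buf, expect, 9) != 0) return 1;
    rc4_init(&c, (const uint8_t *)"Key", 3, 0);
    rc4_xor(&c, buf, buf, 9);                 /* Bob: fresh context, decrypt in place */
    return memcmp(buf, "Plaintext", 9) != 0 ? 2 : 0;
}
```

If the PB demo is run with the same key and plaintext and DropN = 0, the memory viewer should show those same nine ciphertext bytes.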
Last edited by Keya on Wed Jul 13, 2016 7:10 am, edited 1 time in total.
Kwai chang caine
Always Here
Posts: 5494
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: RC4 lib

Post by Kwai chang caine »

I have seen your "secret message" and it works great 8)
Just perhaps change the hard path by

Code: Select all

ImportC "Windows\rc4-32.O3.o"
Letter "S" :shock:
I not have enough money for have so much drive :lol:
Thanks a lot for sharing 8)
Last edited by Kwai chang caine on Wed Jul 13, 2016 11:14 am, edited 1 time in total.
blueb
Addict
Posts: 1112
Joined: Sat Apr 26, 2003 2:15 pm
Location: Cuernavaca, Mexico

Re: RC4 lib

Post by blueb »

Thanks, it compiled with Win 10 Pro (x64) and PB 5.50 b2.

I've used your 'Spritz' version from a year ago, and (to me) it seems easier to use.

Not being an encryption expert... What's the advantage of this version?

blueb

Re: RC4 lib

Post by Keya »

Kwai chang caine wrote:Just perhaps change the hard path
I not have enough money for have so much drive :lol:
Because of "S:"? lol, I don't have money for drives either, especially after being ripped off on cheap USB drives lol. It's just a VirtualBox virtual drive mapping to a tiny drive from within a VM, I promise I don't actually have every letter of the alphabet in use :)
I edited the source in the first post to use conditional compiler directives, so it should auto-detect which lib to use depending on your OS and CPU (32/64-bit).
blueb wrote:Not being an encryption expert... What's the advantage of this version?
I can't really tell you anything apart from what I've read on Wikipedia and saw in the Spritz video, sorry (not a crypto person here either, as I'm a maths failure, but I find secrets fascinating lol). My understanding is that Spritz is a modification of RC4 by the author of RC4 to address issues found in RC4. There wasn't much source code available for Spritz, it being so new, and I'm very new to compiling static libs myself, so when I saw this simple RC4 implementation I wanted to test myself to see if I could compile it as a static lib. I think it's fantastic that PureBasic lets us use libs, but I'm totally intimidated by C compilers, so I'm having a go at them to try to conquer my fears!
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: RC4 lib

Post by wilbert »

That's a very compact C source Keya. :)
Did you also look at the asm code the C compiler generates ?
I wonder how optimized the resulting output is.
Windows (x64)
Raspberry Pi OS (Arm64)

Re: RC4 lib

Post by Keya »

lol, some cunning folk have gotten RC4 down to <100 bytes x86, quite amazing!
http://masm32.com/board/index.php?topic=4006.0
https://tinycrypt.wordpress.com/2015/11 ... mentation/
I had a go at porting one of them to inline PB but wasn't successful. It uses a different calling convention for starters, but I think I was calling it OK; I gave up defeated after a couple of hours lol. But I'm still happy, as this is the first time I've ever compiled a lib for all three OSes, just a small personal milestone in my learning.

Here is the GCC compile of this C lib (-O3 max optimization, Windows x86). The full .o object file is 991 bytes, but I wasn't sure of the actual function sizes. [edit] Because apparently it's too difficult for me to subtract one offset from another, lol: it's 158+125=283 bytes.
Rc4Initialise(): (gcc -O3) (158 bytes)

Code: Select all

00402000    55                   push ebp
00402001    57                   push edi
00402002    31C0                 xor eax, eax
00402004    56                   push esi
00402005    53                   push ebx
00402006    8B7424 14            mov esi, dword ptr [esp+14]
0040200A    8B6C24 18            mov ebp, dword ptr [esp+18]
0040200E    66:90                nop
00402010    884406 08            mov byte ptr [esi+eax+8], al
00402014    83C0 01              add eax, 1
00402017    3D 00010000          cmp eax, 100
0040201C  ^ 75 F2                jnz short 00402010
0040201E    31C9                 xor ecx, ecx
00402020    31DB                 xor ebx, ebx
00402022    89D8                 mov eax, ebx
00402024    31D2                 xor edx, edx
00402026    0FB67C1E 08          movzx edi, byte ptr [esi+ebx+8]
0040202B    F77424 1C            div dword ptr [esp+1C]
0040202F    89F8                 mov eax, edi
00402031    0FB6C0               movzx eax, al
00402034    0FB65415 00          movzx edx, byte ptr [ebp+edx]
00402039    01D0                 add eax, edx
0040203B    01C1                 add ecx, eax
0040203D    0FB6C9               movzx ecx, cl
00402040    0FB6440E 08          movzx eax, byte ptr [esi+ecx+8]
00402045    88441E 08            mov byte ptr [esi+ebx+8], al
00402049    83C3 01              add ebx, 1
0040204C    89F8                 mov eax, edi
0040204E    81FB 00010000        cmp ebx, 100
00402054    88440E 08            mov byte ptr [esi+ecx+8], al
00402058  ^ 75 C8                jnz short 00402022
0040205A    8B4424 20            mov eax, dword ptr [esp+20]
0040205E    85C0                 test eax, eax
00402060    74 3C                je short 0040209E
00402062    8B6C24 20            mov ebp, dword ptr [esp+20]
00402066    66:31DB              xor bx, bx
00402069    31C0                 xor eax, eax
0040206B    31C9                 xor ecx, ecx
0040206D    89DF                 mov edi, ebx
0040206F    90                   nop
00402070    83C1 01              add ecx, 1
00402073    83C7 01              add edi, 1
00402076    0FB6C9               movzx ecx, cl
00402079    0FB6540E 08          movzx edx, byte ptr [esi+ecx+8]
0040207E    01D0                 add eax, edx
00402080    39EF                 cmp edi, ebp
00402082    0FB6C0               movzx eax, al
00402085    0FB65C06 08          movzx ebx, byte ptr [esi+eax+8]
0040208A    885C0E 08            mov byte ptr [esi+ecx+8], bl
0040208E    885406 08            mov byte ptr [esi+eax+8], dl
00402092  ^ 75 DC                jnz short 00402070
00402094    890E                 mov dword ptr [esi], ecx
00402096    8946 04              mov dword ptr [esi+4], eax
00402099    5B                   pop ebx
0040209A    5E                   pop esi
0040209B    5F                   pop edi
0040209C    5D                   pop ebp
0040209D    C3                   retn
Rc4Xor(): (gcc -O3) (125 bytes)

Code: Select all

00402130    55                             push ebp
00402131    57                             push edi
00402132    56                             push esi
00402133    53                             push ebx
00402134    31F6                           xor esi, esi
00402136    83EC 04                        sub esp, 4
00402139    8B5424 24                      mov edx, dword ptr [esp+24]
0040213D    8B4424 18                      mov eax, dword ptr [esp+18]
00402141    8B6C24 1C                      mov ebp, dword ptr [esp+1C]
00402145    85D2                           test edx, edx
00402147    74 5C                          je short 004021A5
00402149    8DB426 00000000                lea esi, dword ptr [esi]
00402150    8B38                           mov edi, dword ptr [eax]
00402152    8D57 01                        lea edx, dword ptr [edi+1]
00402155    0FB6D2                         movzx edx, dl
00402158    0FB67C10 08                    movzx edi, byte ptr [eax+edx+8]
0040215D    8910                           mov dword ptr [eax], edx
0040215F    89FB                           mov ebx, edi
00402161    0FB6CB                         movzx ecx, bl
00402164    8B58 04                        mov ebx, dword ptr [eax+4]
00402167    890C24                         mov dword ptr [esp], ecx
0040216A    01CB                           add ebx, ecx
0040216C    0FB6DB                         movzx ebx, bl
0040216F    0FB64C18 08                    movzx ecx, byte ptr [eax+ebx+8]
00402174    8958 04                        mov dword ptr [eax+4], ebx
00402177    884C10 08                      mov byte ptr [eax+edx+8], cl
0040217B    89F9                           mov ecx, edi
0040217D    8B7C24 20                      mov edi, dword ptr [esp+20]
00402181    884C18 08                      mov byte ptr [eax+ebx+8], cl
00402185    0FB60C24                       movzx ecx, byte ptr [esp]
00402189    024C10 08                      add cl, byte ptr [eax+edx+8]
0040218D    0FB65435 00                    movzx edx, byte ptr [ebp+esi]
00402192    0FB6C9                         movzx ecx, cl
00402195    325408 08                      xor dl, byte ptr [eax+ecx+8]
00402199    881437                         mov byte ptr [edi+esi], dl
0040219C    83C6 01                        add esi, 1
0040219F    3B7424 24                      cmp esi, dword ptr [esp+24]
004021A3  ^ 75 AB                          jnz short 00402150
004021A5    83C4 04                        add esp, 4
004021A8    5B                             pop ebx
004021A9    5E                             pop esi
004021AA    5F                             pop edi
004021AB    5D                             pop ebp
004021AC    C3                             retn
(all the "tiny rc4"'s seem to combine the two stages)
Last edited by Keya on Wed Jul 13, 2016 2:10 pm, edited 2 times in total.

Re: RC4 lib

Post by wilbert »

Thanks for posting the generated asm and the link to the very small x86 code. It's interesting to look at.
The very small x86 code probably won't be very fast. It's optimized to be small but some instructions it uses are very slow.
The output from the C compiler looks very well put together (as expected) although you can probably still beat it with hand coded asm.
The C version also has the advantage of the DropN function argument which seems to be important for better security.

Re: RC4 lib

Post by Keya »

wilbert wrote:The C version also has the advantage of the DropN function argument which seems to be important for better security.
Ooooh, i was wondering what that DropN was for (i thought it was just for discarding previous parts of the stream already used or something), but seeing what you said made me think about this:
http://security.stackexchange.com/quest ... ing-in-tls
One way to "fix" RC4, which has been suggested many times, is to drop the first 256 (or 512 or 1536 or whatever) bytes of output, since these are the most biased of them (the graphics in the slides show that quite clearly). But this would not be compatible with RC4-as-we-know-it
I just found this public domain C implementation of RC4's successor Spritz: https://github.com/jedisct1/spritz
Again it's just a C and an H file, so I'm now going to see if I can compile those as libs too :)
[edit] Success! Yes, it was just as easy as RC4. Spritz libs for PB
Last edited by Keya on Wed Jul 13, 2016 9:58 am, edited 2 times in total.

Re: RC4 lib

Post by wilbert »

Keya wrote:Ooooh, i was wondering what that DropN was for (i thought it was just for discarding previous parts of the stream already used or something), but seeing what you said made me think about this:
http://security.stackexchange.com/quest ... ing-in-tls
One way to "fix" RC4, which has been suggested many times, is to drop the first 256 (or 512 or 1536 or whatever) bytes of output, since these are the most biased of them (the graphics in the slides show that quite clearly). But this would not be compatible with RC4-as-we-know-it
Yes, that's exactly it :)
The Wikipedia page mentions that the first bytes of output leak information about the key, and that this can be avoided by discarding a number of bytes at the beginning of the stream.
The advantage of using a separate initialize and output procedure is that you can process the buffer in multiple steps which is very convenient for large files.
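To illustrate the multiple-steps point, here is a small C sketch (textbook RC4, not the CryptLib source; rc4_ctx, rc4_init, rc4_xor and chunked_matches_single_pass are names invented for this example). Because the context carries i, j and S between calls, XORing a buffer in 64-byte pieces should give exactly the same bytes as XORing it in one call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; the context keeps the stream position between calls. */
typedef struct { uint32_t i, j; uint8_t S[256]; } rc4_ctx;

static void rc4_init(rc4_ctx *c, const uint8_t *key, uint32_t keylen)
{
    uint32_t i, j = 0;
    assert(keylen > 0);
    for (i = 0; i < 256; i++) c->S[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        j = (j + c->S[i] + key[i % keylen]) & 0xFF;
        uint8_t t = c->S[i]; c->S[i] = c->S[j]; c->S[j] = t;
    }
    c->i = 0; c->j = 0;
}

static void rc4_xor(rc4_ctx *c, const uint8_t *in, uint8_t *out, uint32_t len)
{
    for (uint32_t n = 0; n < len; n++) {
        c->i = (c->i + 1) & 0xFF;
        c->j = (c->j + c->S[c->i]) & 0xFF;
        uint8_t t = c->S[c->i]; c->S[c->i] = c->S[c->j]; c->S[c->j] = t;
        out[n] = in[n] ^ c->S[(c->S[c->i] + c->S[c->j]) & 0xFF];
    }
}

/* Returns 1 when chunked processing matches a single pass over the same data. */
static int chunked_matches_single_pass(void)
{
    static const uint8_t key[] = "secret key";       /* 10 key bytes used */
    uint8_t msg[1000], one[1000], many[1000];
    rc4_ctx c;
    for (uint32_t n = 0; n < 1000; n++) msg[n] = (uint8_t)(n * 7);

    rc4_init(&c, key, 10);
    rc4_xor(&c, msg, one, 1000);                     /* whole buffer at once */

    rc4_init(&c, key, 10);                           /* initialise once only */
    for (uint32_t off = 0; off < 1000; off += 64) {  /* then 64 bytes at a time */
        uint32_t len = (1000 - off < 64) ? 1000 - off : 64;
        rc4_xor(&c, msg + off, many + off, len);
    }
    return memcmp(one, many, 1000) == 0;
}
```

For a large file the loop body would read a chunk, XOR it, and write it out, with the context initialised only once before the loop.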

Re: RC4 lib

Post by Keya »

just some more concrete info about that for my own personal curiosity! https://en.wikipedia.org/wiki/Spritz_(cipher)#RC4_variants
As mentioned above, the most important weakness of RC4 comes from the insufficient key schedule; the first bytes of output reveal information about the key. This can be corrected by simply discarding some initial portion of the output stream.[55] This is known as RC4-dropN, where N is typically a multiple of 256, such as 768 or 1024.
[55] = http://eprint.iacr.org/2002/067 (full: http://eprint.iacr.org/2002/067.pdf) ...
Abstract: Most guidelines for implementation of the RC4 stream cipher recommend discarding the first 256 bytes of its output. This recommendation is based on the empirical fact that known attacks can either cryptanalyze RC4 starting at any point, or become harmless after these initial bytes are dumped. The motivation for this paper is to find a conservative estimate for the number of bytes that should be discarded in order to be safe. To this end we propose an idealized model of RC4 and analyze it applying the theory of random shuffles. Based on our analysis of the model we recommend dumping at least 512 bytes.
well it's so easy and fast to DropN i'm gonna go all out and drop 4096, lol
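A rough C sketch of what RC4-dropN amounts to, assuming the usual definition quoted above (this is textbook RC4, not the CryptLib source, and rc4_ctx, rc4_init, rc4_keystream and dropn_is_offset are hypothetical names). Dropping N bytes just means the stream you use starts N bytes later, so both sides must agree on the same N or they will be out of sync.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t i, j; uint8_t S[256]; } rc4_ctx;   /* hypothetical names */

/* One keystream step: advance i and j, swap, return the output byte. */
static uint8_t rc4_next(rc4_ctx *c)
{
    c->i = (c->i + 1) & 0xFF;
    c->j = (c->j + c->S[c->i]) & 0xFF;
    uint8_t t = c->S[c->i]; c->S[c->i] = c->S[c->j]; c->S[c->j] = t;
    return c->S[(c->S[c->i] + c->S[c->j]) & 0xFF];
}

/* Key schedule followed by discarding drop_n keystream bytes. */
static void rc4_init(rc4_ctx *c, const uint8_t *key, uint32_t keylen, uint32_t drop_n)
{
    uint32_t i, j = 0;
    assert(keylen > 0);
    for (i = 0; i < 256; i++) c->S[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        j = (j + c->S[i] + key[i % keylen]) & 0xFF;
        uint8_t t = c->S[i]; c->S[i] = c->S[j]; c->S[j] = t;
    }
    c->i = 0; c->j = 0;
    while (drop_n--) (void)rc4_next(c);       /* throw the early, biased bytes away */
}

static void rc4_keystream(rc4_ctx *c, uint8_t *out, uint32_t len)
{
    for (uint32_t n = 0; n < len; n++) out[n] = rc4_next(c);
}

/* Returns 1 if the drop-n stream equals the plain stream starting at offset n. */
static int dropn_is_offset(uint32_t n)
{
    static const uint8_t key[] = "secret key";
    static uint8_t plain[4096 + 16];
    uint8_t dropped[16];
    rc4_ctx c;
    if (n > 4096) return 0;                   /* sketch only handles n up to 4096 */
    rc4_init(&c, key, 10, 0);
    rc4_keystream(&c, plain, n + 16);
    rc4_init(&c, key, 10, n);
    rc4_keystream(&c, dropped, 16);
    return memcmp(plain + n, dropped, 16) == 0;
}
```

The cost is one extra pass of N cheap loop iterations at initialisation, which is why a generous N such as 4096 is barely noticeable.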

Re: RC4 lib

Post by wilbert »

I just did a speed comparison.
It's surprising to see how much slower the O0 version is compared to the other versions.
Also the 64 bit O3 version is significantly faster than the 32 bit O3 version.
Maybe it uses the additional registers to store less information in memory. :?

Re: RC4 lib

Post by Keya »

wilbert wrote:It's surprising to see how much slower the O0 version is compared to the other versions.
One thing that immediately stands out to me in the -O3 compile is the number of e.g. "[eax+edx+8]" triplets (base + index + displacement addressing)! (would love to add that to my asm game lol), whereas they don't really appear so much in the -O0 build, which is also larger ...

Rc4Initialise: (gcc -O0) (328 bytes)

Code: Select all

00402000    55                   push ebp
00402001    89E5                 mov ebp, esp
00402003    83EC 10              sub esp, 10
00402006    C745 FC 00000000     mov dword ptr [ebp-4], 0
0040200D    EB 14                jmp short 00402023
0040200F    8B45 FC              mov eax, dword ptr [ebp-4]
00402012    89C1                 mov ecx, eax
00402014    8B55 08              mov edx, dword ptr [ebp+8]
00402017    8B45 FC              mov eax, dword ptr [ebp-4]
0040201A    01D0                 add eax, edx
0040201C    8848 08              mov byte ptr [eax+8], cl
0040201F    8345 FC 01           add dword ptr [ebp-4], 1
00402023    817D FC FF000000     cmp dword ptr [ebp-4], 0FF
0040202A  ^ 76 E3                jbe short 0040200F
0040202C    C745 F8 00000000     mov dword ptr [ebp-8], 0
00402033    C745 FC 00000000     mov dword ptr [ebp-4], 0
0040203A    EB 6E                jmp short 004020AA
0040203C    8B55 08              mov edx, dword ptr [ebp+8]
0040203F    8B45 FC              mov eax, dword ptr [ebp-4]
00402042    01D0                 add eax, edx
00402044    0FB640 08            movzx eax, byte ptr [eax+8]
00402048    0FB6D0               movzx edx, al
0040204B    8B45 F8              mov eax, dword ptr [ebp-8]
0040204E    8D0C02               lea ecx, dword ptr [edx+eax]
00402051    8B45 FC              mov eax, dword ptr [ebp-4]
00402054    BA 00000000          mov edx, 0
00402059    F775 10              div dword ptr [ebp+10]
0040205C    8B45 0C              mov eax, dword ptr [ebp+C]
0040205F    01D0                 add eax, edx
00402061    0FB600               movzx eax, byte ptr [eax]
00402064    0FB6C0               movzx eax, al
00402067    01C8                 add eax, ecx
00402069    25 FF000000          and eax, 0FF
0040206E    8945 F8              mov dword ptr [ebp-8], eax
00402071    8B55 08              mov edx, dword ptr [ebp+8]
00402074    8B45 FC              mov eax, dword ptr [ebp-4]
00402077    01D0                 add eax, edx
00402079    0FB640 08            movzx eax, byte ptr [eax+8]
0040207D    8845 F3              mov byte ptr [ebp-D], al
00402080    8B55 08              mov edx, dword ptr [ebp+8]
00402083    8B45 F8              mov eax, dword ptr [ebp-8]
00402086    01D0                 add eax, edx
00402088    0FB640 08            movzx eax, byte ptr [eax+8]
0040208C    8B4D 08              mov ecx, dword ptr [ebp+8]
0040208F    8B55 FC              mov edx, dword ptr [ebp-4]
00402092    01CA                 add edx, ecx
00402094    8842 08              mov byte ptr [edx+8], al
00402097    8B55 08              mov edx, dword ptr [ebp+8]
0040209A    8B45 F8              mov eax, dword ptr [ebp-8]
0040209D    01C2                 add edx, eax
0040209F    0FB645 F3            movzx eax, byte ptr [ebp-D]
004020A3    8842 08              mov byte ptr [edx+8], al
004020A6    8345 FC 01           add dword ptr [ebp-4], 1
004020AA    817D FC FF000000     cmp dword ptr [ebp-4], 0FF
004020B1  ^ 76 89                jbe short 0040203C
004020B3    C745 FC 00000000     mov dword ptr [ebp-4], 0
004020BA    C745 F8 00000000     mov dword ptr [ebp-8], 0
004020C1    C745 F4 00000000     mov dword ptr [ebp-C], 0
004020C8    EB 63                jmp short 0040212D
004020CA    8B45 FC              mov eax, dword ptr [ebp-4]
004020CD    83C0 01              add eax, 1
004020D0    25 FF000000          and eax, 0FF
004020D5    8945 FC              mov dword ptr [ebp-4], eax
004020D8    8B55 08              mov edx, dword ptr [ebp+8]
004020DB    8B45 FC              mov eax, dword ptr [ebp-4]
004020DE    01D0                 add eax, edx
004020E0    0FB640 08            movzx eax, byte ptr [eax+8]
004020E4    0FB6D0               movzx edx, al
004020E7    8B45 F8              mov eax, dword ptr [ebp-8]
004020EA    01D0                 add eax, edx
004020EC    25 FF000000          and eax, 0FF
004020F1    8945 F8              mov dword ptr [ebp-8], eax
004020F4    8B55 08              mov edx, dword ptr [ebp+8]
004020F7    8B45 FC              mov eax, dword ptr [ebp-4]
004020FA    01D0                 add eax, edx
004020FC    0FB640 08            movzx eax, byte ptr [eax+8]
00402100    8845 F2              mov byte ptr [ebp-E], al
00402103    8B55 08              mov edx, dword ptr [ebp+8]
00402106    8B45 F8              mov eax, dword ptr [ebp-8]
00402109    01D0                 add eax, edx
0040210B    0FB640 08            movzx eax, byte ptr [eax+8]
0040210F    8B4D 08              mov ecx, dword ptr [ebp+8]
00402112    8B55 FC              mov edx, dword ptr [ebp-4]
00402115    01CA                 add edx, ecx
00402117    8842 08              mov byte ptr [edx+8], al
0040211A    8B55 08              mov edx, dword ptr [ebp+8]
0040211D    8B45 F8              mov eax, dword ptr [ebp-8]
00402120    01C2                 add edx, eax
00402122    0FB645 F2            movzx eax, byte ptr [ebp-E]
00402126    8842 08              mov byte ptr [edx+8], al
00402129    8345 F4 01           add dword ptr [ebp-C], 1
0040212D    8B45 F4              mov eax, dword ptr [ebp-C]
00402130    3B45 14              cmp eax, dword ptr [ebp+14]
00402133  ^ 72 95                jb short 004020CA
00402135    8B45 08              mov eax, dword ptr [ebp+8]
00402138    8B55 FC              mov edx, dword ptr [ebp-4]
0040213B    8910                 mov dword ptr [eax], edx
0040213D    8B45 08              mov eax, dword ptr [ebp+8]
00402140    8B55 F8              mov edx, dword ptr [ebp-8]
00402143    8950 04              mov dword ptr [eax+4], edx
00402146    C9                   leave
00402147    C3                   retn
Rc4Xor (gcc -O0) (221 bytes)

Code: Select all

00402216    55                  push ebp
00402217    89E5                mov ebp, esp
00402219    56                  push esi
0040221A    53                  push ebx
0040221B    83EC 10             sub esp, 10
0040221E    C745 F4 00000000    mov dword ptr [ebp-C], 0
00402225    E9 B6000000         jmp 004022E0
0040222A    8B45 08             mov eax, dword ptr [ebp+8]
0040222D    8B00                mov eax, dword ptr [eax]
0040222F    83C0 01             add eax, 1
00402232    0FB6D0              movzx edx, al
00402235    8B45 08             mov eax, dword ptr [ebp+8]
00402238    8910                mov dword ptr [eax], edx
0040223A    8B45 08             mov eax, dword ptr [ebp+8]
0040223D    8B48 04             mov ecx, dword ptr [eax+4]
00402240    8B45 08             mov eax, dword ptr [ebp+8]
00402243    8B00                mov eax, dword ptr [eax]
00402245    8B55 08             mov edx, dword ptr [ebp+8]
00402248    0FB64402 08         movzx eax, byte ptr [edx+eax+8]
0040224D    0FB6C0              movzx eax, al
00402250    01C8                add eax, ecx
00402252    0FB6D0              movzx edx, al
00402255    8B45 08             mov eax, dword ptr [ebp+8]
00402258    8950 04             mov dword ptr [eax+4], edx
0040225B    8B45 08             mov eax, dword ptr [ebp+8]
0040225E    8B00                mov eax, dword ptr [eax]
00402260    8B55 08             mov edx, dword ptr [ebp+8]
00402263    0FB64402 08         movzx eax, byte ptr [edx+eax+8]
00402268    8845 F3             mov byte ptr [ebp-D], al
0040226B    8B45 08             mov eax, dword ptr [ebp+8]
0040226E    8B00                mov eax, dword ptr [eax]
00402270    8B55 08             mov edx, dword ptr [ebp+8]
00402273    8B52 04             mov edx, dword ptr [edx+4]
00402276    8B4D 08             mov ecx, dword ptr [ebp+8]
00402279    0FB64C11 08         movzx ecx, byte ptr [ecx+edx+8]
0040227E    8B55 08             mov edx, dword ptr [ebp+8]
00402281    884C02 08           mov byte ptr [edx+eax+8], cl
00402285    8B45 08             mov eax, dword ptr [ebp+8]
00402288    8B40 04             mov eax, dword ptr [eax+4]
0040228B    8B55 08             mov edx, dword ptr [ebp+8]
0040228E    0FB64D F3           movzx ecx, byte ptr [ebp-D]
00402292    884C02 08           mov byte ptr [edx+eax+8], cl
00402296    8B55 10             mov edx, dword ptr [ebp+10]
00402299    8B45 F4             mov eax, dword ptr [ebp-C]
0040229C    8D0C02              lea ecx, dword ptr [edx+eax]
0040229F    8B55 0C             mov edx, dword ptr [ebp+C]
004022A2    8B45 F4             mov eax, dword ptr [ebp-C]
004022A5    01D0                add eax, edx
004022A7    0FB618              movzx ebx, byte ptr [eax]
004022AA    8B45 08             mov eax, dword ptr [ebp+8]
004022AD    8B00                mov eax, dword ptr [eax]
004022AF    8B55 08             mov edx, dword ptr [ebp+8]
004022B2    0FB64402 08         movzx eax, byte ptr [edx+eax+8]
004022B7    0FB6F0              movzx esi, al
004022BA    8B45 08             mov eax, dword ptr [ebp+8]
004022BD    8B40 04             mov eax, dword ptr [eax+4]
004022C0    8B55 08             mov edx, dword ptr [ebp+8]
004022C3    0FB64402 08         movzx eax, byte ptr [edx+eax+8]
004022C8    0FB6C0              movzx eax, al
004022CB    01F0                add eax, esi
004022CD    0FB6C0              movzx eax, al
004022D0    8B55 08             mov edx, dword ptr [ebp+8]
004022D3    0FB64402 08         movzx eax, byte ptr [edx+eax+8]
004022D8    31D8                xor eax, ebx
004022DA    8801                mov byte ptr [ecx], al
004022DC    8345 F4 01          add dword ptr [ebp-C], 1
004022E0    8B45 F4             mov eax, dword ptr [ebp-C]
004022E3    3B45 14             cmp eax, dword ptr [ebp+14]
004022E6  ^ 0F82 3EFFFFFF       jb 0040222A
004022EC    83C4 10             add esp, 10
004022EF    5B                  pop ebx
004022F0    5E                  pop esi
004022F1    5D                  pop ebp
004022F2    C3                  retn

Re: RC4 lib

Post by wilbert »

Looks like the non-optimized version also uses memory more (it keeps its locals on the stack instead of in registers).

Here's a hand coded version of the RC4Xor procedure.
I used the short x86 version as starting point.

Code: Select all

CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  Macro rax : eax : EndMacro
  Macro rdx : edx : EndMacro
  Macro rbx : ebx : EndMacro 
  Macro rsp : esp : EndMacro
  Macro rbp : ebp : EndMacro
  Macro rsi : esi : EndMacro
  Macro rdi : edi : EndMacro
CompilerEndIf

Procedure RC4Xor_ASM(*Context.Rc4Context, *InBuffer, *OutBuffer, BufSize.l)
  DisableDebugger
  EnableASM
  
  ; check for BufSize = 0
  mov ecx, BufSize
  cmp ecx, 0
  !je rc4xor_end
  
  ; back up registers that must be preserved
  mov [rsp -  8], rbx
  mov [rsp - 16], rbp
  mov [rsp - 24], rsi
  mov [rsp - 32], rdi
  
  ; load procedure arguments
  mov rsi, *Context
  mov rdi, *InBuffer
  mov rbp, *OutBuffer
  
  ; get context variables i and j
  movzx eax, byte [rsi]
  movzx ebx, byte [rsi + 4]
  add rsi, 8
  
  ; main xor loop
  !rc4xor_loop:
  add eax, 1                    ; i + 1
  movzx eax, al                 ; i % 256
  movzx edx, byte [rsi + rax]   ; tmp1 = S[i]
  add ebx, edx                  ; j + tmp1
  movzx ebx, bl                 ; j % 256
  mov dh, [rsi + rbx]           ; tmp2 = S[j]
  mov [rsi + rbx], dl           ; S[j] = tmp1
  mov [rsi + rax], dh           ; S[i] = tmp2
  add dl, dh                    ; tmp1 + tmp2
  movzx edx, dl                 ; tmp1 % 256
  movzx edx, byte [rsi + rdx]   ; tmp1 = S[tmp1]
  XOr dl, [rdi]                 ; tmp1 ! PeekB(*InBuffer)
  mov [rbp], dl                 ; PokeB(*OutBuffer, tmp1)
  add rdi, 1                    ; *InBuffer + 1
  add rbp, 1                    ; *OutBuffer + 1
  sub ecx, 1                    ; BufSize - 1
  !jnz rc4xor_loop              ; loop if BufSize > 0
  
  ; update context variables i and j
  mov [rsi - 8], eax
  mov [rsi - 4], ebx
  
  ; restore non-volatile registers
  mov rbx, [rsp -  8]
  mov rbp, [rsp - 16]
  mov rsi, [rsp - 24]
  mov rdi, [rsp - 32] 
  
  ; exit
  !rc4xor_end:
  
  DisableASM
  EnableDebugger
EndProcedure
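
For comparison, the same loop can be written in plain PureBasic. This is only a readable sketch that follows the asm comments step by step, not a tuned replacement. The names Rc4Ctx and RC4Xor_PB are made up for this example, and the structure layout (i and j as longs, followed by 256 state bytes) is assumed from the [rsi], [rsi + 4] and +8 offsets the asm uses.

```purebasic
; Hypothetical stand-in for the context: i at offset 0, j at offset 4, S at offset 8
Structure Rc4Ctx
  i.l
  j.l
  S.a[256]
EndStructure

; Plain PureBasic equivalent of the xor loop (illustration only)
Procedure RC4Xor_PB(*Context.Rc4Ctx, *InBuffer, *OutBuffer, BufSize.l)
  Protected *in.Ascii = *InBuffer
  Protected *out.Ascii = *OutBuffer
  Protected i.l = *Context\i & 255
  Protected j.l = *Context\j & 255
  Protected tmp1.l, tmp2.l

  While BufSize > 0
    i = (i + 1) & 255                        ; i + 1, then i % 256
    tmp1 = *Context\S[i]                     ; tmp1 = S[i]
    j = (j + tmp1) & 255                     ; j + tmp1, then j % 256
    tmp2 = *Context\S[j]                     ; tmp2 = S[j]
    *Context\S[j] = tmp1                     ; S[j] = tmp1
    *Context\S[i] = tmp2                     ; S[i] = tmp2
    tmp1 = *Context\S[(tmp1 + tmp2) & 255]   ; tmp1 = S[(tmp1 + tmp2) % 256]
    *out\a = tmp1 ! *in\a                    ; keystream byte xor input byte
    *in + 1
    *out + 1
    BufSize - 1
  Wend

  ; write i and j back so the next call continues the keystream
  *Context\i = i
  *Context\j = j
EndProcedure
```

As with the asm version, the context is expected to have been set up by an initialise routine before this is called.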
Windows (x64)
Raspberry Pi OS (Arm64)
Re: RC4 lib

Post by Keya »

haha, somebody couldn't resist an asm optimization challenge!!! Brilliant work btw, I didn't come across a single tiny x64 RC4 in my searches, yours is the first I've seen :)
OK, it's only fair (or extremely unfair and rather uncalled for?) to put your code (123 bytes on x86, for those playing along at home) up against GCC's optimized builds for a speed test! My x86 timings for 1 x 50 MB (Rc4Xor function only):

Code: Select all

6th. gcc -O0: 382, 383, 384ms
5th. gcc -O1: 147, 148, 147ms \
4th. gcc -O2: 147, 148, 147ms | ~tie
3rd. gcc -O3: 146, 147, 146ms /
2nd. gcc -Os: 139, 140, 139ms
1st. wilbert:  97,  96,  97ms    *(has since withdrawn from Rio due to Zika concerns)
WELL ISN'T THAT A SURPRISE, lol :)
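
A timing run like this could be set up roughly as follows. This is not the harness used for the numbers above, just a sketch of the idea: one 50 MB buffer, one call, ElapsedMilliseconds() before and after. XorProc, DummyXor and TimeXor are hypothetical names, and DummyXor is only a placeholder workload so the sketch runs on its own. Pass the address of a real Rc4Xor and an initialised context instead, and compile with the debugger off or the timings mean little.

```purebasic
; Signature shared by the routines to be timed (same argument order as Rc4Xor)
Prototype XorProc(*Context, *InBuffer, *OutBuffer, BufSize.l)

; Placeholder workload (NOT RC4): xors every byte with a constant
Procedure DummyXor(*Context, *InBuffer, *OutBuffer, BufSize.l)
  Protected *in.Ascii = *InBuffer
  Protected *out.Ascii = *OutBuffer
  While BufSize > 0
    *out\a = *in\a ! $5A
    *in + 1
    *out + 1
    BufSize - 1
  Wend
EndProcedure

; Time one pass over a Size-byte buffer; returns milliseconds, or -1 on allocation failure
Procedure.q TimeXor(*ProcAddr, *Context, Size.l)
  Protected Proc.XorProc = *ProcAddr
  Protected *buf = AllocateMemory(Size)
  Protected t.q
  If *buf = 0
    ProcedureReturn -1
  EndIf
  t = ElapsedMilliseconds()
  Proc(*Context, *buf, *buf, Size)   ; in-place: input and output are the same buffer
  t = ElapsedMilliseconds() - t
  FreeMemory(*buf)
  ProcedureReturn t
EndProcedure

; Three runs, as in the table above
If OpenConsole()
  Define run
  For run = 1 To 3
    PrintN(Str(TimeXor(@DummyXor(), 0, 50 * 1024 * 1024)) + " ms")
  Next
EndIf
```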

Code: Select all

movzx ebx, bl        ; j % 256
I love it! You and I both know I would've div'd that to get the mod, lol
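
The trick works because 256 is a power of two: for non-negative values the remainder modulo 256 is simply the low 8 bits, which is exactly what the movzx from bl keeps. A small PureBasic check of that equivalence (illustration only, the variable names are made up):

```purebasic
; Compare "j % 256" with "j & 255" over a range of non-negative values
Define j.l, mismatches.l = 0
For j = 0 To 100000
  If (j % 256) <> (j & 255)
    mismatches + 1
  EndIf
Next
Debug mismatches   ; stays 0 as long as the two forms agree
```

Note that this only holds for non-negative values; with a negative operand, % and & can give different results.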
Last edited by Keya on Fri Jul 15, 2016 8:33 am, edited 1 time in total.
Re: RC4 lib

Post by wilbert »

Keya wrote: haha somebody couldn't resist an asm optimization challenge!!!
You're absolutely right; couldn't resist the challenge :oops: :D

Here's an attempt at a complete recreation of the lib with all three procedures wrapped in a module.

Code: Select all

; RC4 module by Wilbert

; Supported platforms : All (Cross platform)

; Last change         : Jul 14, 2016


;- *** Module declaration ***

DeclareModule RC4
  
  Structure RC4Context
    i.l
    j.l
    S.a[256]
  EndStructure
  
  Declare Rc4Initialise(*Context.RC4Context, *Key, KeySize.l, DropN.l = 0)
  Declare Rc4Output(*Context.Rc4Context, *Buffer, BufSize.l)
  Declare Rc4Xor(*Context.Rc4Context, *InBuffer, *OutBuffer, BufSize.l)
  
EndDeclareModule

;- *** Module implementation ***

Module RC4
  
  DisableDebugger
  EnableExplicit
  EnableASM
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    Macro rax : eax : EndMacro
    Macro rcx : ecx : EndMacro 
    Macro rdx : edx : EndMacro
    Macro rbx : ebx : EndMacro 
    Macro rsp : esp : EndMacro
    Macro rbp : ebp : EndMacro
    Macro rsi : esi : EndMacro
    Macro rdi : edi : EndMacro
  CompilerEndIf
  
  Procedure Rc4Initialise(*Context.RC4Context, *Key, KeySize.l, DropN.l = 0)
    
    ; backup non-volatile registers
    mov [rsp -  8], rbx
    mov [rsp - 24], rsi
    mov [rsp - 32], rdi
    
    ; load procedure arguments
    mov rsi, *Context
    mov rdi, *Key
    
    ; setup key schedule
    add rsi, 8
    XOr ecx, ecx
    !rc4init_loop0:
    mov [rsi + rcx], cl           ; S[i] = i
    add cl, 1
    !jnz rc4init_loop0
    XOr eax, eax
    XOr ebx, ebx
    !rc4init_loop1:
    movzx edx, byte [rsi + rax]   ; tmp1 = S[i]
    add ebx, edx                  ; j + tmp1
    add bl, [rdi + rcx]           ; j + Key[i % KeySize]
    add ecx, 1
    cmp ecx, KeySize
    !jne rc4init_cont0
    XOr ecx, ecx
    !rc4init_cont0:
    movzx ebx, bl                 ; j % 256
    mov dh, [rsi + rbx]           ; tmp2 = S[j]  
    mov [rsi + rbx], dl           ; S[j] = tmp1
    mov [rsi + rax], dh           ; S[i] = tmp2
    add al, 1
    !jnz rc4init_loop1
    XOr ebx, ebx                  ; j = 0 (i in eax has already wrapped back to 0)
    
    ; drop first bytes (if requested)
    mov ecx, DropN
    test ecx, ecx
    !jz rc4init_cont1
    !rc4init_loop2:
    add eax, 1                    ; i + 1
    movzx eax, al                 ; i % 256
    movzx edx, byte [rsi + rax]   ; tmp1 = S[i]
    add ebx, edx                  ; j + tmp1
    movzx ebx, bl                 ; j % 256
    mov dh, [rsi + rbx]           ; tmp2 = S[j]
    mov [rsi + rbx], dl           ; S[j] = tmp1
    mov [rsi + rax], dh           ; S[i] = tmp2  
    sub ecx, 1
    !jnz rc4init_loop2
    !rc4init_cont1:
    
    ; update context variables i and j
    mov [rsi - 8], eax
    mov [rsi - 4], ebx
    
    ; restore non-volatile registers
    mov rbx, [rsp -  8]
    mov rsi, [rsp - 24]
    mov rdi, [rsp - 32]   

  EndProcedure
  
  Macro M_Rc4(n)
    
    ; check for BufSize = 0
    mov ecx, BufSize
    test ecx, ecx
    !jz rc4xor#n#_end
    
    ; backup non-volatile registers
    mov [rsp -  8], rbx
    mov [rsp - 16], rbp
    mov [rsp - 24], rsi
    CompilerIf n
      mov [rsp - 32], rdi
    CompilerEndIf
    
    ; load procedure arguments
    mov rsi, *Context
    CompilerIf n
      mov rdi, *InBuffer
      mov rbp, *OutBuffer
    CompilerElse
      mov rbp, *Buffer
    CompilerEndIf
  
    ; get context variables i and j
    movzx eax, byte [rsi]
    movzx ebx, byte [rsi + 4]
    add rsi, 8
    
    ; main loop
    !rc4xor#n#_loop:
    add eax, 1                    ; i + 1
    movzx eax, al                 ; i % 256
    movzx edx, byte [rsi + rax]   ; tmp1 = S[i]
    add ebx, edx                  ; j + tmp1
    movzx ebx, bl                 ; j % 256
    mov dh, [rsi + rbx]           ; tmp2 = S[j]
    mov [rsi + rbx], dl           ; S[j] = tmp1
    mov [rsi + rax], dh           ; S[i] = tmp2
    add dl, dh                    ; tmp1 + tmp2
    movzx edx, dl                 ; tmp1 % 256
    movzx edx, byte [rsi + rdx]   ; tmp1 = S[tmp1]
    CompilerIf n
      XOr dl, [rdi]               ; tmp1 ! PeekB(*InBuffer)
    CompilerEndIf    
    mov [rbp], dl                 ; PokeB(*OutBuffer, tmp1)
    CompilerIf n
      add rdi, 1                  ; *InBuffer + 1
    CompilerEndIf 
    add rbp, 1                    ; *OutBuffer + 1
    sub ecx, 1                    ; BufSize - 1
    !jnz rc4xor#n#_loop           ; loop if BufSize > 0
    
    ; update context variables i and j
    mov [rsi - 8], eax
    mov [rsi - 4], ebx
    
    ; restore non-volatile registers
    mov rbx, [rsp -  8]
    mov rbp, [rsp - 16]
    mov rsi, [rsp - 24]
    CompilerIf n
      mov rdi, [rsp - 32]
    CompilerEndIf    
    
    ; exit
    !rc4xor#n#_end:
    
  EndMacro
    
  Procedure Rc4Output(*Context.Rc4Context, *Buffer, BufSize.l)
    M_Rc4(0)
  EndProcedure
  
  Procedure Rc4Xor(*Context.Rc4Context, *InBuffer, *OutBuffer, BufSize.l)
    M_Rc4(1)
  EndProcedure
  
EndModule
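
To sanity-check a module like this, it helps to have a slow but readable reference to compare against. Below is a plain PureBasic sketch of the key schedule (with the optional drop of the first N bytes) and of raw keystream output, checked against the commonly published RC4 keystream for the key "Key". The names Rc4State, Rc4Step, Rc4InitPB and Rc4OutputPB are hypothetical and not part of the module; the structure layout is assumed to be the same as RC4Context.

```purebasic
Structure Rc4State   ; hypothetical name, same layout as RC4Context
  i.l
  j.l
  S.a[256]
EndStructure

; Advance the state by one step and return the next keystream byte
Procedure Rc4Step(*Ctx.Rc4State)
  Protected tmp1.l, tmp2.l
  *Ctx\i = (*Ctx\i + 1) & 255
  tmp1 = *Ctx\S[*Ctx\i]
  *Ctx\j = (*Ctx\j + tmp1) & 255
  tmp2 = *Ctx\S[*Ctx\j]
  *Ctx\S[*Ctx\j] = tmp1
  *Ctx\S[*Ctx\i] = tmp2
  ProcedureReturn *Ctx\S[(tmp1 + tmp2) & 255]
EndProcedure

; Key schedule, then optionally discard the first DropN keystream bytes
Procedure Rc4InitPB(*Ctx.Rc4State, *Key, KeySize.l, DropN.l = 0)
  Protected *k.Ascii, i.l, j.l, tmp.l
  For i = 0 To 255
    *Ctx\S[i] = i                            ; S[i] = i
  Next
  For i = 0 To 255
    *k = *Key + (i % KeySize)
    j = (j + *Ctx\S[i] + *k\a) & 255         ; j + S[i] + Key[i % KeySize]
    tmp = *Ctx\S[i]                          ; swap S[i] and S[j]
    *Ctx\S[i] = *Ctx\S[j]
    *Ctx\S[j] = tmp
  Next
  *Ctx\i = 0
  *Ctx\j = 0
  While DropN > 0
    Rc4Step(*Ctx)
    DropN - 1
  Wend
EndProcedure

; Fill *Buffer with raw keystream bytes
Procedure Rc4OutputPB(*Ctx.Rc4State, *Buffer, BufSize.l)
  Protected *out.Ascii = *Buffer
  While BufSize > 0
    *out\a = Rc4Step(*Ctx)
    *out + 1
    BufSize - 1
  Wend
EndProcedure

DataSection
  KeyBytes:
  Data.a $4B, $65, $79   ; the ASCII bytes of "Key"
EndDataSection

Define ctx.Rc4State, n, ksHex.s
Dim ks.a(9)
Rc4InitPB(@ctx, ?KeyBytes, 3)
Rc4OutputPB(@ctx, @ks(0), 10)
For n = 0 To 9
  ksHex + RSet(Hex(ks(n)), 2, "0")
Next
Debug ksHex
; the published keystream for the key "Key" starts with EB9F7781B734CA72A719
Debug Bool(ksHex = "EB9F7781B734CA72A719")
```

Feeding the same key into the module's Rc4Initialise and Rc4Output and comparing the bytes with this reference should show quickly whether the asm and the plain version agree.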