RC4 lib

Keya
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Re: RC4 lib

Post by Keya »

well, the libs were useful for a couple of days anyway, lol :) I'm just super happy it was the first time I compiled a lib for all 3 OSes! [✓ bucketlist]. I'm guessing the download links will be dead in 30 days anyway, lol
With the tiny size of RC4, yet genuine cryptographic quality (DropN'd of course!), and the ease of it being a stream cipher, I really like how it can be used in place of that Xor/Rot13 family (hey, it's only a couple of bytes more!) for those times when Top Secret AES block cipher encryption is overkill but you still want something better than an 8bit/32bit key "weak scramble cipher". A great middle ground I think! Brilliant work again wilbert! :)
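For anyone wondering what "DropN'd" looks like in practice, here is a minimal C sketch of RC4 that throws away the first N keystream bytes before use (the usual way to skip the biased start of the keystream). It is not the code from wilbert's lib: the names `rc4_state`, `rc4_next`, `rc4_init_drop` and `rc4_xor` are made up for illustration, and the drop count is whatever the caller picks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout: two indices plus the 256-byte permutation. */
typedef struct {
    uint32_t i, j;
    uint8_t  s[256];
} rc4_state;

/* Produce the next keystream byte (the PRGA step). */
uint8_t rc4_next(rc4_state *st)
{
    st->i = (st->i + 1) & 0xFF;
    st->j = (st->j + st->s[st->i]) & 0xFF;
    uint8_t t = st->s[st->i];
    st->s[st->i] = st->s[st->j];
    st->s[st->j] = t;
    return st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
}

/* Key schedule, then discard the first `drop` keystream bytes.
   keylen must be between 1 and 256. */
void rc4_init_drop(rc4_state *st, const uint8_t *key, size_t keylen, size_t drop)
{
    uint32_t j = 0;
    for (uint32_t n = 0; n < 256; n++)
        st->s[n] = (uint8_t)n;
    for (uint32_t n = 0; n < 256; n++) {
        j = (j + st->s[n] + key[n % keylen]) & 0xFF;
        uint8_t t = st->s[n];
        st->s[n] = st->s[j];
        st->s[j] = t;
    }
    st->i = 0;
    st->j = 0;
    while (drop-- > 0)
        (void)rc4_next(st);
}

/* XOR a buffer with the keystream; the same call encrypts and decrypts. */
void rc4_xor(rc4_state *st, const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t n = 0; n < len; n++)
        out[n] = in[n] ^ rc4_next(st);
}
```

Calling `rc4_init_drop(&st, key, keylen, 768)` and then `rc4_xor` encrypts a buffer; doing the same again with a freshly initialised state decrypts it. The 768 is only an example value, not something taken from the lib.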
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: RC4 lib

Post by wilbert »

Keya wrote: I'm just super happy it was the first time I compiled a lib for all 3 OSes! [✓ bucketlist].
That is a great achievement :)
There's lots of C code available, and compiled with optimization on it's fast and easy to use.
It has also been helpful to see the difference between the optimization settings; good to know not to compile with -O0 :wink:
Keya wrote: With the tiny size of RC4, yet genuine cryptographic quality (DropN'd of course!), and the ease of it being a stream cipher, I really like how it can be used in place of that Xor/Rot13 family (hey, it's only a couple of bytes more!)
It is a nice algorithm and indeed tiny!
Windows (x64)
Raspberry Pi OS (Arm64)
Keya
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Re: RC4 lib

Post by Keya »

wilbert wrote: There's lots of C code available, and compiled with optimization on it's fast and easy to use.
Yeah! Talk about doors -> OPEN!!! I'm discovering there are so many amazing libs out there, and Fred was awesome enough to add Import functionality to PB. I think he's empowered us with a great capability there, but I get the feeling most PB'ers don't take advantage of it unless there are precompiled libs, simply because C compilers are such a {you know that long evergrowing list of expletives! yeah that one!}. I can only speak for myself here, but I find the C language itself a bit intimidating and weird ... Basic syntax just makes sense to me, and I'm so happy PureBasic is really all I need.
But the way I see it is this: we don't really have to learn to write any C anyway. To use libs, we just have to learn how to compile an existing .c to .o, so surely the C programmer has already done 99% of the hard work for us?!? So I refuse to be scared by C. I don't feel I have to learn it, so yes, I'm up for that final 1%, and I will conquer my fear of C compilers!!!
One Lib At A Time™ ...
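To give a rough idea of how small that final 1% can be, here is a made-up one-function C file. `xorbyte.c` and `XorByte` are hypothetical names for illustration and not part of the RC4 lib. Compiling it with `gcc -Os -c xorbyte.c -o xorbyte.o` should produce an object file; how the symbol then has to be declared inside PB's Import block depends on the platform and calling convention, so treat that side as something to check rather than a given.

```c
#include <stdint.h>

/* xorbyte.c (hypothetical file name)
   Build an object file without linking:
       gcc -Os -c xorbyte.c -o xorbyte.o
   -c stops after compilation, -Os optimizes for size. */

/* Trivial exported function: XOR one byte with a key byte. */
uint8_t XorByte(uint8_t value, uint8_t key)
{
    return (uint8_t)(value ^ key);
}
```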
wilbert wrote: It has also been helpful to see the difference between the optimization settings; good to know not to compile with -O0 :wink:
Yeah, also good to know GCC's highest level of optimization has got nothing on your asm game hey wilbert!?!? lol

btw I was just reading https://gcc.gnu.org/onlinedocs/gcc/Opti ... tions.html
and I just learned of the -Os (optimize for size) option, which is basically -O2 minus the optimizations that tend to increase code size.
Anyway, that took the RC4 .o lib down from 920 to 875 bytes, and the resulting Rc4Xor is 95 bytes:

Code: Select all

004020DA    55              push ebp
004020DB    89E5            mov ebp, esp
004020DD    57              push edi
004020DE    56              push esi
004020DF    53              push ebx
004020E0    52              push edx
004020E1    31F6            xor esi, esi
004020E3    8B45 08         mov eax, dword ptr [ebp+8]
004020E6    3B75 14         cmp esi, dword ptr [ebp+14]
004020E9    74 48           je short 00402133
004020EB    8B18            mov ebx, dword ptr [eax]
004020ED    8D53 01         lea edx, dword ptr [ebx+1]
004020F0    0FB6D2          movzx edx, dl
004020F3    0FB67C10 08     movzx edi, byte ptr [eax+edx+8]
004020F8    8910            mov dword ptr [eax], edx
004020FA    89FB            mov ebx, edi
004020FC    897D F0         mov dword ptr [ebp-10], edi
004020FF    0378 04         add edi, dword ptr [eax+4]
00402102    89F9            mov ecx, edi
00402104    0FB6F9          movzx edi, cl
00402107    8A4C38 08       mov cl, byte ptr [eax+edi+8]
0040210B    8978 04         mov dword ptr [eax+4], edi
0040210E    884C10 08       mov byte ptr [eax+edx+8], cl
00402112    885C38 08       mov byte ptr [eax+edi+8], bl
00402116    8A4D F0         mov cl, byte ptr [ebp-10]
00402119    024C10 08       add cl, byte ptr [eax+edx+8]
0040211D    8B5D 0C         mov ebx, dword ptr [ebp+C]
00402120    0FB6C9          movzx ecx, cl
00402123    8A1433          mov dl, byte ptr [ebx+esi]
00402126    8B5D 10         mov ebx, dword ptr [ebp+10]
00402129    325408 08       xor dl, byte ptr [eax+ecx+8]
0040212D    881433          mov byte ptr [ebx+esi], dl
00402130    46              inc esi
00402131  ^ EB B3           jmp short 004020E6
00402133    58              pop eax
00402134    5B              pop ebx
00402135    5E              pop esi
00402136    5F              pop edi
00402137    5D              pop ebp
00402138    C3              retn
... and it just OUTPERFORMED all the other gcc -O's! Check out the updated timings on the previous page!
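For readers who would rather see C than asm, the listing above can be read back into roughly the following. This is a reconstruction from the disassembly, not the actual source: the struct name `Rc4Ctx`, the field names, the parameter order (context, input, output, length) and the 32-bit length type are all inferred from the offsets `[eax]`, `[eax+4]`, `[eax+8]` and the stack arguments `[ebp+8]` to `[ebp+14]`.

```c
#include <stdint.h>

/* Assumed layout, inferred from the listing:
   [ctx+0] = i, [ctx+4] = j, [ctx+8 ...] = 256-byte S-box. */
typedef struct {
    uint32_t i;
    uint32_t j;
    uint8_t  s[256];
} Rc4Ctx;

void Rc4Xor(Rc4Ctx *ctx, const uint8_t *in, uint8_t *out, uint32_t len)
{
    for (uint32_t n = 0; n != len; n++) {   /* xor esi,esi ... cmp / je */
        uint32_t i = (ctx->i + 1) & 0xFF;   /* lea edx,[ebx+1] ; movzx edx,dl */
        uint8_t  t = ctx->s[i];             /* movzx edi,byte ptr [eax+edx+8] */
        uint32_t j = (ctx->j + t) & 0xFF;   /* add edi,[eax+4] ; movzx edi,cl */
        ctx->i = i;
        ctx->j = j;
        ctx->s[i] = ctx->s[j];              /* swap S[i] and S[j] */
        ctx->s[j] = t;
        /* keystream byte = S[(old S[i] + old S[j]) mod 256] */
        out[n] = in[n] ^ ctx->s[(uint8_t)(t + ctx->s[i])];
    }
}
```

The context has to be set up by the key schedule first; that part is not in the listing, so it is left out here.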
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: RC4 lib

Post by wilbert »

Keya wrote:... and it just OUTPERFORMED all the other gcc -O's! Check out the updated timings on the previous page!
That's great :)
Do you also have the size and asm code for the 64 bit version (-Os) ?
Would be interesting to compare it with the 32 bit version.
Keya
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Re: RC4 lib

Post by Keya »

So on x64, gcc -Os dropped the .o file from 1220 to 1195 bytes, and Rc4Xor() is 85 bytes. Ooh, the joy of more registers...

Code: Select all

00000001400020E8 | 57                       | push rdi
00000001400020E9 | 56                       | push rsi
00000001400020EA | 53                       | push rbx
00000001400020EB | 31 DB                    | xor ebx,ebx
00000001400020ED | 41 39 D9                 | cmp r9d,ebx
00000001400020F0 | 76 47                    | jbe rc4.140002139
00000001400020F2 | 8B 01                    | mov eax,dword ptr ds:[rcx]
00000001400020F4 | 44 8B 51 04              | mov r10d,dword ptr ds:[rcx+4]
00000001400020F8 | FF C0                    | inc eax
00000001400020FA | 0F B6 C0                 | movzx eax,al
00000001400020FD | 89 01                    | mov dword ptr ds:[rcx],eax
00000001400020FF | 44 0F B6 5C 01 08        | movzx r11d,byte ptr ds:[rcx+rax+8]
0000000140002105 | 45 01 DA                 | add r10d,r11d
0000000140002108 | 45 0F B6 D2              | movzx r10d,r10b
000000014000210C | 44 89 51 04              | mov dword ptr ds:[rcx+4],r10d
0000000140002110 | 42 8A 7C 11 08           | mov dil,byte ptr ds:[rcx+r10+8]
0000000140002115 | 40 88 7C 01 08           | mov byte ptr ds:[rcx+rax+8],dil
000000014000211A | 46 88 5C 11 08           | mov byte ptr ds:[rcx+r10+8],r11b
000000014000211F | 44 02 5C 01 08           | add r11b,byte ptr ds:[rcx+rax+8]
0000000140002124 | 45 0F B6 DB              | movzx r11d,r11b
0000000140002128 | 42 8A 44 19 08           | mov al,byte ptr ds:[rcx+r11+8]
000000014000212D | 32 04 1A                 | xor al,byte ptr ds:[rdx+rbx]
0000000140002130 | 41 88 04 18              | mov byte ptr ds:[r8+rbx],al
0000000140002134 | 48 FF C3                 | inc rbx
0000000140002137 | EB B4                    | jmp rc4.1400020ED
0000000140002139 | 5B                       | pop rbx
000000014000213A | 5E                       | pop rsi
000000014000213B | 5F                       | pop rdi
000000014000213C | C3                       | ret
The disassembler is x64dbg. I had to check to make sure "dil" was an x64 register, lol ... I don't think I've seen it used before, but maybe people like me should be using dil instead of rax
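The 95 and 85 byte figures can be checked straight from the two listings: take the address of the last instruction, add its length, and subtract the address of the first instruction. A tiny C sketch of that arithmetic, where `routine_size` is just an illustrative helper name:

```c
#include <stdint.h>

/* Size of a routine = address of its last instruction
   + length of that instruction - address of its first instruction. */
uint64_t routine_size(uint64_t first_addr, uint64_t last_addr, uint64_t last_len)
{
    return last_addr + last_len - first_addr;
}
```

With the 32-bit listing that is `routine_size(0x004020DA, 0x00402138, 1)`, and with the 64-bit one `routine_size(0x1400020E8, 0x14000213C, 1)`; both end in a one-byte return instruction.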
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: RC4 lib

Post by wilbert »

Keya wrote: So on x64, gcc -Os dropped the .o file from 1220 to 1195 bytes, and Rc4Xor() is 85 bytes.
The disassembler is x64dbg. I had to check to make sure "dil" was an x64 register, lol ... I don't think I've seen it used before, but maybe people like me should be using dil instead of rax
Thank you very much Keya :)
Nice to see the 64 bit code is even smaller than the 32 bit code and makes good use of the additional registers.
It probably also explains why the 64 bit code outperforms the 32 bit code.
I had also never heard of 'dil'. I understand now it's the name for the lowest 8 bits of the rdi register. :shock:
Keya
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Re: RC4 lib

Post by Keya »

wilbert wrote: I had also never heard of 'dil'. I understand now it's the name for the lowest 8 bits of the rdi register. :shock:
Ahhh, that makes sense. So I take it there's also a DIH? Please tell me there's no 16bit DIX
[update] Well, there's no DIH for the upper 8 bits (only DIL for the lowest), which makes sense I guess, and it's just "DI" for the 16bit'er...
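To picture how rdi, edi, di and dil relate, here is a small C sketch that models one 64-bit register as a `uint64_t` and the narrower names as views of its low bits. The helper names are invented for illustration. The write behaviour modelled here is the x86-64 rule: writing the 8-bit or 16-bit name leaves the remaining bits untouched, while writing the 32-bit name clears the upper 32 bits.

```c
#include <stdint.h>

/* Reading the narrower names: each is just the low bits of the full register. */
uint32_t read_edi(uint64_t rdi) { return (uint32_t)rdi; }  /* low 32 bits */
uint16_t read_di(uint64_t rdi)  { return (uint16_t)rdi; }  /* low 16 bits */
uint8_t  read_dil(uint64_t rdi) { return (uint8_t)rdi; }   /* low 8 bits  */

/* Writing dil or di keeps all the other bits of rdi. */
uint64_t write_dil(uint64_t rdi, uint8_t v)
{
    return (rdi & ~(uint64_t)0xFF) | v;
}

uint64_t write_di(uint64_t rdi, uint16_t v)
{
    return (rdi & ~(uint64_t)0xFFFF) | v;
}

/* Writing edi zero-extends: the upper 32 bits of rdi become 0. */
uint64_t write_edi(uint64_t rdi, uint32_t v)
{
    (void)rdi;  /* the old value does not survive a 32-bit write */
    return (uint64_t)v;
}
```

There is no `read_dih` here because, as noted above, the register has no name for bits 8 to 15.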
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: RC4 lib

Post by wilbert »

Keya wrote:
wilbert wrote: I had also never heard of 'dil'. I understand now it's the name for the lowest 8 bits of the rdi register. :shock:
Ahhh, that makes sense. So I take it there's also a DIH? Please tell me there's no 16bit DIX
[update] Well, there's no DIH for the upper 8 bits (only DIL for the lowest), which makes sense I guess, and it's just "DI" for the 16bit'er...
Yes, I found this page which explains it
https://www.tortall.net/projects/yasm/m ... sters.html
Keya
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Re: RC4 lib

Post by Keya »

cl, dl, ah, bh, ch, dh, al, bl, cl, dl, dil, sil, bpl, spl, r8l, r9l, r10l, r11l, r12l, r13l, r14l, r15l, al, bl, cl, dl, sil, dil, bpl, spl, r8b, r9b, r10b, r11b, r12b, r13b, r14b, r15b, ax, dx, di, si, bp, sp, ax, bx, cx, dx, di, si, bp, sp, r8w, r9w, r10w, r11w, r12w, r13w, r14w, r15w, eax, ebx, ecx, edx, edi, esi, ebp, esp, r8d, r9d, r10d, r11d, r12d, r13d, r14d, r15d, rax, rbx, rcx, rdx, rdi, rsi, rbp, rsp, r8, r9, r10, r11, r12, r13, r14, r15, xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15, cs, ds, ss, es, fs, gs, eip, rip, mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7, st0, st1, st2, st3, st4, st5, st6, st7, fpr0, fpr1, fpr2, fpr3, fpr4, fpr5, fpr6, fpr7, xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, mxcsr, ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7, ymm8, ymm9, ymm10, ymm11, ymm12, ymm13, ymm14, ymm15, cr0, cr2, cr3, cr4, gdtr, ldtr, idtr, dr0, dr1, dr2, dr3, dr6, dr7, cr8, tpr, tr, gdt, ldt, idt, oh and tr, ok cool and I think i've memorized them all ....... hmm nope.

I KNOW - I'LL TURN THEM INTO A POEM.