example of trivial text encoding/decoding for ascii strings

Share your advanced PureBasic knowledge/code with the community.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: example of trivial text encoding/decoding for ascii stri

Post by wilbert »

Keya wrote:aw, i thought i was already accounting for delta=0, my bad! i try again
The problem is that delta can be both positive and negative.
When delta = -1 and you add 1, you end up with 0.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Keya
Addict
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Re: example of trivial text encoding/decoding for ascii stri

Post by Keya »

disappointing! i had one of those brain moments, lol :( for some reason i kept thinking of the difference as always being 1-254. Now i'm not sure if it's possible without using all 255 chars, breaking string-friendliness. How embarrassing :oops:
Bo Marchais
User
User
Posts: 61
Joined: Sun Apr 03, 2016 12:03 am

Re: example of trivial text encoding/decoding for ascii stri

Post by Bo Marchais »

Keya... delta encoding (diff) can still be useful if you restrict the character set/range.
You can't have both 8 bit clean encoding AND make exceptions for the null character...with one exception.
This is different than a ROT cipher (which should be called modulo) because instead of adding an offset, we'll just shift around the bits.
You can bit shift (also called rotate) within the bytes however you like without breaking null termination of a string.
The best news?
When all the bits are zero, it doesn't matter how you move them around. It's another flavor of simple stream cipher.

It's not as brief, but the simplest implementation would be to rotate bits in a fixed way, or better yet, according to character count.
character 1 gets shifted 1 bit (12345678--> 23456781), the ext one two bits (12345678-->34567812), and so forth. The next simplest is to rotate left for zero/even chars, and right for odd chars. You can do more.

You wouldn't bother doing more than 8 or 16 rotates, of course, so you'd want to mod the position in the same way that we did in the first cipher to get a mod 256 of the position. And really, maybe you don't want to stick to simple rotate 1 position per character in the string...that's kid stuff. You can do the rotate option based on the previous char for a simple variation.

Rotating bits is better than simple ciphers, and scrambling bits is even better - but be aware that this simple cipher leaks information that can be used for analysis, but it's probably beyond your average notepad++ using client to look into that.
User avatar
nco2k
Addict
Addict
Posts: 1344
Joined: Mon Sep 15, 2003 5:55 am

Re: example of trivial text encoding/decoding for ascii stri

Post by nco2k »

you guys are way overthinking this. none of the presented methods are secure. if you are targeting script kiddies, you should pick something fast and simple. if you are targeting elite crackers, you should go all the way and use a rehashed and salted aes loop and establish a secure https connection, to receive the key and vector. dont forget to lock and encrypt the memory area of your input/output buffer as well. also write another app, that monitors your main apps integrity and reports back every x milliseconds. :D in my humble opinion, you really need to ask yourselves, what kind of app are you creating and if the extra work is actually worth the effort. in the end, you will waste a lot of time and only create the illusion of something secure.

a simple nibble swap is enough to render any string unreadable:

Code: Select all

Structure Str
  StructureUnion
    c.c
    b.b[0]
  EndStructureUnion
EndStructure

Procedure SNIB(*Str.Str)
  Protected i
  While *Str\c
    If *Str\c <> $0A And *Str\c <> $A0 And *Str\c <> $0D And *Str\c <> $D0
      For i = 0 To SizeOf(Character) - 1
        *Str\b[i] = (*Str\b[i] >> 4 & $0F) | (*Str\b[i] << 4 & $F0)
      Next
    EndIf
    *Str + SizeOf(Character)
  Wend
EndProcedure

Text$ = "Ћирилица"
Debug Text$

SNIB(@Text$)
Debug Text$

SNIB(@Text$)
Debug Text$
compile in unicode mode, for the "Ћирилица" example string to work.

i left out $0D (CR), $0A (LF) and its counterparts, for the returned string to be compatible with ReadString(), which (by default) stops reading after it hits CR/LF.

c ya,
nco2k
If OSVersion() = #PB_OS_Windows_ME : End : EndIf
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: example of trivial text encoding/decoding for ascii stri

Post by wilbert »

My attempt at some sort of rotation based on the remark of Bo Marchais.
Seems to work quite well and fast :)

Code: Select all

Procedure.c RolChar(c.c, n)
  !movzx ecx, byte [p.v_n]
  CompilerIf #PB_Compiler_Unicode
    !movzx eax, word [p.v_c]
    !rol ax, cl
  CompilerElse
    !movzx eax, byte [p.v_c]
    !rol al, cl
  CompilerEndIf
  ProcedureReturn  
EndProcedure

Procedure Encode(*String.Character, i = 0)
  Protected.c c
  While *String\c
    i + 1 : c = -RolChar(*String\c, i + c)
    Swap *String\c, c : *String + SizeOf(Character)
  Wend
EndProcedure

Procedure Decode(*String.Character, i = 0)
  Protected.c c
  While *String\c
    i + 1 : c = RolChar(-*String\c, -(i + c))
    *String\c = c : *String + SizeOf(Character)
  Wend
EndProcedure


S.s = "This is a test string 123"
Encode(@S, 3)
Debug S

Decode(@S, 3)
Debug S
Just for fun, the code above translated to ASM (unicode only)

Code: Select all

CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  Macro rbx : ebx : EndMacro
  Macro rdx : edx : EndMacro
  Macro rsi : esi : EndMacro  
CompilerEndIf 

Procedure EncodeU(*String, i = 0)
  EnableASM
  mov rdx, *String
  mov eax, [p.v_i]
  push rbx
  push rsi
  sub esi, esi
  !encode_u:
  add eax, 1
  lea ecx, [esi + eax]
  movzx ebx, word [rdx]
  add rdx, 2
  mov esi, ebx
  rol bx, cl
  neg bx
  mov [rdx - 2], bx
  !jnz encode_u
  pop rsi
  pop rbx
  DisableASM
EndProcedure

Procedure DecodeU(*String, i = 0)
  EnableASM
  mov rdx, *String
  mov eax, [p.v_i]
  push rbx
  sub ebx, ebx
  !decode_u:
  add eax, 1
  lea ecx, [ebx + eax]
  movzx ebx, word [rdx]
  add rdx, 2
  neg bx
  ror bx, cl
  mov [rdx - 2], bx
  !jnz decode_u
  pop rbx
  DisableASM
EndProcedure


S.s = "This is a test string 123"
EncodeU(@S, 3)
Debug S

DecodeU(@S, 3)
Debug S
Edit:
Replaced the code with a simpler procedure
Windows (x64)
Raspberry Pi OS (Arm64)
Bo Marchais
User
User
Posts: 61
Joined: Sun Apr 03, 2016 12:03 am

Re: example of trivial text encoding/decoding for ascii stri

Post by Bo Marchais »

@nco2k
you guys are way overthinking this.
And yet I notice you still throw nibble swapping into the hat...another example of bit shifting. :)
It's fun to putter about with this stuff, and the computational cost is very low compared to AES.
The bad news... even after you take all the steps mentioned (and all valid ideas, by the way!)
you're still out of luck. :(

https is hopelessly insecure, AES has lots of design flaws (not to mention interesting side channel attacks and data leaks due to implementation), the OS provides many huge gaping security holes (but only slightly more than the hardware it runs on) and worst of all: almost all crypto methods are using purposely (or stupidly) reduced key spaces.

In other words, it's basically turtles all the way down. :)
Still, fun is fun.

@Wilbert
Very nice. One of the principal weaknesses of this method (and any simple xor based method) is that a string of repeating spaces or even a common/known phrase will render a pattern that can be tested for. And you would think that this requires a computer or automated tools, but it doesn't always.

One really useful test is to dump some encoded data and look at it. You'd be amazed at how you can often spot a pattern in ciphered text just by looking at it. Often, that pattern reveals either how it was ciphered, or provides information about the flavor of the cipher. The human brain is pretty sharp... and anything it gets exposed to more than once will start developing tools to classify and work with it - even if you don't mean to. And these skills are often very sophisticated even if we don't put any effort in.

Here's an example of information leakage from the simple stream cipher in my first post:
D3E3F000102030405060708090A0B0C0D0E0F101112
Notice the pattern? Those are spaces - they reveal that the xor key is an incrementing integer.
Rotating the bits and nibble swaps have similar "tells" - information that makes it easier for us to write software to try and crack it, or in many simple cases might give us the ability to solve the problem in our head or by using pencil and paper.

The bit rotating schemes (like all simple schemes) leak information - if you count how many ones and zeros are in each character, regardless of scrambling, you can often tell a) what language it is and b) more easily guess what character it is, or at least narrow it down. This kind of analysis is pretty simple, but perhaps above your average script kiddy cracker.

Now, if you xor the number you use to figure out how to rotate your bits AND you xor your data against some combination of these, you will make it even harder to casually spot the cipher - at a cost of only a few more instructions. Someone wanting to dig into your data will have a harder time yet (if they don't reverse engineer your code) and it will take longer.

For software use, your encryption just has to be resistant to the point that amateurs will give up after a couple hours...or maybe days. When the pain of decoding exceeds the desire/skills of the casual programmer, it's close enough. If you use someone's AES package (as suggested by nco2k, and quite solid advice!) all they have to do is fire-up a debugger, locate the entry point to the AES routine and pull out the key - that's how many DVD and videogame keys got found.

But if you roll your own, spread it out across the program and distribute the decoding algorithm in unexpected places, or perhaps use an intermediate decoding step (leaving partially ciphered text in memory and then decode it as needed...well, even fairly skilled crackers are going to have to invest a little more time to reverse engineer your nonstandard approach.

Your simple bit rotate (with an xor somewhere in there, and maybe a nibble swap like nco2k showed), will be more than enough for most of us. I hate to admit that I don't even know if there's a nibble swap instruction in x86. Also, I suppose escaping for 00/0A/0D/etc. as shown by nco2k would be fine to avoid overhead - I like that a lot and I think it's an acceptable tradeoff to leak a little info if we can decrease the size of the encoded data.

If someone combines all of this, we'll have a very useful routine to drop into our code. If of course, some bright person will write it for us...
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: example of trivial text encoding/decoding for ascii stri

Post by wilbert »

Bo Marchais wrote:Very nice. One of the principal weaknesses of this method (and any simple xor based method) is that a string of repeating spaces or even a common/known phrase will render a pattern that can be tested for.
You are right that it shows a pattern.
The code below (incompatible with the previous procedure I posted) will make the pattern less obvious but you might still be able to tell.

Code: Select all

CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  Macro rbx : ebx : EndMacro
  Macro rdx : edx : EndMacro
  Macro rsi : esi : EndMacro  
CompilerEndIf 

Procedure EncodeU(*String, i = 0)
  EnableASM
  mov rdx, *String
  mov ecx, [p.v_i]
  push rbx
  push rsi
  sub esi, esi
  !mov eax, 0xace1
  !ror ax, cl
  !encode_u:
  !shr eax, 1
  !jnc encode_u0
  !xor eax, 0x80200003
  !encode_u0:
  lea ecx, [esi + eax]
  movzx ebx, word [rdx]
  add rdx, 2
  mov esi, ebx
  rol bx, cl
  neg bx
  mov [rdx - 2], bx
  !jnz encode_u
  pop rsi
  pop rbx
  DisableASM
EndProcedure

Procedure DecodeU(*String, i = 0)
  EnableASM
  mov rdx, *String
  mov ecx, [p.v_i]
  push rbx
  sub ebx, ebx
  !mov eax, 0xace1
  !ror ax, cl
  !decode_u:
  !shr eax, 1
  !jnc decode_u0
  !xor eax, 0x80200003
  !decode_u0:
  lea ecx, [ebx + eax]
  movzx ebx, word [rdx]
  add rdx, 2
  neg bx
  ror bx, cl
  mov [rdx - 2], bx
  !jnz decode_u
  pop rbx
  DisableASM
EndProcedure


S.s = Space(32)
EncodeU(@S, 3)
ShowMemoryViewer(@S, StringByteLength(S))
You just haven't got many possible rotations for a character (16).
Xor works well for binary data. The challenge is to produce encoded data without null terminating characters and that seems difficult to me with xor.
Then again, I don't know much about the subject so I might be overlooking something :)
Windows (x64)
Raspberry Pi OS (Arm64)
walbus
Addict
Addict
Posts: 929
Joined: Sat Mar 02, 2013 9:17 am

Re: example of trivial text encoding/decoding for ascii stri

Post by walbus »

Nice codes, but it is the self their try making gold, same the old alchemists.

This is a shot generated with the code above, for search pattern

Image
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: example of trivial text encoding/decoding for ascii stri

Post by wilbert »

walbus wrote:This is a shot generated with the code above, for search pattern

Image
There's obviously a lot of C3 there :?
Well, it was fun trying. :wink:
Windows (x64)
Raspberry Pi OS (Arm64)
walbus
Addict
Addict
Posts: 929
Joined: Sat Mar 02, 2013 9:17 am

Re: example of trivial text encoding/decoding for ascii stri

Post by walbus »

Yep wilbert, but you make great codes !

The same problems with pattern you can become with AES. :?
So is using AES not ever so simple it looking at first.
Last edited by walbus on Wed Apr 13, 2016 11:33 pm, edited 1 time in total.
Bo Marchais
User
User
Posts: 61
Joined: Sun Apr 03, 2016 12:03 am

Re: example of trivial text encoding/decoding for ascii stri

Post by Bo Marchais »

If you are going for 16 bits instead of 8, you have to change the routines to account for that.

An ideal encryption is almost indistinguishable from noise...although that in itself is a signature!
I can't take the time to describe so many kinds of signature, but there are many programs and systems - very big, very powerful and very fast systems that can identify and classify data based on signature tests. I have seen home grown versions that (to me) border on AI... you might say they are (puts on sunglasses) Turing Complete. These systems just happen to decode as well, so...

The work of decoding signals did not stop after Alan Turing took a bite from the apple, and is far more developed than any of us can possibly imagine. It's really just an interesting branch of mathematics. I'd be overjoyed to write something that took an amateur more than 5 days to break. Luckily for me, I got 99 problems but that ain't one...

Wilbert, if you take the first 8 bytes of the encoded message and then use that as a rotating pattern to xor against your data for every word afterwards, you'll fix that pesky C3 issue. By rotating pattern, i mean you can either bit shift OR rotate through the bytes, or even alternate a MOD add/subtract. The very moment that mapping the frequency of the pairs doesn't show any big peaks, you're probably in good shape.

Also, your salt could be an md5 checksum or text string and nobody says you can't use that to do xor with.
I won't write any more about ciphers here, but I hope people find it interesting to tinker with.
It's at least as interesting as sudoku.
Post Reply