When reading about padding in general, there were several ways to employ. So while reading on different pages, i found the best way to do this by padding on the last block n bytes with the value n, where n is how much bytes are needed to complete the block to 16 bytes. The thing is that one put in practice, it's not feasible because you might have this buffer:
B(16) = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1}
As you can see, all the values are "1" and the buffer length is multiple of 16, thus it doesn't need padding. BUT once the decryption checks for padding, it will read the last byte wich is 1 (in red). The rule says that we need to check backwards if "1" bytes of "1" exists. And it does! This will make the procedure strip the last byte, even if it wasn't needed, because the rule applied.
SO
While trying to understand how to make this work I found this:
(RFC2630) Each padding byte is the pad count (16 extra added if size is already a multiple of 16)
Basically in order to know if the buffer was padded, each padding byte is the pad count. And we did that. But 16 are extra added if size is already a multiple of 16 which we didn't. This was the mistake with the code...
Now, no matter what happens, we pad the buffer with strictly 1 to 16 bytes so any confusion is eliminated. We always pad, even if the buffer contains strictly 16 bytes.
That being said, I updated the code in the first post. Let me know if there are any problems.