
Posted: Mon May 22, 2006 3:09 am
by Dare2
@pupil.

I've pinched your bit set/get/clear code. :)

@dioxin and other gurus.

We tyros could probably do with some education on playing with the stack.

For example, I think PureBasic allows EAX, ECX, EDX to get clobbered but the other registers must be preserved. So if calling an imported proc, how do I do this nicely? For example (fasm):

Code:

format MS COFF

public fillWith

section '.text' code readable executable

; Copy 'pattern' of 'size' bytes to 'dest', 'count' times
; this destroys EAX, ECX, EDX (which shouldn't hurt PureBasic)

fillWith:              ; Dest, Pattern, Count, Size

  pop eax              ; return address

  pop edx              ; 'Size' to DX (think about pop cx; jcxz @exit; xchg cx,dx)
  pop ecx              ; 'Count' to CX (could jcxz @exit)
  pop esi              ;  Pattern
  pop edi              ;  Dest
  mov ebx,esi          ; So we can keep restoring it

  push eax             ; restore return addy

  cld                  ; go forward
@outerLoop:
  push ecx             ; Save 'Count'
  mov ecx, edx         ; Use 'Size'
  mov esi,ebx          ; point to 'Pattern' start
  rep movsb            ; copy 'Pattern' to next bit of 'Dest'
  pop ecx              ; 'Count' to counter
  loop @outerLoop      ; counter not zero, redo
@exit:
  ret
which creates an "obj" to be imported into PureBasic. (It doesn't work, btw, but hopefully serves as an example).

What would be a really good way to get the parameters, ensure the return address was in the right place, and also ensure the other registers were safely preserved?

Also, what if it was not imported but stuck inside a procedure?


Edit:

This does work, but not sure how safe it is:

Code:

format MS COFF

public fillWith

section '.text' code readable executable

; Copy 'pattern' of 'size' bytes to 'dest', 'count' times
; this destroys EAX, ECX, EDX (which shouldn't hurt PureBasic)
; it also uses EBX, ESI, EDI, so they are saved on entry and restored before ret

fillWith:              ; Dest, Pattern, Count, Size

  push ebx             ; Save for pure
  push esi
  push edi

  mov edx,[esp + 28]   ; Size
  mov ecx,[esp + 24]   ; Count
  mov esi,[esp + 20]   ; Pattern
  mov edi,[esp + 16]   ; Dest (3 saved regs + return address = 16 bytes above esp)

  mov ebx,esi          ; So we can keep restoring it

  cld                  ; go forward

@outerLoop:
  push ecx             ; Save 'Count'
  mov ecx, edx         ; Use 'Size'
  mov esi,ebx          ; point to 'Pattern' start
  rep movsb            ; copy 'Pattern' to next bit of 'Dest'
  pop ecx              ; 'Count' to counter
  loop @outerLoop      ; counter not zero, redo
@exit:
  pop edi              ; Restore for pure
  pop esi
  pop ebx
  ret
Used as:

Code:

Import "path\to\myPatternFill.obj"
  fillWith.l(dest.l,patt.l,times.l,size.l) As "fillWith"
EndImport

d.s = "AbcdAbcdAbcdAbcdAbcdAbcd"
p.s = "1234"
fillWith(@d,@p,3,4)  ; half fill
Debug d
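As a reference for what either asm version should produce, here is a minimal C sketch of the same fill. It is not the poster's code: the name fill_with and the size_t parameters are assumptions for illustration, and like the asm it trusts the caller to supply a large enough dest.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C equivalent of fillWith: copy 'size' bytes from
   'pattern' into 'dest', 'count' times in a row. No bounds checking,
   just like the asm version. */
void fill_with(void *dest, const void *pattern, size_t count, size_t size)
{
    unsigned char *d = dest;
    for (size_t i = 0; i < count; i++) {
        memcpy(d, pattern, size);   /* one copy of the pattern (the rep movsb step) */
        d += size;                  /* like edi, advance past what was just written */
    }
}
```

Called as fill_with(d, "1234", 3, 4) on a buffer holding "AbcdAbcdAbcdAbcdAbcdAbcd", it should leave "123412341234AbcdAbcdAbcd", matching the half fill in the PureBasic test. One difference: a count of 0 copies nothing here, whereas the LOOP-based asm would run its body once, decrement ECX from 0 to 0FFFFFFFFh and keep going (a JECXZ check before the loop avoids that).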

Posted: Mon May 22, 2006 9:03 pm
by dioxin
Usual way to use registers

There are 8 general purpose CPU registers available, eax,ebx,ecx,edx,esi,edi,ebp,esp.

Some of these have special uses but you can still use all of them as general purpose registers if you wish.
Some of the special uses (i.e. uses which the CPU hardware dictates can only be done with certain registers) include:
esp, the stack pointer
esi and edi as the source and destination index registers for the string instructions.
eax as the target of IN/OUT instruction
edx, together with eax, as the target of the one-operand forms of MUL/IMUL (and the dividend for DIV/IDIV)
ecx as the counter in a LOOP operation (and for REP-prefixed string instructions)


When in doubt you can save and retrieve all of the general purpose registers in one go using the PUSHA and POPA opcodes which push and pop all registers at the same time.


A different concern is the conventions for using registers.
These are often arbitrary and defined by the OS rather than the hardware but because you're programming under Windows it makes sense to follow the same conventions.
The conventions include:

eax,ecx,edx are scratch registers.
Use these as you wish. You don't need to preserve them for anyone else, but neither does other code preserve them for you, so they keep their contents only while in your code. As soon as you make any call to any code which you didn't hand write in ASM yourself, you should assume that the contents of these registers are lost.

esi,edi,ebx must be preserved
If you use them in your code then save their contents and restore them before you exit your code. Other code will preserve them for you too so if you make calls to other routines that you didn't write then you can expect these 3 registers to be returned intact.



esp is the stack pointer
Since most code finds a stack useful, it's usually best not to use this register for anything other than the stack, although, with care, you can treat it as any other register if you wish.

ebp is conventionally used as the base pointer to the stack frame of the current local variables.
Local variables can be addressed using ebp but if you don't need to use local variables in your procedures then you can use this as a general purpose register.




The Stack.
Main rule, if you push things onto the stack then you must pop the same number of things off it.
The last thing pushed is the next thing popped.

One use of the stack is to enable local variables to be set up in each procedure that don't interfere with variables from other procedures or from previous instances of the same procedure (in the case of recursion).

Let's have a simple example, all code here is pseudocode:

Code:

function add(p1,p2)
local x,y

function=p1+p2
end function

'main code
a=2:b=3
result=add(a,b)
The first thing most Windows compilers do on calling a new function is to set up the stack and load the parameters onto it.
In this example, there are 2 parameters (we'll assume all values are 32-bit to keep it easy) which need to be pushed first:

Code:

!push b		'parameters are pushed right to left, so the last one (b) goes first.
!push a		'note, for simplicity we're pushing the values here, so we're passing the parameters By VALUE.
		'usually we'd push the address of a parameter in order to pass it By REFERENCE
!call add()
..we're now in the ADD() procedure; the calling code has done all it has to, and the rest is up to the called code:

Code:

!push edi	'save the 3 non-scratch registers, as required. 
!push esi	'..This is not strictly needed if you don't use these 
            'registers in the procedure but most procedures..
!push ebx	'..use the same general format so they tend to always save the registers

!push ebp	'save the old value..
!mov ebp,esp	'copy the stack pointer to ebp 

!push 0		'make room for and initialise local variable x
!push 0		'make room for and initialise local variable y
At this point ebp marks a watershed between the parameters pushed onto the stack before the call and the data that we might push onto the stack during the procedure.

At address [ebp] we'll find the old value of ebp, which allows us to look at the calling function's data if we need to.
At addresses [ebp+4], [ebp+8] and [ebp+12] we'll find the saved ebx, esi and edi, and at [ebp+16] the return address pushed by the CALL.
At addresses [ebp+20] and [ebp+24] we'll find the 2 parameters that we passed.
At addresses [ebp-4] and [ebp-8] we'll find the local variables defined for this procedure.
(If ebp were pushed first, before the other registers, the return address would be at [ebp+4] and the parameters would start at [ebp+8].)

Since we reference all local variables and parameters via ebp then, provided we preserve the value in ebp, we don't need to keep track of where on the stack the parameters are after we've pushed and popped our own data.

Code:

!mov eax,[ebp+20]    'get first value
!add eax,[ebp+24]    'add in second value
The result is now in eax where you want it; eax can be used to return the result to the calling code.

After the procedure is finished it needs to clean up.

Code:

!mov esp,ebp	'immediately clears all local data, effectively popping it all off the stack
		'.. and discarding it in a single go.
		'We don't need to keep it as it was local to the procedure which has just ended.
!pop ebp	'restore ebp
!pop ebx	'restore the 3 non-scratch registers
!pop esi
!pop edi
!ret 8		'return from the procedure and pop off 8 Bytes, that's the 2 parameters we passed.
		'the value after RET is the number of bytes that all parameters took on the stack when the procedure was called.
		'Again, this discards all of the now redundant data in one go, very quickly.

The above gives a rough idea of how it's supposed to work. Different compilers may do it differently but the principle will be the same.

I hope it helps,

Paul,
not a guru, just a hobbyist!
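To see where each item lands relative to ebp, here is a toy C model of the 32-bit stack that replays the caller's pushes, the CALL, the prologue and the epilogue from the listings above. It is a sketch under stated assumptions: parameters are pushed right to left (the stdcall order), every slot is 4 bytes, and the register contents and return address are made-up marker values. With this prologue (three registers saved before ebp) the return address ends up at [ebp+16] and the two parameters at [ebp+20] and [ebp+24]; if ebp is pushed first instead, the return address is at [ebp+4] and the parameters at [ebp+8] and [ebp+12].

```c
#include <stdint.h>

/* Toy model of a 32-bit x86 stack: 4-byte slots, 'esp'/'ebp' are byte
   addresses. esp moves DOWN on push and UP on pop, like the real CPU.
   All register contents and the return address are made-up markers. */
enum { STACK_BYTES = 256 };
enum { OLD_EDI = 0xED1, OLD_ESI = 0x5E1, OLD_EBX = 0xEB1,
       OLD_EBP = 0xEB9, RET_ADDR = 0x401000 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp;
} Stack;

static void push(Stack *s, uint32_t v) { s->esp -= 4; s->mem[s->esp / 4] = v; }
static uint32_t pop(Stack *s) { uint32_t v = s->mem[s->esp / 4]; s->esp += 4; return v; }

/* Read the slot at [ebp + offset]; offset may be negative (locals). */
static uint32_t at(const Stack *s, int32_t offset)
{
    return s->mem[(s->ebp + (uint32_t)offset) / 4];
}

/* Caller: push parameters right to left (assumed stdcall order), then CALL. */
static void call_add(Stack *s, uint32_t a, uint32_t b)
{
    push(s, b);
    push(s, a);
    push(s, RET_ADDR);           /* CALL pushes the return address */
}

/* Callee prologue, in the order used in the listing above. */
static void prologue(Stack *s)
{
    push(s, OLD_EDI);
    push(s, OLD_ESI);
    push(s, OLD_EBX);
    push(s, OLD_EBP);
    s->ebp = s->esp;             /* mov ebp,esp */
    push(s, 0);                  /* local x, at [ebp-4] */
    push(s, 0);                  /* local y, at [ebp-8] */
}

/* Callee epilogue plus 'ret n'; returns the address RET jumps back to. */
static uint32_t epilogue(Stack *s, uint32_t ret_bytes)
{
    s->esp = s->ebp;             /* mov esp,ebp: drop all locals at once */
    s->ebp = pop(s);             /* pop ebp (a marker value in this model) */
    pop(s);                      /* pop ebx */
    pop(s);                      /* pop esi */
    pop(s);                      /* pop edi */
    uint32_t ret_addr = pop(s);  /* RET pops the return address... */
    s->esp += ret_bytes;         /* ...and 'ret n' then discards n parameter bytes */
    return ret_addr;
}
```

After call_add(&s, 2, 3) and prologue(&s), reading at(&s, 20) + at(&s, 24) gives 5, and epilogue(&s, 8) leaves esp exactly where it was before the caller's first push.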

Posted: Tue May 23, 2006 12:36 am
by Dare2
Excellent, thank you!

So, to be sure I have this:

EBP is a significant player within the stack. All below EBP (negative offsets, we go deeper) are our locals, all above (positives, we go higher towards the real start of the stack) are the incoming parameters. At ebp exactly is the address in the stack of an earlier EBP, which is the parent procedure's 'watershed'. (General purpose question mark here ? ) :)

(Aside: To overload operators then, we could push the number of extra operators onto the stack and [EBP + 4] would be the count of extras?)


In my code above (fillWith, the second bit) I am heading for a stack overflow problem, it seems? I ret when I should ret 16 (to clear the 4 parameters passed in)?

Also i could have used [ebp + 4] (8,12,16) instead of [esp + 16] (20,24,28 )?

Finally, regarding the setting up of stacks for procedures. If a procedure calls a procedure, is it the responsibility of the caller to set up EBP and restore after? Should we know about stack frames and use things like ENTER and LEAVE?


Thanks again for the tute! :)

Edit: http://www.microsoft.com/msj/0298/hood0298.aspx : This guy also wrote some excellent stuff on PE / coff format which I found very useful a while back.

Posted: Tue May 23, 2006 6:14 pm
by Pupil
@Dare2
As far as I know PB (haven't installed PB 4 yet) doesn't use 'ebp' that way inside the procedure code that it creates; however, that doesn't stop you from using this technique (it's quite common to do so, I'm pretty certain C does this).

If you're using the stdcall convention, your code as posted above will leave the 16 bytes of parameters on the stack after every call, so the stack pointer drifts and things will eventually break; you need to use 'ret 16'. However, if you call this function C-call style (cdecl), the calling code is supposed to correct the stack pointer, so in that case you'll only need 'ret'.

It's your responsibility to set up ebp if you want to use this system; you can't assume that the calling code does this (unless you wrote it yourself).
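The question of who removes the parameters can be checked with plain arithmetic. The C sketch below is only a toy model of ESP: it assumes a routine with four 4-byte parameters (as in fillWith) and ignores everything else on the stack. The function name and the starting esp value are made up for illustration.

```c
#include <stdint.h>

/* Toy model of ESP across repeated calls to a routine that takes four
   dword parameters (16 bytes). 'callee_pops' is the n in the callee's
   'ret n'; 'caller_pops' is what the caller adds back to esp after the
   call (an 'add esp,16' in C-call style). */
static uint32_t esp_after_calls(uint32_t esp, int calls,
                                uint32_t callee_pops, uint32_t caller_pops)
{
    for (int i = 0; i < calls; i++) {
        esp -= 16;            /* caller pushes the 4 parameters */
        esp -= 4;             /* CALL pushes the return address */
        esp += 4;             /* RET pops it again */
        esp += callee_pops;   /* stdcall: 'ret 16' removes the parameters */
        esp += caller_pops;   /* cdecl: the caller removes them instead */
    }
    return esp;
}
```

With callee_pops = 16 (stdcall and 'ret 16') or caller_pops = 16 (C-call style and a plain 'ret') esp comes back to its starting value; with neither, it ends 16 bytes lower for every call made.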

Posted: Wed May 24, 2006 12:10 am
by dioxin
Dare2,
I was trying to give an idea of how the stack is usually used; if Pupil says that's not how PureBASIC does it then I'll take his word for it, as I don't have a copy to check.
I'd be surprised if it didn't do something similar; referencing local variables directly through the stack pointer from user ASM code is not practical.

To try and answer your questions:
All below EBP .. are our locals, all above .. are the incoming parameters.
That's roughly how it should be, but each compiler may push registers in a slightly different order, e.g. it may save esi, edi, ebx first and then ebp, or it may save ebp first, but the principle is the same: parameters are above and locals are below.
To overload operators then..
I assume you mean you want to send an unknown number of parameters to the procedure? If so, then the last item pushed should be the number of parameters, so the procedure knows how far up the stack it needs to go to fetch all of the passed parameters. That last item sits just above the return address, so with a plain push ebp / mov ebp,esp frame it is at [ebp+8]; [ebp+4] holds the return address.
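In C the same idea appears as a variadic function whose first parameter is the count. On 32-bit x86, C compilers push arguments right to left, so that first (leftmost) parameter is the last one pushed, just above the return address. A minimal sketch; sum_ints is a made-up example function, not anything from PureBasic.

```c
#include <stdarg.h>

/* The first argument says how many ints follow it, so the callee
   knows how far up the stack to read. */
static int sum_ints(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* fetch the next passed parameter */
    va_end(args);
    return total;
}
```

Variadic functions like this use the C calling convention, where the caller removes the parameters. That suits them, since 'ret n' takes n as a fixed immediate and the callee cannot know at assembly time how many bytes were pushed.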

In my code above..
Assuming PB does it the way I'm suggesting (which is the way most of the software I've come across does it), then it's the called procedure which needs to clean up the parameters from the stack, so you will need a RET 16 to do this.
Also i could have used [ebp + 4] (8,12,16) instead of [esp + 16] (20,24,28 )?
This is the reason ebp is used the way I suggested.
If you write your own ASM and you push a value onto the stack for any reason, then you immediately lose track of the parameters. Any reference to a variable at, e.g., [esp+12] will be hard coded by the compiler. When you push a value, the reference to [esp+12] will no longer be correct and it'll point to some other data, as esp changes when you push something onto the stack.
There is no way for the compiler to analyse your hand-written ASM to correct for this.
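A small C model of this, using the fillWith routine from earlier in the thread as the assumed example (four parameters pushed right to left, with made-up values): right after the CALL, Dest is at [esp+4]; after pushing ebx, esi and edi it has moved to [esp+16], which is why the second fillWith reads its parameters at [esp+16] to [esp+28]. An address based on a copy of esp taken before the pushes (the job ebp normally does) still finds it.

```c
#include <stdint.h>

/* Toy 32-bit stack of 4-byte slots; esp is a byte address that moves
   down on push. Values are made-up markers, not real addresses. */
typedef struct { uint32_t mem[64]; uint32_t esp; } Stack;

enum { DEST = 0xDE57, RET_ADDR = 0x401000 };

static void push(Stack *s, uint32_t v) { s->esp -= 4; s->mem[s->esp / 4] = v; }
static uint32_t load(const Stack *s, uint32_t addr) { return s->mem[addr / 4]; }

/* State on entry to a fillWith-style routine: 4 parameters pushed
   right to left, then the return address pushed by CALL. */
static void enter_fill(Stack *s)
{
    s->esp = (uint32_t)sizeof s->mem;
    push(s, 4);          /* Size */
    push(s, 3);          /* Count */
    push(s, 0x1234);     /* Pattern (made-up pointer) */
    push(s, DEST);       /* Dest, pushed last */
    push(s, RET_ADDR);   /* CALL */
}
```

After enter_fill, load(&s, s.esp + 4) is Dest; after three more pushes the same expression returns the saved esi instead, while load(&s, s.esp + 16) and an address computed from the earlier esp copy both still return Dest.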

Finally, regarding the setting up of stacks for procedures.

Caller sets up parameters only.
Called creates its own stack frame using ebp.
Procedure code is run here..
Called clears up its own stack frame
Called RETURNs and optionally clears parameters off the stack
Caller continues

I've rarely seen ENTER/LEAVE used but if you use them then it's the same rules:

Caller sets up parameters only.
Called creates its own stack frame using ENTER, which incidentally sets up ebp.
Procedure code is run here..
Called clears up its own stack frame using LEAVE
Called RETURNs and optionally clears parameters off the stack
Caller continues


I think your next step has to be to find out how PureBASIC calls its procedures.
I'm telling you how I've seen it done in the past and it is a good method but if PureBASIC does something significantly different then you need to know what it's up to or you'll just get confused between the 2 methods.

It shouldn't be too difficult to check. Just create a very simple procedure with 2 parameters and 2 local variables and look at the ASM produced by the compiler when the procedure is called.
You'll learn more following that than I can teach you!


Paul.

Posted: Wed May 24, 2006 1:52 am
by Dare2
@ Pupil and dioxin,

Thanks for that, you have been really helpful and informative. :)


@ Michael Vogel

This thread was a very good idea. I hope you're getting value from it! :)


As an aside, my feeling is that if an entire procedure is going to be asm, it is probably better to make it as an obj, or incorporate it in a lib, and import it. It feels cleaner. Inline asm to be used only when it tweaks stuff, mingled with PureBasic code. Any thoughts on that?

Posted: Wed May 24, 2006 9:50 am
by Michael Vogel
Dare2 wrote: This thread was a very good idea. I hope you're getting value from it! :)
Yes, thanks to all :!: for the nice snippets, I'll try learning from it (but slowly, especially for the moment, I've a lot of other things to do...)

And (of course) it not only gives answers to questions but also raises new questions...

>> Is it true that assembler optimizing is (nearly) useless when it is packed into a procedure (because of its overhead)? This would mean it would be pointless to write functions for, let's say, MinL(), MaxL(), Sgn() etc.; the better way would be if Fred implemented such functions :lol:

>> Is it true that a floating point multiplication is faster than an integer division?

Background information: I started to check my 3D program with the "Analyzer" from Remi Meier (great tool!) and found some interesting things slowing down the code...

Code:

 Box(0,0,screenx,screeny,backgroundcolor)
  changed now to Clearscreen(backgroundcolor)

  Flipscreen(1) changed to Flipscreen(2), does help a little bit

  windc=StartDrawing(ScreenOutput()) still costs enormous time
  Circle(x,y,k,color) takes also quite a while
But the most interesting thing (for me) was that integer divisions take so much time!

So the following code (division by 5) takes longer than multiplying by 0.2! Is this really normal? I had a look at the commented assembler code and found that two MOV statements can be removed from the DIV section (marked with *), but it's still slower than FMUL?!

Any comments?

Code:

DisableDebugger
a=-GetTickCount_()
For i=0 To 9999999

	x=9999
	CompilerIf 0
		x/5
		x/5
		x/5
		x/5
		x/5
	CompilerElse
		x*0.2
		x*0.2
		x*0.2
		x*0.2
		x*0.2
	CompilerEndIf

Next i
a+GetTickCount_()
EnableDebugger
Debug a
End

Code:

x/5:
MOV    eax,dword [v_x]			;ebx
MOV    eax,ebx						;*
MOV    ebx,5
CDQ
IDIV   ebx
MOV    ebx, eax						;*
MOV    dword [v_x],eax			;ebx

Code:

x*0.2:
MOV    ebx,dword [v_x]
MOV    [esp-4],ebx
!FILD   dword [esp-4]
FMUL   qword [D1]
!FISTP  dword [v_x]

Posted: Wed May 24, 2006 12:35 pm
by dioxin
Dare2,
it is probably better to make it as an obj, or incorporate it in a lib
It's up to you how you find it best. Personally, I'd leave it inline in the main code or, if it's a full procedure, I'd use the compiler to handle the call/return and parameters and the code in the procedure would consist only of the ASM.
That way the compiler will 'know' how to call your code and any changes to the calling convention used in future compiler updates shouldn't matter.


Michael,
Is it true, that assembler optimizing is (nearly) useless when it is packed into a procedure (because of it's overhead)?
Not useless, but for the tiny code snippets you were after, the optimised version's gain will be swamped by the procedure's overhead, so it would be better if they were built into the compiler.
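For comparison, this is how the overhead problem is usually sidestepped in C rather than PureBasic: tiny helpers are declared static inline so an optimising compiler can paste them into the caller instead of emitting a CALL. The names min_l, max_l and sgn_l are hypothetical stand-ins for MinL(), MaxL() and Sgn(); inline is only a hint, so whether the call really disappears is up to the compiler.

```c
#include <stdint.h>

/* Hypothetical helpers; 'static inline' lets the compiler expand them
   in place, so no CALL/RET or parameter pushing is left to swamp the gain. */
static inline int32_t min_l(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t max_l(int32_t a, int32_t b) { return a > b ? a : b; }

/* -1, 0 or +1 without a branch: each comparison yields 0 or 1. */
static inline int32_t sgn_l(int32_t x) { return (x > 0) - (x < 0); }
```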

Is it true, that a floating point multiplication is faster than an integer division?
It's common practice to use a MUL instead of a DIV when you can get away with it because it's much quicker.
As an example, on the Athlon DIV can take around 40 clock cycles to calculate. FMUL takes 4.
There are overheads involved with using the FPU this way but it'll still be faster than DIV.
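One thing to watch when swapping x/5 for x*0.2 is that the two are not always equal. Integer division truncates, while the x*0.2 listing converts back with FISTP, which in the FPU's default mode rounds to nearest. The C sketch below imitates both; round_nearest is a simple half-up rounding for non-negative values, an assumption standing in for FISTP rather than an exact model of it.

```c
/* Integer division: truncates toward zero, like IDIV. */
static long div5(long x) { return x / 5; }

/* Hypothetical stand-in for FISTP's round-to-nearest, valid for
   non-negative values only (half-up rather than FISTP's round-to-even). */
static long round_nearest(double v) { return (long)(v + 0.5); }

/* The x*0.2 route: multiply in floating point, then round back to an integer. */
static long mul02(long x) { return round_nearest((double)x * 0.2); }
```

With Michael's starting value 9999 the first step already differs: 9999/5 gives 1999, but 9999*0.2 = 1999.8 rounds to 2000 (and the next step gives 399 versus 400). So the two branches of the benchmark do not compute the same intermediate values.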

But there are usually better ways if you are dividing by a constant value. If you want to divide a number by 5 then you could do this:

Code:

MagicNumber=2^32/5     'this will be a precalculated constant in your code

!mov eax,TheNumber
!mul MagicNumber
!mov TheAnswer,edx
You now do the "div" using an integer multiply by the reciprocal; it doesn't need the FPU and takes only 6 clocks on an Athlon (longer on a Pentium, but you should still save time).

Paul.

Posted: Wed May 24, 2006 12:42 pm
by Psychophanta
A question to all of you who know about this matter.
I don't know if it is a bug... but why can't this be done?

Code:

!CALL $00403A65
I get an ASM error here.

Posted: Wed May 24, 2006 2:26 pm
by Michael Vogel
dioxin wrote: [...] You could do this...

Code:

MagicNumber=2^32/5     'this will be a precalculated constant in your code

!mov eax,TheNumber
!mul MagicNumber
!mov TheAnswer,edx
So I tried...

Code:

MagicNumber=1<<32/5
TheNumber.l=125
TheAnswer.l

!mov eax,[v_TheNumber]
!mul [v_MagicNumber] 
!mov [v_TheAnswer],edx
Debug theanswer
And I get 24!

Posted: Wed May 24, 2006 2:34 pm
by dioxin
Michael,
that'll be a rounding error. Integer stuff truncates, so the answer was probably 24.9999999 and it was truncated to 24 instead of being rounded to 25.
Try adding 1 to the MagicNumber.

Paul.
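Michael's 24 and the "add 1" fix can be checked in C, where the high half of a 64-bit product plays the role of EDX after MUL. This is an unsigned sketch (MUL, not IDIV), so negative numbers would need extra handling, and the function names are made up for illustration. The third function is the usual compiler trick, shown for comparison: a magic number of ceil(2^34/5) = 0CCCCCCCDh followed by a right shift of 2.

```c
#include <stdint.h>

/* High 32 bits of a 32x32->64 multiply: what MUL leaves in EDX. */
static uint32_t mul_high(uint32_t n, uint32_t magic)
{
    return (uint32_t)(((uint64_t)n * magic) >> 32);
}

/* 2^32/5 truncated = 858993459: slightly too small, so results can
   come out one low (125 -> 24). */
static uint32_t div5_truncated_magic(uint32_t n) { return mul_high(n, 858993459u); }

/* The same constant plus 1 = 858993460. */
static uint32_t div5_plus_one(uint32_t n) { return mul_high(n, 858993460u); }

/* ceil(2^34/5) = 0xCCCCCCCD, then EDX >> 2 (i.e. the product >> 34).
   Exact for every 32-bit unsigned n. */
static uint32_t div5_exact(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 34);
}
```

The +1 constant is exact for every input below 2^30 and first goes wrong at exactly 2^30 (it returns 214748365 instead of 214748364), which is fine for small values like Michael's 125; the 0CCCCCCCDh version is exact for all 32-bit unsigned inputs.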