Simple Question on Inline ASM

Thorium · Post by **Thorium** » Fri Dec 04, 2009 5:41 pm

freak wrote:
Fred wrote: This is not necessarily true. By blowing up the code size with macros you can easily make the code slower due to caching effects. Calls have become really cheap in today's CPUs (because of branch prediction with call-return stacks etc). So worrying about this kind of thing is a waste of time and even dangerous. Write your code, benchmark it and then you can start thinking about these kinds of optimizations if you really need them (chances are you won't).

Just tested it, and the macro was slightly faster than the procedure call without parameters. But it is only a few cycles, nothing to worry about.
By the way, on optimization and caching effects: does the PB compiler unroll loops? That is something where you can blow up your code size and get a big performance gain.
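
A comparison like the one described above could be sketched roughly as follows. This is only a minimal illustration, not the test that was actually run: the names IncMacro, IncProc, counter and #Iterations are made up for the example, and it assumes the debugger is disabled, since otherwise the timings say very little.

```purebasic
; Hypothetical micro-benchmark: macro expansion vs. parameterless procedure call.
; IncMacro, IncProc, counter and #Iterations are illustrative names only.
; Compile with the debugger disabled before trusting any numbers.

#Iterations = 100000000

Global counter.q

Macro IncMacro
  counter + 1              ; expanded inline at every use
EndMacro

Procedure IncProc()
  counter + 1              ; same work, but reached through a call
EndProcedure

Define i.i, start.q, macroTime.q, procTime.q

start = ElapsedMilliseconds()
For i = 1 To #Iterations
  IncMacro
Next
macroTime = ElapsedMilliseconds() - start

start = ElapsedMilliseconds()
For i = 1 To #Iterations
  IncProc()
Next
procTime = ElapsedMilliseconds() - start

MessageRequester("Result", "Macro: " + Str(macroTime) + " ms" + #LF$ + "Procedure: " + Str(procTime) + " ms")
```

Results from a loop this small depend heavily on the CPU and compiler version, so treat the numbers as a rough indication only.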

newtheogott · Post by **newtheogott** » Fri Dec 04, 2009 6:05 pm

Does the PB compiler unroll loops? That is something where you can blow up your code size and get a big performance gain.

I'd be careful with such optimisation tips at the CPU level, and I would not really recommend that Fred spend much time on this topic,
because these sorts of optimizations are heavily CPU dependent.
Unrolling loops is just one example. If you unroll a loop, the whole loop may no longer fit in the L1 cache, etc.

These sorts of optimisations can also become obsolete, or even worse, counterproductive, with each new CPU generation.
The new CPUs have an advanced loop buffer.

This makes the pipeline shorter. But Nehalem goes a step farther. Nehalem's loop stream detector is at the end of the pipeline. When it sees a loop, the microprocessor can shut down everything except the loop stream detector, which sends out the appropriate instructions to a buffer

Small loops do not get decoded multiple times internally but stay inside the L1 cache.

Tom's Hardware:

With the Nehalem architecture, Intel has improved the functionality of the Loop Stream Detector. First of all the buffer is larger—it can now store 28 instructions. But what’s more, its position in the pipeline has changed. In Conroe, it was located just after the instruction fetch phase. It’s now located after the decoders; this new position allows a larger part of the pipeline to be disabled. The Nehalem’s Loop Stream Detector no longer stores x86 instructions, but rather µops. In this sense, i

As you can read, small loops of around 28 instructions run in "turbo mode", so by unrolling beyond this size you could possibly lose speed.
I think in the long term it is much better to take the time to choose the best algorithm than to try to squeeze the last cycle out of the CPU.
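
To make the trade-off concrete, here is a minimal sketch of what unrolling by hand looks like in PureBasic. The names #Count, values, sumRolled and sumUnrolled are invented for the example, and it assumes #Count is a multiple of 4. Whether the unrolled form is actually faster depends on the CPU, as discussed above, so it has to be measured.

```purebasic
; Illustrative only: the same summation written rolled and unrolled by 4.
; #Count, values, sumRolled and sumUnrolled are hypothetical names.

#Count = 1024                      ; assumed to be a multiple of 4

Dim values.l(#Count - 1)
Define i.i, sumRolled.q, sumUnrolled.q

For i = 0 To #Count - 1            ; fill the array with test data
  values(i) = i
Next

; Rolled: one element per iteration, small loop body
For i = 0 To #Count - 1
  sumRolled = sumRolled + values(i)
Next

; Unrolled by 4: fewer loop iterations, but a larger loop body
i = 0
While i < #Count
  sumUnrolled = sumUnrolled + values(i) + values(i + 1) + values(i + 2) + values(i + 3)
  i + 4
Wend

Debug sumRolled                    ; both sums should be identical
Debug sumUnrolled
```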

About the GOSUB: this is just a difference in compiler design between Powerbasic and Purebasic. In Powerbasic I can use CALL (Label) inside a procedure. In the same way I could write GOSUB, and looking at the generated ASM output, the compiler just emits a simple CALL instruction in both cases. This is possible because the whole design of the generated code seems to be different than that of the Purebasic code, maybe because the Purebasic compiler is based on a multi-platform intermediate design.
Therefore other rules apply in Purebasic, and there will be other commands to use for such situations.
I was surprised that there is a GOSUB in the help file, and even a "FakeReturn", but it says "only in the main procedure". Now that I know this is by design, the discussion is moot, as said.

Thorium · Post by **Thorium** » Sat Dec 05, 2009 2:29 am

I have a Nehalem (Core i7) and I got a 15% speed-up from unrolling. Something that does not work at all anymore on Nehalem is prefetching. The automatic prefetcher is extremely good. I never get better performance if I prefetch data.

newtheogott · Post by **newtheogott** » Fri Feb 19, 2016 1:35 pm

I have just seen that GOSUB is now implemented.
Maybe then old threads like this one can be deleted?

GOSUB in PureBasic

charvista · Post by **charvista** » Sun Apr 24, 2016 11:02 am

newtheogott wrote: I have just seen that GOSUB is now implemented.

Gosub was already implemented a long time ago.
It is just (still) not allowed inside procedures.

Gosub may only be used within the main body of the source code, and may not be used within procedures.
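
For reference, a minimal sketch of the permitted form, with Gosub used in the main body of the source. The label name Sub1 and the variable a are just illustrative.

```purebasic
; Gosub in the main body of the source (not inside a Procedure).
; Sub1 and a are hypothetical names for this example.

Define a.i

a = 1
Gosub Sub1          ; jumps to Sub1, continues here after Return
a = 2
Gosub Sub1
End                 ; stop here so execution does not fall into the subroutine

Sub1:
  Debug "called with a=" + Str(a)
Return
```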

PeterJ · Post by **PeterJ** » Tue Apr 26, 2016 1:55 pm

I personally think a "gosub"-like function within procedures has some benefits:
- it lets you structure complex code as perform step1, perform step2, etc., while keeping the called sub-procedures separate
- it lets you re-use code where just some "input" variables change, without duplicating the code
Certainly all this can be achieved by using "real" procedures. The downside is that the variables of the "main" procedure are not in scope in the sub-procedure and must be passed to it.
I definitely agree that performance is not the issue for either solution.
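
For comparison, the "real procedure" alternative mentioned above might look roughly like this: the shared variables are collected in a structure and passed by pointer. This is only a sketch under that assumption; Context, Sub1 and ctx are hypothetical names.

```purebasic
; Sketch of the "real procedure" alternative: shared state is passed explicitly.
; Context, Sub1 and ctx are hypothetical names for this example.

Structure Context
  a.i
EndStructure

Procedure Sub1(*ctx.Context)
  Debug "called with a=" + Str(*ctx\a)
EndProcedure

Procedure Test()
  Protected ctx.Context
  ctx\a = 1         ; set variable a
  Sub1(@ctx)        ; pass the caller's variables by pointer
  ctx\a = 2         ; set variable a again
  Sub1(@ctx)
EndProcedure

Test()
```

This works, but every variable the sub-procedure needs has to go through the structure, which is exactly the overhead in writing that the macro approach tries to avoid.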

To knock up a simple GoSub proposal I use 3 Macros:

Code:

; ----------------------------------------------------------------------------------------------
; GOSUB inside procedures  
; ----------------------------------------------------------------------------------------------
; Call internal Procedure 
Macro Fcall(proc)
CompilerIf Defined(*bback,#PB_Variable)=0
   Define *bback         ; Define return address pointer, just once
CompilerEndIf  
*bback=?Return_#proc#MacroExpandedCount  ; save return address
Goto PROC_#proc                          ; Goto sub procedure
Return_#proc#MacroExpandedCount#:        ; Define Return Label 
EndMacro
; ----------------------------------------------------------------------------------------------
; Define internal Procedure 
; ----------------------------------------------------------------------------------------------
Macro Fproc(proc)
Define *r#proc           ; Define pointer to save return address   
; ----- handle return process in sub procedure  ------------ 
RET_#proc#:              ; Return is processed here
*bback=*r#proc           ; restore return address
*r#proc=0                ; reset return address 
 !jmp dword [p.p_bback]  ; return to caller (dword assumes 32-bit x86; x64 would need qword)
; ----- Entry for sub procedure  --------------------------- 
Proc_#proc#:             ; Entry of sub procedure call 
*r#proc=*bback           ; save return address 
EndMacro                 ; normal coding starts ... 
; ----------------------------------------------------------------------------------------------
; Return from internal Procedure 
; ----------------------------------------------------------------------------------------------
Macro FReturn(proc)
 Goto RET_#proc          ; Goto return handling in FPROC 
EndMacro

To test it, you can use the following program:

Code:

Procedure Test()
  a=1           ; set variable a
  Fcall(sub1)   ; call sub SUB1 
  a=2           ; set variable a again  
  Fcall(sub1)   ; call SUB1 again
  ProcedureReturn   
; Sub Procedure    
  FProc(sub1)   ; Define SUB1
  Debug "called with a="+Str(a)
  FReturn(sub1) ; Return from SUB1 
EndProcedure

Test()          ; run the test procedure

By using the JMP statement, FCALL can be used several times and will always return to the statement following the respective FCALL.

I agree, it is pretty much dependent on your programming style. You may like or dislike it, but that's the way I prefer it.

Danilo · Post by **Danilo** » Tue Apr 26, 2016 3:57 pm

See also: [Windows] Sub / EndSub / Call Macros

PeterJ · Post by **PeterJ** » Wed Apr 27, 2016 5:47 am

Congrats Danilo, this is really nice coding!
