Optimization

From NeoGeo Development Wiki
Latest revision as of 06:20, 3 April 2018

Cycle counts below are taken from the 68k instructions timings page.

VRAM access

Since

move.w *,xxx.L

is always slower than

move.w *,d(An)

try reserving an address register to hold VRAM_RW and use small offsets to access VRAM_ADDR and VRAM_MOD. But be careful about VRAM timings!

    lea      VRAM_RW,a5
    move.w   #$0001,2(a5)    ; VRAM_MOD
    ...
    move.w   #$1234,-2(a5)   ; VRAM_ADDR
    ...
    move.w   #$5678,(a5)     ; VRAM_RW
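The offsets in the snippet follow from the VRAM register layout; assuming the usual Neo Geo memory map ($3C0000 = VRAM address, $3C0002 = VRAM read/write, $3C0004 = VRAM modulo), holding VRAM_RW in a5 puts the other two registers one word away in each direction:

```python
# Assumed Neo Geo VRAM register addresses (not defined in this article).
VRAM_ADDR = 0x3C0000  # write here to set the current VRAM address
VRAM_RW   = 0x3C0002  # read/write VRAM data here
VRAM_MOD  = 0x3C0004  # auto-increment applied after each access

a5 = VRAM_RW                 # the reserved base register from the example
assert a5 - 2 == VRAM_ADDR   # move.w #$1234,-2(a5) hits VRAM_ADDR
assert a5 + 2 == VRAM_MOD    # move.w #$0001,2(a5)  hits VRAM_MOD
```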

General 68k tricks

Many tricks are from [Easy68k].

Addressing

  • (An)+ is faster than -(An), except for MOVEs (same).
  • Because (An) is faster than d(An), access to the first element of a data structure (at offset 0) is faster than access to the others.
  • Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.

CALL/RTS with JMP

This requires keeping A0 intact across the routine, but saves 8 cycles compared to JSR/RTS.

   lea   return, A0
   jmp   routine
return:

Then to return, just jmp (A0).

Replace JSR+RTS

When a subroutine call is immediately followed by RTS (a tail call), replace the pair with a single JMP:

   jsr subroutine  ->  jmp subroutine   ; Saves 24 cycles
   rts

Replace JSR+JMP

Instead of:

   jsr sub   ; 18/20
   jmp next  ; 10/12

Do:

   pea next  ; 16/20
   jmp sub   ; 10/12

Comparisons

cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:

moveq.l #xxx,d0
cmp.l   d0,d1

If the value xxx is between -8 and 8, and you don't mind altering the data register, you can just use subq #xxx,Dn (or addq) instead of cmp. Then you can use a conditional branch just as you would after a cmp. This works for word or longword comparisons.
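SUBQ can stand in for CMP because both compute Dn - xxx and set the condition codes from the result; SUBQ just also writes the result back. A minimal Python model of the Z (zero) flag, with 32-bit wraparound, shows the flag comes out the same either way (illustration only, not 68k code):

```python
MASK32 = 0xFFFFFFFF

def cmp_z(d, x):
    """Z flag after cmp.l #x,Dn: set when (Dn - x) wraps to zero."""
    return ((d - x) & MASK32) == 0

def subq_z(d, x):
    """Z flag after subq.l #x,Dn: same subtraction, but Dn is overwritten."""
    d = (d - x) & MASK32
    return d == 0

# The conditional branch taken afterwards is identical in both cases.
for d, x in [(8, 8), (9, 8), (0, 1), (0xFFFFFFFF, 1)]:
    assert cmp_z(d, x) == subq_z(d, x)
```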

Loops/searches

Since a taken short branch is slower than an untaken one, try to avoid taking most branches. For instance, if you have a loop searching for a null, the simple way to search is:

-:
   tst.b (a0)+
   bne.s -

It takes only a bit more space to unroll one or more iterations of the loop:

-:
   tst.b (a0)+
   beq.s found
   tst.b (a0)+
   bne.s -
found:
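The unrolled loop takes the backward (taken) branch only once per two bytes instead of once per byte. A Python model of both scans (pure illustration, not 68k code) confirms they stop at the same place — one past the null, since tst.b (a0)+ post-increments:

```python
def find_null_simple(mem, a0):
    """Model of the simple loop: tst.b (a0)+ / bne.s back."""
    while True:
        b = mem[a0]
        a0 += 1          # (a0)+ post-increments before the branch
        if b == 0:
            return a0    # bne falls through on zero

def find_null_unrolled(mem, a0):
    """Model of the unrolled loop: two tests per backward branch."""
    while True:
        b = mem[a0]; a0 += 1
        if b == 0:
            return a0    # beq.s found
        b = mem[a0]; a0 += 1
        if b == 0:
            return a0    # bne.s loops back only on nonzero
```

Both return the address one past the terminating null, exactly like a0 after the assembly loops exit.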

Clear data register

clr.l   Dn      ->   moveq.l  #0,Dn     ; Saves 2 cycles

Clear address register

There are no CLR or MOVEQ for address registers.

move.l  #0,An   ->   sub.l    An,An     ; Saves 4 cycles

Clear upper half of data register

andi.l  #$0000FFFF,Dn  ->  swap   Dn    ; Saves 4 cycles
                           clr.w  Dn
                           swap   Dn
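The swap/clr.w/swap sequence works because SWAP exchanges the register halves, so clearing the low word in between clears what was the upper half. A Python model (illustration only):

```python
def swap(v):
    """SWAP Dn: exchange the two 16-bit halves of a 32-bit register."""
    return ((v << 16) | (v >> 16)) & 0xFFFFFFFF

def clr_w(v):
    """CLR.W Dn: zero the low word, keep the high word."""
    return v & 0xFFFF0000

def clear_upper(v):
    """swap / clr.w / swap, equivalent to andi.l #$0000FFFF,Dn."""
    return swap(clr_w(swap(v)))
```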

Set large constants

To move $00010000 ... $007F0000 values to a data register:

   moveq.l #X,Dn                       ; X = $01 ... $7F
   swap    Dn

To move $FF80FFFF ... $FFFEFFFF values to a data register:

   moveq.l #X,Dn                       ; X = $FFFFFF80 ... $FFFFFFFE
   swap    Dn
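Both constant tricks rely on MOVEQ sign-extending its 8-bit immediate to 32 bits before the SWAP. A Python model of the two instructions (illustration, not 68k code):

```python
def moveq(imm):
    """MOVEQ #imm,Dn: sign-extend the 8-bit immediate (-128..127) to 32 bits."""
    return imm & 0xFFFFFFFF

def swap(v):
    """SWAP Dn: exchange the two 16-bit halves of the register."""
    return ((v << 16) | (v >> 16)) & 0xFFFFFFFF
```

For example, swap(moveq(0x01)) gives $00010000 and swap(moveq(-0x80)) gives $FF80FFFF, matching the two ranges above.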

Shift/multiply data register

lsl.w   #1,d0   ->   add.w    d0,d0     ; Saves 4 cycles
lsl.l   #1,d0   ->   add.l    d0,d0     ; Saves 2 cycles
lsl.w   #2,d0   ->   add.w    d0,d0     ; Saves 2 cycles
                     add.w    d0,d0
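These replacements are exact because adding a register to itself doubles it with the same wraparound as a left shift. A quick Python check of the 32-bit case (illustration only):

```python
MASK32 = 0xFFFFFFFF

def lsl_l(v, n):
    """lsl.l #n,Dn on a 32-bit register."""
    return (v << n) & MASK32

def add_l(a, b):
    """add.l Da,Db with 32-bit wraparound."""
    return (a + b) & MASK32

for v in (0, 1, 0x4000, 0x80000000, 0xDEADBEEF):
    assert lsl_l(v, 1) == add_l(v, v)                 # one add replaces #1
    t = add_l(v, v)
    assert lsl_l(v, 2) == add_l(t, t)                 # two adds replace #2
```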

Add to address register

Useful when xxx is between -32768 and 32767.

adda.w  #xxx,a0  ->   lea      xxx(a0),a0

Rotates

moveq.l #16,d0  ->   swap     d1
ror.l   d0,d1

moveq.l #15,d0  ->   swap     d1
ror.l   d0,d1        rol.l    #1,d1
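These shortcuts follow from ROR.L #16 being exactly a SWAP, and ROR.L #15 being a SWAP followed by ROL.L #1 (16 right, then 1 back left). A Python model of the 32-bit rotates (illustration, not 68k code):

```python
def ror_l(v, n):
    """Rotate a 32-bit value right by n (1..31)."""
    return ((v >> n) | (v << (32 - n))) & 0xFFFFFFFF

def rol_l(v, n):
    """Rotate a 32-bit value left by n (1..31)."""
    return ((v << n) | (v >> (32 - n))) & 0xFFFFFFFF

def swap(v):
    """SWAP Dn: exchange the 16-bit halves (a rotate by 16)."""
    return ((v << 16) | (v >> 16)) & 0xFFFFFFFF

for v in (0x12345678, 0xDEADBEEF, 1):
    assert ror_l(v, 16) == swap(v)
    assert ror_l(v, 15) == rol_l(swap(v), 1)
```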