Optimization

From NeoGeo Development Wiki

Latest revision as of 06:20, 3 April 2018

Cycle counts below are based on the 68k instructions timings.

VRAM access

Since

move.w *,xxx.L

is always slower than

move.w *,d(An)

(for any source operand *), reserve an address register to hold VRAM_RW and use displacements from it to reach VRAM_ADDR and VRAM_MOD, as below. Be careful about VRAM access timings, though!

    lea      VRAM_RW,a5
    move.w   #$0001,2(a5)    ; VRAM_MOD
    ...
    move.w   #$1234,-2(a5)   ; VRAM_ADDR
    ...
    move.w   #$5678,(a5)     ; VRAM_RW
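A small Python model can check the address arithmetic behind this trick. It assumes the commonly documented LSPC register addresses: VRAM_ADDR at $3C0000, VRAM_RW at $3C0002 and VRAM_MOD at $3C0004. Check these against your own definitions. The helper `d16_an` is hypothetical. It models only the d(An) effective-address calculation, not timing.

```python
# Assumed Neo Geo LSPC register addresses (verify against your own include file).
VRAM_ADDR = 0x3C0000
VRAM_RW = 0x3C0002
VRAM_MOD = 0x3C0004


def d16_an(an: int, disp: int) -> int:
    """Effective address of d(An): An plus a signed 16-bit displacement."""
    if not -32768 <= disp <= 32767:
        raise ValueError("displacement must fit in a signed 16-bit word")
    return (an + disp) & 0xFFFFFFFF


a5 = VRAM_RW                  # lea VRAM_RW,a5
print(hex(d16_an(a5, -2)))    # -2(a5) -> 0x3c0000 (VRAM_ADDR)
print(hex(d16_an(a5, 2)))     # 2(a5)  -> 0x3c0004 (VRAM_MOD)
```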

General 68k tricks

Many of these tricks come from Easy68k.

Addressing

  • (An)+ is faster than -(An), except as a MOVE destination, where both take the same time.
  • Because (An) is faster than d(An), the first element of a data structure (at offset 0) is faster to access than the others.
  • Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.
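The sign extension mentioned in the last point can be sketched in Python. The helpers below are hypothetical and model register values only, with no cycle counts. A word-sized address operand, as used by adda.w for example, is widened to 32 bits by copying bit 15 into the upper word before it is added.

```python
def sign_extend_word(value: int) -> int:
    """Widen a 16-bit value to 32 bits by copying bit 15 into bits 16-31."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def adda_w(an: int, word: int) -> int:
    """Model of adda.w: the word source is sign-extended, then added to all 32 bits of An."""
    return (an + sign_extend_word(word)) & 0xFFFFFFFF


print(hex(sign_extend_word(0x7FFF)))    # 0x7fff
print(hex(sign_extend_word(0x8000)))    # 0xffff8000
print(hex(adda_w(0x00100000, 0xFFFF)))  # 0xfffff (adds -1, not +65535)
```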

CALL/RTS with JMP

This saves 8 cycles compared to a JSR/RTS pair. The return address is held in A0, so the called routine must preserve A0.

   lea   return, A0
   jmp   routine
return:

To return, the routine just does jmp (A0).

Replace JSR+RTS

   jsr subroutine  ->  jmp subroutine   ; Saves 24 cycles
   rts
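A toy model of the return-address stack shows why this works. In this Python sketch, all names are hypothetical and no cycle counts are modelled. `jsr` pushes a return address, `jmp` does not, and `rts` pops one. In the tail-call form, the subroutine's own RTS returns straight to the original caller, so one push and one RTS disappear.

```python
def run(program, entry, stack):
    """Interpret a toy program until control returns to a label outside it.

    program maps a label to a list of (opcode, operand) pairs; addresses
    are (label, index) tuples. Returns (final label, executed opcodes).
    """
    label, index = entry, 0
    executed = []
    while label in program:
        op, arg = program[label][index]
        executed.append(op)
        if op == "jsr":      # push the address of the next instruction, then jump
            stack.append((label, index + 1))
            label, index = arg, 0
        elif op == "jmp":    # jump without touching the stack
            label, index = arg, 0
        elif op == "rts":    # pop the return address
            label, index = stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op!r}")
    return label, executed


original = {"routine": [("jsr", "subroutine"), ("rts", None)],
            "subroutine": [("rts", None)]}
tail_call = {"routine": [("jmp", "subroutine")],
             "subroutine": [("rts", None)]}

for prog in (original, tail_call):
    stack = [("caller", 0)]  # return address pushed by whoever called "routine"
    print(run(prog, "routine", stack), stack)
# ('caller', ['jsr', 'rts', 'rts']) []
# ('caller', ['jmp', 'rts']) []
```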

Replace JSR+JMP

Instead of (cycle counts for absolute short/long addressing):

   jsr sub   ; 18/20
   jmp next  ; 10/12

Do:

   pea next  ; 16/20
   jmp sub   ; 10/12
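Totalling the per-instruction figures quoted above (absolute short / absolute long) gives the saving for each addressing mode. This Python snippet only does arithmetic on those numbers. In both versions the subroutine's RTS still executes once, so it cancels out of the comparison.

```python
# Cycle counts as quoted on this page: (absolute short, absolute long).
CYCLES = {"jsr": (18, 20), "jmp": (10, 12), "pea": (16, 20)}
MODES = ("abs.W", "abs.L")


def saving(mode: str) -> int:
    """Cycles saved by pea next / jmp sub over jsr sub / jmp next."""
    i = MODES.index(mode)
    before = CYCLES["jsr"][i] + CYCLES["jmp"][i]
    after = CYCLES["pea"][i] + CYCLES["jmp"][i]
    return before - after


for mode in MODES:
    print(mode, saving(mode))
# abs.W 2
# abs.L 0
```

By these figures, the gain depends on the addressing mode: 2 cycles with absolute short addresses and none with absolute long ones.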

Comparisons

cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:

moveq.l #xxx,d0
cmp.l   d0,d1

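If xxx is between 1 and 8 and you don't mind altering the data register, you can use subq #xxx,Dn instead of cmp. SUBQ sets N, Z, V and C exactly as CMP does, so any conditional branch works just as it would after a cmp. For -8 to -1, addq #-xxx,Dn gives the same N, Z and V flags but an inverted carry, so follow it only with equality or signed branches. Both tricks work for word and longword comparisons.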
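A Python model of 32-bit subtraction and addition can check the flag behaviour. The helpers are hypothetical, and only N, Z, V and C are modelled. SUBQ performs the same subtraction as CMP, so those flags match by construction. The loop below checks the less obvious case: addq.l #k,Dn standing in for cmp.l #-k,Dn.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000


def sub_flags(dest: int, src: int):
    """dest - src on 32 bits; returns (result, N, Z, V, C) as CMP.L/SUBQ.L set them."""
    dest, src = dest & MASK, src & MASK
    result = (dest - src) & MASK
    n = bool(result & SIGN)
    z = result == 0
    v = bool((dest ^ src) & (dest ^ result) & SIGN)   # operand signs differ, result sign flips
    c = src > dest                                    # unsigned borrow
    return result, n, z, v, c


def add_flags(dest: int, src: int):
    """dest + src on 32 bits; returns (result, N, Z, V, C) as ADDQ.L sets them."""
    dest, src = dest & MASK, src & MASK
    total = dest + src
    result = total & MASK
    n = bool(result & SIGN)
    z = result == 0
    v = bool(~(dest ^ src) & (dest ^ result) & SIGN)  # same operand signs, result sign flips
    c = total > MASK                                  # unsigned carry out of bit 31
    return result, n, z, v, c


# addq.l #k,Dn compared with cmp.l #-k,Dn for k = 1..8.
for dn in (0, 3, 0x7FFFFFFF, 0x80000000, 0xFFFFFFF8, 0xFFFFFFFF):
    for k in range(1, 9):
        _, n1, z1, v1, c1 = sub_flags(dn, -k)   # cmp.l #-k,Dn
        _, n2, z2, v2, c2 = add_flags(dn, k)    # addq.l #k,Dn
        assert (n1, z1, v1) == (n2, z2, v2)     # beq/bne and signed branches agree
        assert c1 != c2                          # carry is inverted, so unsigned ones do not
```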

Loops/searches

Since a taken short branch (10 cycles) is slower than an untaken one (8 cycles), arrange code so that most branches fall through. For instance, the simple way to search for a null byte is:

-:
   tst.b (a0)+
   bne.s -

It takes only a bit more space to unroll one or more iterations of the loop:

-:
   tst.b (a0)+
   beq.s found
   tst.b (a0)+
   bne.s -
found:
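A Python model of both loops illustrates the effect. The function names are hypothetical, memory is a bytes object, and a0 is an index into it. Both loops stop at the same place, but the unrolled one takes roughly half as many branches.

```python
def scan_simple(mem: bytes, a0: int):
    """-: tst.b (a0)+ / bne.s -   Returns (final a0, branches taken)."""
    taken = 0
    while True:
        byte = mem[a0]; a0 += 1      # tst.b (a0)+
        if byte != 0:                # bne.s - is taken
            taken += 1
        else:                        # falls through: null found
            return a0, taken


def scan_unrolled(mem: bytes, a0: int):
    """Same search with one iteration unrolled (beq.s found / bne.s -)."""
    taken = 0
    while True:
        byte = mem[a0]; a0 += 1      # tst.b (a0)+
        if byte == 0:                # beq.s found is taken
            return a0, taken + 1
        byte = mem[a0]; a0 += 1      # tst.b (a0)+
        if byte != 0:                # bne.s - is taken
            taken += 1
        else:                        # falls through to found:
            return a0, taken


text = b"hello world\0"
print(scan_simple(text, 0))    # (12, 11)
print(scan_unrolled(text, 0))  # (12, 5)
```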

Clear data register

clr.l   Dn      ->   moveq.l  #0,Dn     ; Saves 2 cycles

Clear address register

There is no CLR or MOVEQ for address registers.

move.l  #0,An   ->   sub.l    An,An     ; Saves 4 cycles

Clear upper half of data register

andi.l  #$0000FFFF,Dn  ->  swap   Dn    ; Saves 4 cycles
                           clr.w  Dn
                           swap   Dn
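A quick Python model confirms that the three-instruction sequence leaves the same register value as the ANDI. The helper names are hypothetical, and flags are not modelled.

```python
MASK32 = 0xFFFFFFFF


def swap(d: int) -> int:
    """swap Dn: exchange the upper and lower 16-bit halves."""
    return ((d << 16) | (d >> 16)) & MASK32


def clr_w(d: int) -> int:
    """clr.w Dn: zero the low word, leave the upper word alone."""
    return d & 0xFFFF0000


def andi_l(d: int) -> int:
    """andi.l #$0000FFFF,Dn."""
    return d & 0x0000FFFF


def swap_clr_swap(d: int) -> int:
    """swap Dn / clr.w Dn / swap Dn."""
    return swap(clr_w(swap(d)))


for d in (0x00000000, 0x12345678, 0xFFFFFFFF, 0x80008000):
    assert andi_l(d) == swap_clr_swap(d)
print(hex(swap_clr_swap(0x12345678)))  # 0x5678
```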

Set large constants

To load a value of the form $00XX0000, from $00010000 to $007F0000, into a data register:

   moveq.l #X,Dn                       ; X = $01 ... $7F
   swap    Dn

To load a value of the form $FFXXFFFF, from $FF80FFFF to $FFFEFFFF, into a data register:

   moveq.l #X,Dn                       ; X = $FFFFFF80 ... $FFFFFFFE
   swap    Dn
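A Python model (with hypothetical helpers) shows the sign extension that makes the second case work. A negative moveq immediate fills the upper word with $FFFF, and SWAP then moves that into the low word. Writing X as -128 ... -2 is the same as $FFFFFF80 ... $FFFFFFFE.

```python
def moveq(imm: int) -> int:
    """moveq.l #imm,Dn: the 8-bit immediate is sign-extended to 32 bits."""
    if not -128 <= imm <= 127:
        raise ValueError("moveq immediate must fit in a signed byte")
    return imm & 0xFFFFFFFF


def swap(d: int) -> int:
    """swap Dn: exchange the upper and lower 16-bit halves."""
    return ((d << 16) | (d >> 16)) & 0xFFFFFFFF


for imm in (1, 0x7F, -128, -2):
    print(hex(swap(moveq(imm))))
# 0x10000
# 0x7f0000
# 0xff80ffff
# 0xfffeffff
```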

Shift/multiply data register

lsl.w   #1,d0   ->   add.w    d0,d0     ; Saves 4 cycles
lsl.l   #1,d0   ->   add.l    d0,d0     ; Saves 2 cycles
lsl.w   #2,d0   ->   add.w    d0,d0     ; Saves 2 cycles
                     add.w    d0,d0
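A register-value model in Python (with hypothetical helpers) shows why the substitution is exact. Shifting left by one is doubling, and word-sized operations leave the upper word of the register untouched. Condition codes are not modelled. Note that ADD sets V on signed overflow, whereas LSL always clears V.

```python
MASK32 = 0xFFFFFFFF


def with_low_word(d: int, word: int) -> int:
    """Word-sized ops only write bits 0-15; bits 16-31 of Dn are preserved."""
    return (d & 0xFFFF0000) | (word & 0xFFFF)


def lsl_w(d: int, n: int) -> int:    # lsl.w #n,d0
    return with_low_word(d, (d & 0xFFFF) << n)


def add_w(d: int) -> int:            # add.w d0,d0
    return with_low_word(d, (d & 0xFFFF) * 2)


def lsl_l(d: int, n: int) -> int:    # lsl.l #n,d0
    return (d << n) & MASK32


def add_l(d: int) -> int:            # add.l d0,d0
    return (d + d) & MASK32


for d in (0x00000001, 0x12348001, 0xFFFFFFFF, 0x7FFFC000):
    assert lsl_w(d, 1) == add_w(d)
    assert lsl_l(d, 1) == add_l(d)
    assert lsl_w(d, 2) == add_w(add_w(d))
```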

Add to address register

Useful when xxx is between -32768 and 32767.

adda.w  #xxx,a0  ->   lea      xxx(a0),a0   ; Saves 4 cycles

Rotates

moveq.l #16,d0  ->   swap     d1
ror.l   d0,d1
moveq.l #15,d0  ->   swap     d1
ror.l   d0,d1        rol.l    #1,d1
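The identities behind these substitutions can be checked in Python. The helper names are hypothetical; only register values are modelled, not flags. Rotating by 16 in either direction is a SWAP. Rotating right by 15 equals rotating left by 17, which is a SWAP followed by a one-bit ROL. The substitutes also leave d0 free.

```python
MASK32 = 0xFFFFFFFF


def ror_l(d: int, n: int) -> int:
    """ror.l: rotate the 32-bit value right by n bits."""
    n %= 32
    return ((d >> n) | (d << (32 - n))) & MASK32


def rol_l(d: int, n: int) -> int:
    """rol.l: rotate the 32-bit value left by n bits."""
    n %= 32
    return ((d << n) | (d >> (32 - n))) & MASK32


def swap(d: int) -> int:
    """swap Dn: exchange the halves, i.e. a rotation by 16 either way."""
    return ((d << 16) | (d >> 16)) & MASK32


for d in (0x12345678, 0x80000001, 0xDEADBEEF):
    assert ror_l(d, 16) == swap(d)
    assert ror_l(d, 15) == rol_l(swap(d), 1)  # right 15 == left 17 == left 16, then left 1
```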