Optimization

From NeoGeo Development Wiki
Revision as of 09:26, 5 June 2017 by Anima (talk | contribs) ("moveq.l" always sign extent the byte value to a long.)

The cycle counts on this page come from the 68k instruction timings.

VRAM access

Since

move.w *,xxx.L

is always slower than

move.w *,d(An)

try reserving an address register to hold VRAM_RW, and use offsets from it to access VRAM_ADDR and VRAM_MOD. But be careful about VRAM timings!

    lea      VRAM_RW,a5
    move.w   #$0001,2(a5)    ; VRAM_MOD
    ...
    move.w   #$1234,-2(a5)   ; VRAM_ADDR
    ...
    move.w   #$5678,(a5)     ; VRAM_RW

General 68k tricks

Many tricks are from [Easy68k].

Addressing

  • (An)+ is faster than -(An), except as a MOVE destination, where both cost the same.
  • Because (An) is faster than d(An), access to the first element of a data structure (offset 0) is faster than access to the others.
  • Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.
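
As a rough illustration of the first two points, here are a few accesses with their cycle counts from the standard 68000 timing tables. a0 is assumed to point to a hypothetical structure made of word fields:

    move.w   (a0),d0         ; 8 cycles: first field, no displacement
    move.w   2(a0),d0        ; 12 cycles: any other field needs d(An)
    add.w    (a0)+,d0        ; 8 cycles
    add.w    -(a0),d0        ; 10 cycles: pre-decrement costs 2 more
    move.w   d0,(a0)+        ; 8 cycles
    move.w   d0,-(a0)        ; 8 cycles: same cost as a MOVE destination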

CALL/RTS with JMP

   lea   return, A0
   jmp   routine
return:

Then to return, just jmp (A0). A0 needs to be preserved by the routine, but this saves 8 cycles compared to jsr/rts.
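
A minimal sketch of the callee side, assuming the routine never needs A0 for anything else (the body is a placeholder):

routine:
   ...                 ; body must leave A0 untouched
   jmp   (A0)          ; 8 cycles, where rts would take 16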

Replace JSR+RTS

   jsr subroutine  ->  jmp subroutine   ; Saves 24 cycles
   rts

Replace JSR+JMP

Instead of:

   jsr sub   ; 18/20
   jmp next  ; 10/12

Do:

   pea next  ; 16/20
   jmp sub   ; 10/12

Comparisons

cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:

moveq.l #xxx,d0
cmp.l   d0,d1

If the value xxx is between -8 and 8 (other than 0), and you don't mind altering the data register, you can use subq #xxx,Dn (or addq #-xxx,Dn for negative values) instead of cmp. Then you can use a conditional branch just as you would after a cmp. This works for word or longword comparisons.
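
For example, testing d1 against 3 could look like the sketch below. The label equal is hypothetical, and d1 is assumed to be dead after the test since it gets modified:

subq.w  #3,d1           ; 4 cycles, where cmp.w #3,d1 would take 8
beq.s   equal           ; taken if d1 was 3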

Loops/searches

Since a taken short branch is slower than an untaken one, try to avoid taking most branches. For instance, if you have a loop searching for a null, the simple way to search is:

-:
   tst.b (a0)+
   bne.s -

It takes only a bit more space to unroll one or more iterations of the loop:

-:
   tst.b (a0)+
   beq.s found
   tst.b (a0)+
   bne.s -
found:
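
The same idea can be pushed further. As a sketch, assuming the extra code size is acceptable, here is the loop unrolled four times, with cycle counts from the standard 68000 timing tables:

-:
   tst.b (a0)+          ; 8
   beq.s found          ; 8 when not taken
   tst.b (a0)+
   beq.s found
   tst.b (a0)+
   beq.s found
   tst.b (a0)+          ; 8
   bne.s -              ; 10 when taken
found:

While no null is found, the simple loop costs 8 + 10 = 18 cycles per byte. The version unrolled twice costs 16 + 18 = 34 cycles per two bytes (17 per byte), and this one costs 3 * 16 + 18 = 66 cycles per four bytes (16.5 per byte).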

Clear data register

clr.l   Dn      ->   moveq.l  #0,Dn     ; Saves 2 cycles

Clear address register

There is no CLR or MOVEQ for address registers.

move.l  #0,An   ->   sub.l    An,An     ; Saves 4 cycles

Clear upper half of data register

andi.l  #$0000FFFF,Dn  ->  swap   Dn    ; Saves 4 cycles
                           clr.w  Dn
                           swap   Dn

Set large constants

To move $00010000 ... $007F0000 values to a data register:

   moveq.l #X,Dn                       ; X = $01 ... $7F
   swap    Dn

To move $FF80FFFF ... $FFFEFFFF values to a data register:

   moveq.l #X,Dn                       ; X = $FFFFFF80 ... $FFFFFFFE
   swap    Dn
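
Two concrete cases, with arbitrarily chosen values, to show what ends up in the register. Each pair takes 8 cycles, where a move.l #xxx,Dn would take 12:

   moveq.l #$10,d0                     ; d0 = $00000010
   swap    d0                          ; d0 = $00100000

   moveq.l #-2,d0                      ; d0 = $FFFFFFFE (sign-extended)
   swap    d0                          ; d0 = $FFFEFFFF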

Shift/multiply data register

lsl.w   #1,d0   ->   add.w    d0,d0     ; Saves 4 cycles
lsl.l   #1,d0   ->   add.l    d0,d0     ; Saves 2 cycles
lsl.w   #2,d0   ->   add.w    d0,d0     ; Saves 2 cycles
                     add.w    d0,d0

Add to address register

Works when xxx fits in a signed 16-bit displacement (-32768 to 32767).

adda.w  #xxx,a0  ->   lea      xxx(a0),a0   ; Saves 4 cycles

Rotates

moveq.l #16,d0  ->   swap     d1
ror.l   d0,d1

moveq.l #15,d0  ->   swap     d1
ror.l   d0,d1        rol.l    #1,d1