Optimization

From NeoGeo Development Wiki

Revision as of 04:59, 23 September 2016

The optimizations below are derived from the 68k instruction timings.

VRAM access

Since

move.w *,xxx.L

is always slower than

move.w *,d(An)

try reserving an address register to hold VRAM_RW, and use offsets from it to access VRAM_ADDR and VRAM_MOD. Be careful about VRAM timings, though!

    lea      VRAM_RW,a5
    move.w   #$0001,2(a5)    ; VRAM_MOD
    ...
    move.w   #$1234,-2(a5)   ; VRAM_ADDR
    ...
    move.w   #$5678,(a5)     ; VRAM_RW
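As a rough comparison, the sketch below writes one address/data pair both ways. The cycle counts come from the standard 68000 timing tables, and the equ values are the usual NeoGeo VRAM register addresses. The delays that VRAM accesses may need between writes are deliberately not shown.

```asm
VRAM_ADDR   equ $3C0000
VRAM_RW     equ $3C0002
VRAM_MOD    equ $3C0004

    ; Absolute long addressing: 20 cycles per write
    move.w  #$1234,VRAM_ADDR    ; 20
    move.w  #$5678,VRAM_RW      ; 20

    ; Base register: the lea is paid once
    lea     VRAM_RW,a5          ; 12
    move.w  #$1234,-2(a5)       ; 16, VRAM_ADDR
    move.w  #$5678,(a5)         ; 12, VRAM_RW
```

With these counts the two versions tie on the first pair (40 cycles each), and the register version is 12 cycles cheaper on every pair after that. Basing the register on VRAM_RW gives the cheapest mode, (a5), to the data port, which is typically the most frequently written register.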

General 68k tricks

Many tricks are from [Easy68k].

Addressing

(An)+ is faster than -(An), except as a MOVE destination, where both take the same time.

Because (An) is faster than x(An), access to the first element of a data structure is faster than to the others.

Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.

Jump/call/return

   lea   return,a0
   jmp   routine
return:

Then to return, just jmp (a0). Uses a0 but saves 8 cycles.
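A minimal sketch of both sides of this convention follows. The label names and the routine body are only illustrative. The routine must leave a0 untouched, and it cannot call another routine that uses the same convention unless it saves a0 first.

```asm
caller:
    lea     return,a0       ; a0 = where to resume, nothing is pushed
    jmp     routine
return:
    rts                     ; back in caller, its own return is unchanged

routine:
    addq.w  #1,d0           ; example body, must not modify a0
    jmp     (a0)            ; 8 cycles, instead of 16 for rts
```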

   jsr subroutine  ->  jmp subroutine   ; Saves 24 cycles
   rts

   jsr sub   ; 18/20   ->   pea next  ; 16/20
   jmp next  ; 10/12        jmp sub   ; 10/12
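The jsr/rts replacement is a tail call: when a call is the last thing a routine does, it can jump instead, and the callee's rts returns straight to the original caller. A short sketch, with hypothetical routine names:

```asm
update_screen:
    jsr     draw_sprites    ; ordinary call, returns here
    jmp     draw_text       ; tail call: draw_text's rts returns to
                            ; update_screen's caller

draw_sprites:
    rts

draw_text:
    rts
```

This only works if update_screen has nothing left on the stack (saved registers, locals) when it jumps.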

Comparisons

cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:

moveq.l #xxx,d0
cmp.l   d0,d1

If the value xxx is between -8 and 8 (but not 0), and you don't mind altering the data register, you can use subq #xxx,Dn (or addq for negative values) instead of cmp. You can then use a conditional branch just as you would after a cmp. This works for word or longword comparisons.
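Both comparison tricks can be sketched as follows. The registers and label names are arbitrary choices for the example: d0 is assumed to be free as a scratch register, and d2 is assumed not to be needed after the test.

```asm
    ; Is d1 equal to 100? (long compare)
    moveq.l #100,d0         ; 4 cycles
    cmp.l   d0,d1           ; 6 cycles, 10 in total instead of 14
    beq.s   is_100

    ; Is d2 equal to 3? (word compare, d2 is destroyed)
    subq.w  #3,d2           ; 4 cycles, instead of 8 for cmp.w #3,d2
    beq.s   is_3

    rts
is_100:
    rts
is_3:
    rts
```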

Loops/searches

Since a taken short branch is slower than an untaken one, try to avoid taking most branches. For instance, if you have a loop searching for a null, the simple way to search is:

-:
   tst.b (a0)+
   bne.s -

It takes only a bit more space to unroll one or more iterations of the loop:

-:
   tst.b (a0)+
   beq.s found
   tst.b (a0)+
   bne.s -
found:
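The gain can be estimated from the timing tables. Below is the unrolled loop annotated with cycle counts, using named labels in place of the anonymous ones and assuming a0 points to a null-terminated string:

```asm
search:
    tst.b   (a0)+           ; 8
    beq.s   found           ; 8 when not taken
    tst.b   (a0)+           ; 8
    bne.s   search          ; 10 when taken
found:
    rts
```

Two bytes cost 8 + 8 + 8 + 10 = 34 cycles here, against 2 x (8 + 10) = 36 cycles for the simple loop. In both versions a0 ends up pointing just past the null byte.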

Clear data register

clr.l   Dn      ->   moveq.l  #0,Dn     ; Saves 2 cycles

Clear address register

There is no CLR or MOVEQ for address registers.

move.l  #0,An   ->   sub.l    An,An     ; Saves 4 cycles

Clear upper half of data register

andi.l  #$0000FFFF,Dn  ->  swap   Dn    ; Saves 4 cycles
                           clr.w  Dn
                           swap   Dn

Set large constants

To move 00xx0000 values (xx from $00 to $7F, since moveq sign-extends) to a data register:

   moveq.l #xx,Dn
   swap    Dn
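A worked example with an arbitrary value, showing the register contents after each step:

```asm
    moveq.l #$12,d0         ; d0 = $00000012, 4 cycles
    swap    d0              ; d0 = $00120000, 4 cycles
                            ; 8 cycles, versus 12 for move.l #$00120000,d0
```

Because moveq sign-extends its 8-bit operand to 32 bits, a value with bit 7 set would leave $FFFF in the low word after the swap, so this only produces a clean 00xx0000 for xx up to $7F.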

Shift/multiply data register

lsl.w   #1,d0   ->   add.w    d0,d0     ; Saves 4 cycles
lsl.l   #1,d0   ->   add.l    d0,d0     ; Saves 2 cycles
lsl.w   #2,d0   ->   add.w    d0,d0     ; Saves 2 cycles
                     add.w    d0,d0

Add to address register

Useful when xxx is between -32768 and 32767.

adda.w  #xxx,a0  ->   lea      xxx(a0),a0

Rotates

moveq.l #16,d0  ->   swap     d1
ror.l   d0,d1

moveq.l #15,d0  ->   swap     d1
ror.l   d0,d1        rol.l    #1,d1
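To check the second replacement, here is a worked example with an arbitrary starting value. Rotating right by 15 is the same as rotating by 16 (a swap) and then rotating left by 1:

```asm
    move.l  #$12345678,d1
    swap    d1              ; d1 = $56781234, 4 cycles
    rol.l   #1,d1           ; d1 = $ACF02468, 10 cycles
                            ; 14 cycles, versus 4 + 38 = 42 for
                            ; moveq.l #15,d0 / ror.l d0,d1
```

The replacement also leaves d0 free, but the condition codes after it are not guaranteed to match those of the original ror.l, so avoid it if the following code tests the flags.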