Latest revision as of 06:20, 3 April 2018

The tips below follow from the 68k instruction timings.

VRAM access

Since

move.w *,xxx.L

is always slower than

move.w *,d(An)

Try reserving an address register to hold VRAM_RW, and use offsets from it to access VRAM_ADDR and VRAM_MOD. But be careful about VRAM timings!

    lea      VRAM_RW,a5
    move.w   #$0001,2(a5)    ; VRAM_MOD
    ...
    move.w   #$1234,-2(a5)   ; VRAM_ADDR
    ...
    move.w   #$5678,(a5)     ; VRAM_RW

General 68k tricks

Many tricks are from [Easy68k].

Addressing

(An)+ is faster than -(An), except as a MOVE destination, where both cost the same.
Because (An) is faster than d(An), access to the first element of a data structure (offset 0) is faster than to the others.
Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.
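
As a minimal sketch of the data structure point, assume a hypothetical object layout (the OBJ_FLAGS and OBJ_X names and offsets are made up for illustration) where the most frequently accessed field is placed at offset 0:

   OBJ_FLAGS  equ  0            ; hypothetical: hottest field first
   OBJ_X      equ  2            ; hypothetical: less frequently used field

      tst.w   (a0)              ; OBJ_FLAGS through (An): 8 cycles
      move.w  OBJ_X(a0),d0      ; OBJ_X through d(An): 12 cycles

The same field read through d(An) costs 4 cycles more than through (An), so the ordering only matters for fields accessed often.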

CALL/RTS with JMP

This replaces a jsr/rts pair and saves 8 cycles, but A0 must be preserved until the routine returns.

   lea   return,A0
   jmp   routine
return:

Then to return, just jmp (A0).
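
Put together, the caller and the routine could look like the sketch below. The labels are the placeholders used above, and the routine body is assumed not to modify A0:

      lea   return,A0       ; A0 = address to come back to
      jmp   routine
   return:
      ...                   ; execution resumes here

   routine:
      ...                   ; body, must leave A0 untouched
      jmp   (A0)            ; "return" without touching the stack

Since nothing is pushed on the stack, this only suits leaf routines, or routines that save A0 themselves before calling anything else the same way.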

Replace JSR+RTS

   jsr subroutine  ->  jmp subroutine   ; Saves 24 cycles
   rts

Replace JSR+JMP

Instead of:

   jsr sub   ; 18/20
   jmp next  ; 10/12

Do:

   pea next  ; 16/20
   jmp sub   ; 10/12

Comparisons

cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:

moveq.l #xxx,d0
cmp.l   d0,d1

If the value xxx is between 1 and 8, and you don't mind altering the data register, you can just use subq #xxx,Dn instead of cmp (for values between -8 and -1, use addq with the negated value; for 0, tst is enough). Then you can use a conditional branch just as you would after a cmp. This works for word or longword comparisons.
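
A small sketch of the subq variant, assuming d1 holds a value that is no longer needed afterwards and that is_three is a hypothetical label:

      subq.w  #3,d1         ; d1 = d1 - 3, N/Z/V/C set as for cmp.w #3,d1
      beq.s   is_three      ; taken if d1 held 3 (it now holds 0)

subq.w #3,d1 takes 4 cycles against 8 for cmpi.w #3,d1, at the cost of destroying the original value of d1.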

Loops/searches

Since a taken short branch is slower than an untaken one, try to avoid taking most branches. For instance, if you have a loop searching for a null, the simple way to search is:

-:
   tst.b (a0)+
   bne.s -

It takes only a bit more space to unroll one or more iterations of the loop:

-:
   tst.b (a0)+
   beq.s found
   tst.b (a0)+
   bne.s -
found:

Clear data register

clr.l   Dn      ->   moveq.l  #0,Dn     ; Saves 2 cycles

Clear address register

There is no CLR or MOVEQ for address registers.

move.l  #0,An   ->   sub.l    An,An     ; Saves 4 cycles

Clear upper half of data register

andi.l  #$0000FFFF,Dn  ->  swap   Dn    ; Saves 4 cycles
                           clr.w  Dn
                           swap   Dn

Set large constants

To move $00010000 ... $007F0000 values to a data register:

   moveq.l #X,Dn                       ; X = $01 ... $7F
   swap    Dn

To move $FF80FFFF ... $FFFEFFFF values to a data register:

   moveq.l #X,Dn                       ; X = $FFFFFF80 ... $FFFFFFFE
   swap    Dn
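
As a worked example with concrete values (the registers are chosen arbitrarily):

      moveq.l #$12,d0       ; d0 = $00000012
      swap    d0            ; d0 = $00120000

      moveq.l #-$10,d1      ; d1 = $FFFFFFF0 (sign-extended)
      swap    d1            ; d1 = $FFF0FFFF

Each pair takes 8 cycles and 4 bytes, against 12 cycles and 6 bytes for move.l #xxx,Dn.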

Shift/multiply data register

lsl.w   #1,d0   ->   add.w    d0,d0     ; Saves 4 cycles

lsl.l   #1,d0   ->   add.l    d0,d0     ; Saves 2 cycles

lsl.w   #2,d0   ->   add.w    d0,d0     ; Saves 2 cycles
                     add.w    d0,d0
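
The same idea extends to multiplying by small constants. The sketch below multiplies d0.w by 10 with adds only, assuming d1 is free to use as scratch and that the result fits in a word:

      add.w   d0,d0         ; d0 = 2x
      move.w  d0,d1         ; d1 = 2x
      add.w   d0,d0         ; d0 = 4x
      add.w   d0,d0         ; d0 = 8x
      add.w   d1,d0         ; d0 = 8x + 2x = 10x

This is 5 instructions at 4 cycles each (20 cycles), whereas mulu.w takes 38 cycles or more. Unlike mulu.w, the result here is truncated to 16 bits.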

Add to address register

Useful when xxx is between -32768 and 32767.

adda.w  #xxx,a0  ->   lea      xxx(a0),a0     ; Saves 4 cycles

Rotates

moveq.l #16,d0  ->   swap     d1
ror.l   d0,d1

moveq.l #15,d0  ->   swap     d1
ror.l   d0,d1        rol.l    #1,d1


Optimization: Difference between revisions