Optimization

These tips are based on the 68k instruction timings.

=VRAM access=

Since move.w *,xxx.L is always slower than move.w *,d(An), try reserving an address register to hold the base address of the VRAM registers, and use displacements from it to access the others. Be careful about VRAM access timings, though!
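A minimal sketch of the idea, assuming two word-wide VRAM registers at consecutive even addresses. The names VREG_BASE, VREG_ADDR and VREG_DATA and the base address are hypothetical placeholders, not taken from this page; substitute the ones for your hardware. Cycle counts are the standard 68000 figures.

```asm
VREG_BASE   equ $3C0000         ; hypothetical base address of the VRAM registers
VREG_ADDR   equ 0               ; hypothetical offset of the VRAM address register
VREG_DATA   equ 2               ; hypothetical offset of the VRAM data register

        lea     VREG_BASE,a5            ; done once; a5 stays reserved afterwards
        move.w  d0,VREG_ADDR(a5)        ; 12 cycles (8 if assembled as plain (a5))
        move.w  d1,VREG_DATA(a5)        ; 12 cycles, versus 16 for move.w d1,xxx.L
```

Because the writes are now closer together, it is also easier to violate any minimum delay the video hardware needs between accesses, which is the timing caveat mentioned above.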

=General 68k tricks=

Many tricks are from [Easy68k].

Addressing

 * (An)+ is faster than -(An), except as a MOVE destination, where both cost the same.
 * Because plain (An) is faster than d(An), which has to fetch a displacement word, access to the first element of a data structure is faster than to the others.
 * Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.
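As a small illustration of the first-element point, here is a sketch with a hypothetical object layout (the OBJ_* names are made up for this example). The field read most often is placed at offset 0 so it can be reached with plain (An).

```asm
; Hypothetical object layout: the hottest field goes at offset 0.
OBJ_FLAGS   equ 0               ; reached with plain (a0)
OBJ_X       equ 2
OBJ_Y       equ 4

        move.w  (a0),d0         ;  8 cycles: first field, no extension word
        move.w  OBJ_X(a0),d1    ; 12 cycles: d(An) fetches a displacement word
```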

CALL/RTS with JMP
     lea  return,a0
     jmp  routine
 return:
Then, to return from the routine, just jmp (a0). A0 needs to be preserved inside the routine, but this saves 8 cycles compared to jsr/rts.

Replace JSR+RTS
 jsr subroutine   ->   jmp subroutine   ; Saves 24 cycles
 rts

Replace JSR+JMP
Instead of:
 jsr sub   ; 18/20
 jmp next  ; 10/12
Do:
 pea next  ; 16/20
 jmp sub   ; 10/12

Comparisons
cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:

 moveq.l #xxx,d0
 cmp.l   d0,d1

If the value xxx is between -8 and 8 (excluding 0), and you don't mind altering the data register, you can just use subq #xxx,Dn (or addq for negative values) instead of cmp. Then you can use a conditional branch just as you would after a cmp. This works for word or longword comparisons.
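A short sketch of the subq variant, assuming d1 is a scratch copy that may be destroyed and d2 is a free register used only as a result flag for this example:

```asm
        moveq   #0,d2           ; d2 = 0: assume d1 is not 3
        subq.w  #3,d1           ; 4 cycles, versus 8 for cmpi.w #3,d1; destroys d1
        bne.s   not_three       ; Z is set exactly when d1 originally held 3
        moveq   #1,d2           ; d1 was 3 (and is now 0)
not_three:
```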

Loops/searches
Since a taken short branch is slower than an untaken one, try to avoid taking most branches. For instance, if you have a loop searching for a null, the simple way to search is:

 -:  tst.b (a0)+
     bne.s -

It takes only a bit more space to unroll one or more iterations of the loop:

 -:  tst.b (a0)+
     beq.s found
     tst.b (a0)+
     bne.s -
 found:
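To put rough numbers on this, assume the standard 68000 timings: tst.b (An)+ takes 8 cycles and a short Bcc takes 10 cycles when taken, 8 when not. The simple loop then costs 18 cycles per byte and the two-way unrolled loop 34 cycles per two bytes (17 per byte). The sketch below unrolls four times; the label names are arbitrary.

```asm
search:
        tst.b   (a0)+           ; 8
        beq.s   found           ; 8 while not taken
        tst.b   (a0)+           ; 8
        beq.s   found           ; 8 while not taken
        tst.b   (a0)+           ; 8
        beq.s   found           ; 8 while not taken
        tst.b   (a0)+           ; 8
        bne.s   search          ; 10 when taken
found:                          ; a0 points one byte past the null
; Per pass with no null found: 4*8 + 3*8 + 10 = 66 cycles, or 16.5 per byte.
```

Each extra unrolled iteration gains less than the one before it, so the code size trade-off is worth checking for short strings.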

Clear data register
clr.l  Dn      ->   moveq.l  #0,Dn     ; Saves 2 cycles

Clear address register
There is no CLR or MOVEQ for address registers.
 move.l #0,An   ->   sub.l    An,An     ; Saves 4 cycles

Clear upper half of data register
 andi.l #$0000FFFF,Dn  ->  swap   Dn    ; Saves 4 cycles
                           clr.w  Dn
                           swap   Dn

Set large constants
To move $00xx0000 values to a data register (xx from $00 to $7F, since moveq sign-extends its operand):
 moveq.l #xx,Dn
 swap    Dn
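For instance, with the arbitrary value $12 chosen for illustration:

```asm
        moveq   #$12,d0         ; d0 = $00000012 (4 cycles)
        swap    d0              ; d0 = $00120000 (4 cycles)
; Same result as move.l #$00120000,d0, which takes 12 cycles.
```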

Shift/multiply data register
lsl.w  #1,d0   ->   add.w    d0,d0     ; Saves 4 cycles

 lsl.l  #1,d0   ->   add.l    d0,d0     ; Saves 2 cycles

 lsl.w  #2,d0   ->   add.w    d0,d0     ; Saves 2 cycles
                     add.w    d0,d0

Add to address register
Useful when xxx is between -32768 and 32767.
 adda.w #xxx,a0  ->   lea      xxx(a0),a0     ; Saves 4 cycles

Rotates
 moveq.l #16,d0   ->   swap     d1
 ror.l   d0,d1

 moveq.l #15,d0   ->   swap     d1
 ror.l   d0,d1         rol.l    #1,d1