The cycle savings quoted here follow the 68k instruction timings.
- 1 VRAM access
- 2 General 68k tricks
- 2.1 Addressing
- 2.2 CALL/RTS with JMP
- 2.3 Replace JSR+RTS
- 2.4 Replace JSR+JMP
- 2.5 Comparisons
- 2.6 Loops/searches
- 2.7 Clear data register
- 2.8 Clear address register
- 2.9 Clear upper half of data register
- 2.10 Set large constants
- 2.11 Shift/multiply data register
- 2.12 Add to address register
- 2.13 Rotates
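VRAM access
Writing to each VRAM register through its absolute address is always slower than loading a base address once and using displacements:
    lea     VRAM_RW,a5
    move.w  #$0001,2(a5)    ; VRAM_MOD
    ...
    move.w  #$1234,-2(a5)   ; VRAM_ADDR
    ...
    move.w  #$5678,(a5)     ; VRAM_RW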
General 68k tricks
Many tricks are from Easy68k.
Addressing
- (An)+ is faster than -(An), except for MOVEs (same).
- Because plain (An) avoids the displacement fetch that d16(An) needs, access to the first element of a data structure is faster than to the others.
- Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.
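To illustrate the sign-extension mentioned in the last point: a word-sized write to an address register always updates all 32 bits, with the word sign-extended. A minimal sketch with arbitrary values (whether the word or long form is faster for a given instruction still has to be checked in the timing tables):

```asm
    movea.w #$7FFF,a0       ; a0 = $00007FFF (bit 15 clear, upper word zero)
    movea.w #$8000,a0       ; a0 = $FFFF8000 (bit 15 set, upper word all ones)
    movea.l #$00008000,a0   ; a0 = $00008000 (long move, no extension)
```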
CALL/RTS with JMP
    lea     return,a0
    jmp     routine
return:
Then to return, just jmp (a0). A0 needs to be preserved, but this saves 8 cycles compared to a jsr/rts pair.
Replace JSR+RTS
    jsr     subroutine  ->  jmp     subroutine  ; Saves 24 cycles
    rts
Replace JSR+JMP
    jsr     sub         ; 18/20
    jmp     next        ; 10/12
can be replaced with:
    pea     next        ; 16/20
    jmp     sub         ; 10/12
Comparisons
cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:
    moveq.l #xxx,d0         ; 4 cycles
    cmp.l   d0,d1           ; 6 cycles
If xxx is between 1 and 8 and you don't mind altering the data register, you can use subq #xxx,Dn instead of cmp (for -8 to -1, use addq with the absolute value). Then you can use a conditional branch just as you would after a cmp. This works for word or longword comparisons.
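As a sketch of the subq form (the label name and the value 3 are only illustrative), a word comparison against a small constant might look like this. subq sets the condition codes the same way sub does, so the branch behaves as it would after cmp, but d0 is left holding the difference:

```asm
    ; replaces: cmp.w #3,d0 / beq.s is_three
    subq.w  #3,d0           ; d0 = d0 - 3, Z set if d0 was 3
    beq.s   is_three
    addq.w  #3,d0           ; only needed if the original value is used afterwards
    ; fall through: d0 was not 3
is_three:                   ; reached by the branch with d0 = 0
```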
Loops/searches
Since a taken short branch is slower than an untaken one, arrange loops so that most branches fall through. For instance, if you have a loop searching for a null, the simple way to search is:
-:  tst.b   (a0)+
    bne.s   -
It takes only a bit more space to unroll one or more iterations of the loop:
-:  tst.b   (a0)+
    beq.s   found
    tst.b   (a0)+
    bne.s   -
found:
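To see where the saving comes from, here is the unrolled loop again with per-instruction cycle counts, assuming the standard 68000 timings (tst.b (An)+ at 8 cycles, a short Bcc at 10 cycles when taken and 8 when not). The label names are placeholders:

```asm
search:
    tst.b   (a0)+           ; 8
    beq.s   found           ; 8 when not taken
    tst.b   (a0)+           ; 8
    bne.s   search          ; 10 when taken
found:                      ; a0 points just past the null byte
```

Each pass over two non-null bytes costs 34 cycles (17 per byte), against 18 per byte (8 + 10) for the simple loop.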
Clear data register
    clr.l   Dn  ->  moveq.l #0,Dn   ; Saves 2 cycles
Clear address register
There is no CLR or MOVEQ for address registers.
    move.l  #0,An  ->  sub.l   An,An   ; Saves 4 cycles
Clear upper half of data register
    andi.l  #$0000FFFF,Dn  ->  swap    Dn      ; Saves 4 cycles
                               clr.w   Dn
                               swap    Dn
Set large constants
To move $00010000 ... $007F0000 values to a data register:
    moveq.l #X,Dn           ; X = $01 ... $7F
    swap    Dn
To move $FF80FFFF ... $FFFEFFFF values to a data register:
    moveq.l #X,Dn           ; X = $FFFFFF80 ... $FFFFFFFE
    swap    Dn
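As a concrete instance (the constants are arbitrary), loading $00120000 this way takes 8 cycles and 4 bytes, against 12 cycles and 6 bytes for the immediate long move, assuming standard 68000 timings:

```asm
    ; replaces: move.l #$00120000,d0   (12 cycles, 6 bytes)
    moveq.l #$12,d0         ; d0 = $00000012   (4 cycles)
    swap    d0              ; d0 = $00120000   (4 cycles)

    ; negative case: moveq sign-extends, so the low word ends up as $FFFF
    moveq.l #-2,d1          ; d1 = $FFFFFFFE
    swap    d1              ; d1 = $FFFEFFFF
```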
Shift/multiply data register
    lsl.w   #1,d0  ->  add.w   d0,d0   ; Saves 4 cycles
    lsl.l   #1,d0  ->  add.l   d0,d0   ; Saves 2 cycles
    lsl.w   #2,d0  ->  add.w   d0,d0   ; Saves 2 cycles
                       add.w   d0,d0
Add to address register
Useful when xxx is between -32768 and 32767.
    adda.w  #xxx,a0  ->  lea     xxx(a0),a0
Rotates
    moveq.l #16,d0  ->  swap    d1
    ror.l   d0,d1

    moveq.l #15,d0  ->  swap    d1
    ror.l   d0,d1       rol.l   #1,d1