Optimization
Latest revision as of 06:20, 3 April 2018
Based on the 68k instruction timings.
VRAM access
Since
move.w *,xxx.L
is always slower than
move.w *,d(An)
try reserving an address register to hold VRAM_RW, then use offsets to reach VRAM_ADDR and VRAM_MOD. Be careful about VRAM timings, though!
lea VRAM_RW,a5
move.w #$0001,2(a5) ; VRAM_MOD
...
move.w #$1234,-2(a5) ; VRAM_ADDR
...
move.w #$5678,(a5) ; VRAM_RW
General 68k tricks
Many tricks are from [Easy68k].
Addressing
- (An)+ is faster than -(An), except for MOVEs (same).
- Because (An) is faster without pre-dec/post-inc, access to the first element of a data structure is faster than to the others.
- Don't assume that long operations are always slower than word-size ones. For instance, word address operations can be slower than long ones because of the time to sign-extend a word value.
CALL/RTS with JMP
This saves 8 cycles compared to jsr/rts, but the routine must preserve A0.
lea return, A0
jmp routine
return:
Then to return, just jmp (A0).
Replace JSR+RTS
jsr subroutine   ->   jmp subroutine   ; Saves 24 cycles
rts
Replace JSR+JMP
Instead of:
jsr sub ; 18/20
jmp next ; 10/12
Do:
pea next ; 16/20
jmp sub ; 10/12
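To see where the gain comes from, the cycle figures quoted in the comments above can simply be summed. The sketch below is plain Python used as a calculator, not 68k code. It assumes that the two figures on each line stand for short and long absolute addressing, which the text above does not state.

```python
# Cycle figures copied from the comments above, as (short, long) pairs.
# Reading them as absolute short / absolute long addressing is an assumption.
CYCLES = {
    "jsr": (18, 20),
    "jmp": (10, 12),
    "pea": (16, 20),
}

def total(instructions):
    """Sum the (short, long) cycle counts of a sequence of instructions."""
    short = sum(CYCLES[name][0] for name in instructions)
    long_ = sum(CYCLES[name][1] for name in instructions)
    return short, long_

before = total(["jsr", "jmp"])   # jsr sub / jmp next
after = total(["pea", "jmp"])    # pea next / jmp sub

print("before:", before, "after:", after)
```

Under that reading, the replacement gains 2 cycles with short addresses and nothing with long ones.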
Comparisons
cmp.l #xxx,Dn takes 14 cycles. If the value being tested for is small enough to fit in a moveq (-128 to +127), it's shorter and faster to put the value in a temporary register:
moveq.l #xxx,d0
cmp.l d0,d1
If the value xxx is between -8 and 8, and you don't mind altering the data register, you can just use subq #xxx,Dn (or addq) instead of cmp. Then you can use a conditional branch just as you would after a cmp. This works for word or longword comparisons.
Loops/searches
Since a taken short branch is slower than an untaken one, try to avoid taking most branches. For instance, if you have a loop searching for a null, the simple way to search is:
-: tst.b (a0)+
bne.s -
It takes only a bit more space to unroll one or more iterations of the loop:
-: tst.b (a0)+
beq.s found
tst.b (a0)+
bne.s -
found:
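The effect of the unrolling can be estimated by counting cycles for both loops over the same data. The following is a rough Python model, not 68k code. The per-instruction costs (8 cycles for tst.b (a0)+, 10 for a taken short branch, 8 for an untaken one) are taken from the standard 68000 timing tables and are assumptions here, since the text above does not list them. The function names are made up for illustration.

```python
TST_B_POSTINC = 8   # tst.b (a0)+
BCC_TAKEN = 10      # short conditional branch, taken
BCC_NOT_TAKEN = 8   # short conditional branch, not taken

def simple_loop_cycles(data):
    """Cycles spent by the two-instruction loop: tst.b (a0)+ / bne.s."""
    cycles = 0
    for byte in data:
        cycles += TST_B_POSTINC
        if byte != 0:
            cycles += BCC_TAKEN          # bne.s taken, loop again
        else:
            cycles += BCC_NOT_TAKEN      # bne.s falls through, null found
            break
    return cycles

def unrolled_loop_cycles(data):
    """Cycles spent by the version with one unrolled iteration."""
    cycles = 0
    first_test = True                    # True when at the tst.b before beq.s
    for byte in data:
        cycles += TST_B_POSTINC
        if first_test:
            if byte == 0:
                cycles += BCC_TAKEN      # beq.s found is taken
                break
            cycles += BCC_NOT_TAKEN      # beq.s not taken
        else:
            if byte == 0:
                cycles += BCC_NOT_TAKEN  # bne.s falls through to found
                break
            cycles += BCC_TAKEN          # bne.s taken, back to the top
        first_test = not first_test
    return cycles

sample = b"abcd\x00"
print(simple_loop_cycles(sample), unrolled_loop_cycles(sample))
```

In this model every second non-null byte costs 16 cycles instead of 18, so the gain grows with the length of the search.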
Clear data register
clr.l Dn -> moveq.l #0,Dn ; Saves 2 cycles
Clear address register
There is no CLR or MOVEQ for address registers.
move.l #0,An -> sub.l An,An ; Saves 4 cycles
Clear upper half of data register
andi.l #$0000FFFF,Dn   ->   swap Dn    ; Saves 4 cycles
                            clr.w Dn
                            swap Dn
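That the swap / clr.w / swap sequence leaves the same register contents as the andi.l can be checked with a small model. This is a Python sketch of the register arithmetic only, not 68k code. The helper names are invented for illustration, and condition codes are not modelled.

```python
MASK32 = 0xFFFFFFFF

def swap(reg):
    """Model swap Dn: exchange the upper and lower 16-bit halves."""
    return ((reg << 16) | (reg >> 16)) & MASK32

def clr_w(reg):
    """Model clr.w Dn: clear the lower 16 bits, keep the upper 16."""
    return reg & 0xFFFF0000

def clear_upper_half(reg):
    """swap / clr.w / swap, applied in that order."""
    return swap(clr_w(swap(reg)))

for value in (0x12345678, 0xFFFFFFFF, 0x0000ABCD, 0xABCD0000):
    print(hex(value), "->", hex(clear_upper_half(value)))
```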
Set large constants
To move $00010000 ... $007F0000 values to a data register:
moveq.l #X,Dn ; X = $01 ... $7F
swap Dn
To move $FF80FFFF ... $FFFEFFFF values to a data register:
moveq.l #X,Dn ; X = $FFFFFF80 ... $FFFFFFFE
swap Dn
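Both ranges follow from moveq sign-extending its 8-bit immediate to 32 bits before swap exchanges the two halves. The sketch below models that in Python (it is not 68k code, and the helper names are invented for illustration).

```python
MASK32 = 0xFFFFFFFF

def moveq(imm8):
    """Model moveq: sign-extend an 8-bit immediate to 32 bits."""
    imm8 &= 0xFF
    return (imm8 | 0xFFFFFF00) if imm8 & 0x80 else imm8

def swap(reg):
    """Model swap Dn: exchange the upper and lower 16-bit halves."""
    return ((reg << 16) | (reg >> 16)) & MASK32

def moveq_swap(imm8):
    """Register contents after moveq.l #imm8,Dn followed by swap Dn."""
    return swap(moveq(imm8))

for imm in (0x01, 0x7F, 0x80, 0xFE):
    print(hex(imm), "->", hex(moveq_swap(imm)))
```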
Shift/multiply data register
lsl.w #1,d0 -> add.w d0,d0 ; Saves 4 cycles
lsl.l #1,d0 -> add.l d0,d0 ; Saves 2 cycles
lsl.w #2,d0   ->   add.w d0,d0   ; Saves 2 cycles
                   add.w d0,d0
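The word-sized replacements can be checked with a small model in which only the low 16 bits of the register change. This is a Python sketch rather than 68k code, with invented helper names. It compares register contents only; condition codes are not modelled and may differ between the shift and the add.

```python
def lsl_w(reg, count):
    """Model lsl.w #count,Dn: shift the low word, keep the high word."""
    low = (reg << count) & 0xFFFF
    return (reg & 0xFFFF0000) | low

def add_w_self(reg):
    """Model add.w Dn,Dn: double the low word, keep the high word."""
    low = ((reg & 0xFFFF) * 2) & 0xFFFF
    return (reg & 0xFFFF0000) | low

for value in (0x00000001, 0x0000C001, 0x1234FFFF):
    once = add_w_self(value)
    twice = add_w_self(once)
    print(hex(value), hex(once), hex(twice))
```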
Add to address register
Useful when xxx is between -32768 and 32767.
adda.w #xxx,a0 -> lea xxx(a0),a0
Rotates
moveq.l #16,d0   ->   swap d1
ror.l d0,d1

moveq.l #15,d0   ->   swap d1
ror.l d0,d1           rol.l #1,d1
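Both rotate replacements rest on swap being a 16-bit rotation of the 32-bit register. A small Python model (not 68k code, helper names invented for illustration) makes it possible to check the resulting register contents. Condition codes are not modelled.

```python
MASK32 = 0xFFFFFFFF

def swap(reg):
    """Model swap Dn: exchange the upper and lower 16-bit halves."""
    return ((reg << 16) | (reg >> 16)) & MASK32

def ror_l(reg, count):
    """Model ror.l: rotate a 32-bit value right by count bits."""
    count %= 32
    return ((reg >> count) | (reg << (32 - count))) & MASK32

def rol_l(reg, count):
    """Model rol.l: rotate a 32-bit value left by count bits."""
    count %= 32
    return ((reg << count) | (reg >> (32 - count))) & MASK32

for value in (0x12345678, 0x80000001, 0x0000FFFF):
    print(hex(value), hex(swap(value)), hex(rol_l(swap(value), 1)))
```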