Optimization
Latest revision as of 06:20, 3 April 2018
The cycle counts given here come from the 68k instruction timings.
VRAM access
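Since move.w *,xxx.L (absolute long address) is always slower than move.w *,d(An) (address register plus displacement), try reserving an address register to hold the address of VRAM_RW and use displacements to reach VRAM_ADDR (-2) and VRAM_MOD (+2). Be careful about VRAM access timings, though!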
lea VRAM_RW,a5
move.w #$0001,2(a5) ; VRAM_MOD
...
move.w #$1234,-2(a5) ; VRAM_ADDR
...
move.w #$5678,(a5) ; VRAM_RW
General 68k tricks
Many tricks are from [Easy68k].
Addressing
- (An)+ is faster than -(An), except as the destination of a MOVE, where both take the same time.
- Because (An) is faster than d(An), accessing the first element of a data structure (offset 0) is faster than accessing the others.
- Don't assume that long operations are always slower than word-size ones. For instance, word-size operations on address registers can be slower than long ones because the word value has to be sign-extended first.
JSR/RTS with JMP
The called routine must leave A0 untouched, but this saves up to 8 cycles compared to JSR/RTS:
lea return,A0
jmp routine
return:
To return, the routine then simply executes jmp (A0).
Replace JSR+RTS
jsr subroutine -> jmp subroutine ; Saves 24 cycles
rts
Replace JSR+JMP
Instead of:
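jsr sub ; 18/20
jmp next ; 10/12
Do:
pea next ; 16/20
jmp sub ; 10/12
The RTS at the end of sub then returns straight to next. With the cycle counts above (absolute short/long addresses), this saves 2 cycles with short addresses and breaks even with long ones.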
Comparisons
cmp.l #xxx,Dn takes 14 cycles. If the value being tested for fits in a moveq (-128 to +127), it's shorter and faster to load it into a temporary register first:
moveq.l #xxx,d0
cmp.l d0,d1
If xxx is between 1 and 8 and you don't mind altering the data register, you can use subq #xxx,Dn instead of cmp; for values between -8 and -1, use addq #-xxx,Dn. You can then use a conditional branch just as you would after a cmp, for both word and longword comparisons. After addq, however, the carry flag comes out inverted compared to cmp, so stick to conditions that ignore the carry (beq, bne, blt, bge, bgt, ble).
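The flag behaviour can be checked with a small Python model (not 68k code) of the N, Z, V and C flags produced by a 32-bit subtraction (CMP.L, SUBQ.L) and addition (ADDQ.L). It is a sketch that assumes the standard 68000 flag definitions and ignores X; the helper names are hypothetical. SUBQ performs the same subtraction as CMP, so its flags match exactly, while ADDQ #k matches CMP #-k on N, Z and V but produces the opposite carry.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000


def sub_flags(dst, src):
    """(N, Z, V, C) after the 32-bit subtraction dst - src."""
    dst &= MASK32
    src &= MASK32
    res = (dst - src) & MASK32
    n = bool(res & SIGN)
    z = res == 0
    # Overflow: operands of opposite sign and the result's sign differs from dst's.
    v = bool((dst ^ src) & (dst ^ res) & SIGN)
    c = src > dst  # borrow out of the unsigned subtraction
    return n, z, v, c


def add_flags(dst, src):
    """(N, Z, V, C) after the 32-bit addition dst + src."""
    dst &= MASK32
    src &= MASK32
    total = dst + src
    res = total & MASK32
    n = bool(res & SIGN)
    z = res == 0
    # Overflow: operands of the same sign and the result's sign differs from theirs.
    v = bool(~(dst ^ src) & (dst ^ res) & SIGN)
    c = total > MASK32  # carry out of the unsigned addition
    return n, z, v, c


def cmp_flags(value, dn):
    """cmp.l #value,Dn: flags of Dn - value (Dn is not modified)."""
    return sub_flags(dn, value)


def subq_flags(k, dn):
    """subq.l #k,Dn with k in 1..8: the same subtraction as cmp.l #k,Dn."""
    return sub_flags(dn, k)


def addq_flags(k, dn):
    """addq.l #k,Dn with k in 1..8: used in place of cmp.l #-k,Dn."""
    return add_flags(dn, k)


if __name__ == "__main__":
    # D1 = 5, compared against -8 both ways: N, Z, V agree but C is flipped.
    print(cmp_flags(-8, 5))   # (False, False, False, True)
    print(addq_flags(8, 5))   # (False, False, False, False)
```

Since the carry is always the complement, unsigned branches such as bhi, bls, bcs and bcc would take the wrong path after addq.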
Loops/searches
Since a taken short branch is slower than an untaken one, try to arrange code so that most branches are not taken. For instance, the simple way to search for a null byte is:
-:  tst.b (a0)+
    bne.s -
It takes only a bit more space to unroll one or more iterations of the loop:
-:  tst.b (a0)+
    beq.s found
    tst.b (a0)+
    bne.s -
found:
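To see where the saving comes from, the Python model below (not 68k code) runs both loops over a byte buffer and adds up cycles. It is a sketch that assumes the usual 68000 figures of 8 cycles for tst.b (a0)+, 10 for a taken Bcc.s and 8 for an untaken one; the function and constant names are hypothetical.

```python
# Assumed 68000 timings, in clock cycles.
TST_B_POSTINC = 8     # tst.b (a0)+
BCC_S_TAKEN = 10      # bne.s / beq.s when the branch is taken
BCC_S_NOT_TAKEN = 8   # bne.s / beq.s when execution falls through


def simple_search(mem, a0):
    """-: tst.b (a0)+ / bne.s -    Returns (a0 after the null byte, cycles)."""
    cycles = 0
    while True:
        byte = mem[a0]
        a0 += 1
        cycles += TST_B_POSTINC
        if byte != 0:                 # bne.s - is taken
            cycles += BCC_S_TAKEN
        else:                         # null found, bne.s falls through
            cycles += BCC_S_NOT_TAKEN
            return a0, cycles


def unrolled_search(mem, a0):
    """-: tst.b (a0)+ / beq.s found / tst.b (a0)+ / bne.s - / found:"""
    cycles = 0
    while True:
        byte = mem[a0]
        a0 += 1
        cycles += TST_B_POSTINC
        if byte == 0:                 # beq.s found is taken
            cycles += BCC_S_TAKEN
            return a0, cycles
        cycles += BCC_S_NOT_TAKEN     # beq.s falls through
        byte = mem[a0]
        a0 += 1
        cycles += TST_B_POSTINC
        if byte != 0:                 # bne.s - is taken
            cycles += BCC_S_TAKEN
        else:                         # falls through to found:
            cycles += BCC_S_NOT_TAKEN
            return a0, cycles


if __name__ == "__main__":
    text = b"hello world\0"
    print(simple_search(text, 0))     # (12, 214)
    print(unrolled_search(text, 0))   # (12, 202)
```

Under these assumptions the simple loop costs 18 cycles per non-null byte and the unrolled one an average of 17, so the gain grows with string length; on an empty string the unrolled loop is 2 cycles slower because it exits through a taken branch.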
Clear data register
clr.l Dn -> moveq.l #0,Dn ; Saves 2 cycles
Clear address register
There are no CLR or MOVEQ for address registers.
move.l #0,An -> sub.l An,An ; Saves 4 cycles
Clear upper half of data register
andi.l #$0000FFFF,Dn -> swap Dn ; Saves 4 cycles
                        clr.w Dn
                        swap Dn
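The equivalence can be checked with a tiny Python model (not 68k code). It is a sketch that assumes 32-bit data registers held as Python ints from 0 to $FFFFFFFF; swap, clr_w and clear_upper_half are hypothetical helper names mirroring the instructions. Only the register value is modeled, not the condition codes.

```python
MASK32 = 0xFFFFFFFF


def swap(d):
    """swap Dn: exchange the upper and lower 16-bit halves of a 32-bit value."""
    return ((d << 16) | (d >> 16)) & MASK32


def clr_w(d):
    """clr.w Dn: zero the low word and leave the high word alone."""
    return d & 0xFFFF0000


def clear_upper_half(d):
    """swap Dn / clr.w Dn / swap Dn, applied in that order."""
    return swap(clr_w(swap(d)))


if __name__ == "__main__":
    print(hex(clear_upper_half(0x12345678)))   # 0x5678
```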
Set large constants
To move $00010000 ... $007F0000 values to a data register:
moveq.l #X,Dn ; X = $01 ... $7F
swap Dn
To move $FF80FFFF ... $FFFEFFFF values to a data register:
moveq.l #X,Dn ; X = $FFFFFF80 ... $FFFFFFFE
swap Dn
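Both value ranges follow from MOVEQ sign-extending its 8-bit immediate before SWAP exchanges the register halves. The Python model below (not 68k code) is a sketch that assumes 32-bit registers held as ints from 0 to $FFFFFFFF; moveq, swap and moveq_swap are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def moveq(imm8):
    """moveq.l #imm8,Dn: sign-extend an 8-bit immediate to 32 bits."""
    imm8 &= 0xFF
    return (imm8 | 0xFFFFFF00) if imm8 & 0x80 else imm8


def swap(d):
    """swap Dn: exchange the upper and lower 16-bit halves."""
    return ((d << 16) | (d >> 16)) & MASK32


def moveq_swap(imm8):
    """moveq.l #imm8,Dn / swap Dn"""
    return swap(moveq(imm8))


if __name__ == "__main__":
    print(hex(moveq_swap(0x01)), hex(moveq_swap(0x7F)))   # 0x10000 0x7f0000
    print(hex(moveq_swap(-128)), hex(moveq_swap(-2)))     # 0xff80ffff 0xfffeffff
```

Positive immediates leave zeros in the high word, so the swapped result is $00XX0000; negative ones fill it with $FFFF, which ends up in the low word after the swap.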
Shift/multiply data register
lsl.w #1,d0 -> add.w d0,d0 ; Saves 4 cycles
lsl.l #1,d0 -> add.l d0,d0 ; Saves 2 cycles
lsl.w #2,d0 -> add.w d0,d0 ; Saves 2 cycles
               add.w d0,d0
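A short Python model (not 68k code) confirms that these substitutions give the same register value and the same carry. It is a sketch that models a .w operation on the low word only (the upper word of d0 is untouched by both forms) and a .l operation on all 32 bits; lsl and add_self are hypothetical helper names.

```python
def lsl(d, count, bits):
    """lsl #count on a bits-wide operand: (result, C), C = last bit shifted out."""
    mask = (1 << bits) - 1
    d &= mask
    result = (d << count) & mask
    carry = bool((d >> (bits - count)) & 1)
    return result, carry


def add_self(d, bits):
    """add Dn,Dn on a bits-wide operand: (result, C), C = carry out."""
    mask = (1 << bits) - 1
    d &= mask
    total = d + d
    return total & mask, total > mask


if __name__ == "__main__":
    d0 = 0xC001
    print(lsl(d0, 1, 16), add_self(d0, 16))                    # (32770, True) (32770, True)
    print(lsl(d0, 2, 16), add_self(add_self(d0, 16)[0], 16))   # (4, True) (4, True)
```

One difference the model leaves out on purpose: LSL always clears the V flag, while ADD sets it on signed overflow, so keep the shift if the following code tests V.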
Add to address register
Useful when xxx is between -32768 and 32767.
adda.w #xxx,a0 -> lea xxx(a0),a0 ; Saves 4 cycles
Rotates
moveq.l #16,d0 -> swap d1
ror.l d0,d1
moveq.l #15,d0 -> swap d1
ror.l d0,d1       rol.l #1,d1
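The rotate identities can be checked with a Python model (not 68k code) of 32-bit rotates. It is a sketch that assumes registers held as ints from 0 to $FFFFFFFF and uses hypothetical helper names; it checks that rotating right by 16 equals SWAP, and that rotating right by 15 equals SWAP followed by a 1-bit left rotate.

```python
MASK32 = 0xFFFFFFFF


def ror_l(d, n):
    """ror.l: rotate a 32-bit value right by n bits."""
    d &= MASK32
    n %= 32
    return ((d >> n) | (d << (32 - n))) & MASK32


def rol_l(d, n):
    """rol.l: rotate a 32-bit value left by n bits."""
    d &= MASK32
    n %= 32
    return ((d << n) | (d >> (32 - n))) & MASK32


def swap(d):
    """swap Dn: exchange the upper and lower 16-bit halves."""
    return ((d << 16) | (d >> 16)) & MASK32


if __name__ == "__main__":
    d1 = 0x12345678
    print(hex(ror_l(d1, 16)), hex(swap(d1)))           # 0x56781234 0x56781234
    print(hex(ror_l(d1, 15)), hex(rol_l(swap(d1), 1)))  # 0xacf02468 0xacf02468
```

Only the register value is compared; the condition codes can differ (SWAP always clears C, whereas ROR leaves the last bit rotated out in C).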