Table of Contents (click on a link to jump to that web page)
- Part 1 – Quick and Easy Changes to Speedup Your Code
- Part 2 – Speedup Storing Data – Unrolling Loops
- Part 3 – Stack Blasting and Self Modifying Code
- Part 4 – Odds and Sods – More Tricks
I’ve run out of notes that I had for speeding up 6809 assembly. I’ll update this page with any more cool ideas that anyone shares with me or puts in the comments.
Dave Philipsen wrote with this tip:
I don’t know that it necessarily optimizes for speed but it saves space. When printing a string or any kind of calling a subroutine which requires a string, instead of pointing to the string and then calling the subroutine like this:
    ldx  #strptr         * point to the string
    jsr  prtstr          * print the string
    lda  #??             * continue with the rest of the program
You can do this:
    jsr  prtstr          * call the print string subroutine which pulls
                         * the address of the string from the
    fcs  /text string/   * program counter which was just pushed to
                         * the stack
    lda  #??             * continue with the rest of the program
The prtstr routine can change the program counter as it is saved on the stack so that when the routine returns, it returns to the point just past the end of the string. This optimizes for size by eliminating the need to load the pointer each time you print. It also reduces complexity because you don’t need to assign a label to the string.
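Here is a minimal sketch of what such a prtstr routine might look like. It is not Dave's actual code. It assumes the string was assembled with FCS (bit 7 set on the final character) and that putchr is a hypothetical character-output routine that takes the character in A and preserves X.

```asm
prtstr  ldx   ,s         * X = saved return address = start of the inline string
ploop   lda   ,x+        * fetch a character and advance X
        bmi   plast      * FCS sets bit 7 on the final character
        jsr   putchr     * putchr is hypothetical; assumed to preserve X
        bra   ploop
plast   anda  #$7F       * clear the end-of-string marker bit
        jsr   putchr     * print the final character
        stx   ,s         * replace the return address with the byte after the string
        rts              * returns to the instruction following the string
```

Note that this sketch clobbers A and X, so the caller cannot rely on them after the call.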
Another example of this might be calling a subroutine which positions the cursor on the screen.
    ldd  #$0101          * A=1 (x coord), B=1 (y coord)
    jsr  curXY           * position the cursor
    lda  #$??            * continue with program
This can be replaced with:
    jsr  curXY           * position the cursor, X and Y are pointed to
                         * by the program counter
    fdb  $0101           * x coord = 1, y coord = 1
    lda  #$??            * continue with the program
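The receiving side of the inline-coordinate version could be sketched like this. It assumes a hypothetical routine setcur that positions the cursor from A (x coord) and B (y coord) and ends with RTS.

```asm
curXY   ldx   ,s         * X = saved return address = address of the inline FDB
        ldd   ,x++       * A = x coord, B = y coord; X now points past the FDB
        stx   ,s         * patch the return address to skip the two data bytes
        jmp   setcur     * setcur is hypothetical; its RTS returns to the caller
```

Jumping to setcur instead of calling it means its RTS pulls the patched return address directly, which avoids a second JSR/RTS pair.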
Art Flexser wrote with this tip: use STA instead of CLR (if you can).
When addressing some CoCo hardware registers, STA is two cycles faster than CLR (5 cycles versus 7 with extended addressing) and has the same effect. Erik Gavriluk pointed out that using CLR does affect the Condition Codes and that should be taken into account. Also “STA sets flags, too. It’s weird, but CLR really does write AND read from the memory address in question. There are CoCo hardware registers where this can cause a problem.”
    CLR  $FFDE           * Slow way
    STA  $FFDE           * Faster way
Maybe save a byte of space with this trick. While looking through some BASIC Unravelled source code, I found a neat little trick that saves a byte of code. You can optimize this bit of code:
LA974   CLRA
        BRA  Skip
LA977   LDA  #8
Skip    STA  ,-S
        ...
To this version:
LA974   CLRA
        FCB  $8C
LA976   LDA  #8
        STA  ,-S
        ...
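The byte $8C placed by the FCB is the opcode for a CMPX #xxxx instruction, so when the code is entered at LA974 the CPU treats the two bytes of the LDA #8 as the immediate operand of the CMPX, and then carries on at the STA. The CMPX changes the condition codes but leaves A alone. This doesn’t make the program faster but it does save a byte of space (if you really need it).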
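One way to see why this works is to lay out the bytes. The listing below is a hand-assembled sketch, with addresses taken from the labels in the example, and shows how the same bytes decode from each entry point.

```asm
* Entered at LA974, the CPU decodes:
*   A974  4F          CLRA
*   A975  8C 86 08    CMPX #$8608    * the FCB byte plus the two bytes of LDA #8
*   A978  A7 E2       STA  ,-S
*
* Entered at LA976, the CPU decodes:
*   A976  86 08       LDA  #8
*   A978  A7 E2       STA  ,-S
```

The one-byte FCB replaces the two-byte BRA, which is where the saved byte comes from.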
Speeding up IRQ/FIRQs
I remembered one other thing you can do to speed up your IRQ/FIRQ jumps:
Normally, to jump to an IRQ or FIRQ routine you store $7E (the opcode for an extended JMP) as the first byte of the interrupt jump pointer, then store the routine’s address in the next two bytes. If your interrupt routine is in Direct Page memory, you can instead store $0E (the opcode for a direct page JMP) followed by the low byte of the routine’s address. The direct page JMP takes 3 cycles instead of 4, but the DP register must hold the right page whenever the interrupt can fire.
Set up an FIRQ that is in DP space.
The DP is set to $3E and the FIRQ routine starts in RAM at address $3ECA:
    LDA  #$0E            * Write the direct page JMP instruction
    LDB  #$CA            * Will jump to address DP + $CA = $3ECA in our example
    STD  $010F           * CoCo 1 & 2 FIRQ Jump pointer
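For comparison, the usual three-byte extended jump could be installed along these lines. This is a sketch only: MyFIRQ is a hypothetical label for the handler, and masking interrupts while the pointer is rewritten is a precaution I have added, not something the tip requires.

```asm
        ORCC  #$50       * mask IRQ and FIRQ while the jump pointer is rewritten
        LDA   #$7E       * opcode for JMP extended
        STA   $010F      * CoCo 1 & 2 FIRQ Jump pointer, first byte
        LDX   #MyFIRQ    * MyFIRQ is a hypothetical label for the FIRQ routine
        STX   $0110      * address goes in the next two bytes
        ANDCC #$AF       * unmask IRQ and FIRQ again
```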
Simon Jonassen told me about another trick you can use with the CoCo 1 & 2 FIRQ jump pointer: just put your entire FIRQ routine starting at $010F. That way you save the few extra cycles that would normally be used by the JMP $xxxx instruction.
L. Curtis Boyle posted this tip in the comments section:
To save memory, when doing cmpx #0 or cmpy #0, use leax ,x or leay ,y (note: leau and leas do NOT set zero flag, so this trick can’t be used with those registers). This saves 1 byte for cmpx, and 2 bytes for cmpy.
On a 6809, in x’s case it’s the same speed (4 cycles each), and in y’s case it’s also one cycle faster (4 cycles versus 5).
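As a small sketch of the replacement, with XisZero and YisZero as hypothetical labels:

```asm
        LEAX ,X          * X is unchanged; Z is set only if X = 0
        BEQ  XisZero     * XisZero is a hypothetical label
        LEAY ,Y          * same test for Y
        BEQ  YisZero     * YisZero is a hypothetical label
```

Only the Z flag is updated by LEAX and LEAY, so this is a drop-in replacement only when the code that follows tests for zero or non-zero (BEQ/BNE).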
On that same note, since LEAX and LEAY do set the zero flag you could also do the test for zero right after the LEA instruction if it works in your program. For example:
            LDX  #$2000
XisNotZero: LEAX -32,X
            BNE  XisNotZero
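Another little trick moves bit 7 of A around into bit 0 (an 8-bit rotate left that doesn’t go through the Carry). I learned this while disassembling Defender. It takes only 4 CPU cycles. Keep in mind the Carry bit is cleared afterwards.
    ASLA                 * bit 7 goes into Carry, bit 0 becomes 0
    ADCA #$00            * add the Carry back in as bit 0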
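A worked example, using an arbitrary starting value of $81 chosen for illustration:

```asm
        LDA  #$81        * A = %10000001
        ASLA             * A = %00000010, Carry = 1
        ADCA #$00        * A = %00000011, Carry = 0
```

Because ASLA always leaves bit 0 clear, the ADCA can never overflow, which is why the Carry always ends up cleared.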
If your code loads a 16-bit register with ,U++ then use PULU instead:
    PULU D               * One Cycle faster than LDD ,U++
    PULU X               * One Cycle faster than LDX ,U++
    PULU Y               * Two Cycles faster than LDY ,U++ and one byte shorter
Gotta love the 6809 Stacks!