Zilog z80 to Motorola 6809 Transcode – Part 023 – Optimized sprite rendering, combining Compiled sprites with Stack Blasting

After talking with other CoCo users about optimizing sprite rendering, I’ve figured out the best way I’ve found so far to render the sprites for Pac Man on the CoCo 3.  This article summarizes what I’ve learned and then explains what I will use for Pac Man.

I got some great tips from Richard Goedeken’s game engine for the CoCo 3, called “Dynosprite”, which is available on GitHub.  Going through his source code, I found that it not only uses straight LDA, LDB and LDD instructions but also checks whether the following instructions can be used with the A or B accumulators:


This saves another byte over a straight LD instruction, but the speed is the same.  Every byte counts!

For the examples below I’m using a Pac Man sprite facing right with his mouth wide open.  The current palette uses colour 9 for yellow and colour 4 for black, and each byte holds two pixels.
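To make the byte values in the listings easier to read, here is a minimal Python sketch of how two palette indices pack into one screen byte. It assumes the left pixel sits in the high nibble; the helper names `pack_pixels` and `pack_row` are made up for this illustration and are not from the game's code.

```python
YELLOW = 9  # palette index used for yellow in the article
BLACK = 4   # palette index used for black in the article


def pack_pixels(left, right):
    """Pack two 4-bit palette indices into one byte.

    Assumption: the left pixel goes in the high nibble.
    """
    return ((left & 0x0F) << 4) | (right & 0x0F)


def pack_row(pixels):
    """Pack a row holding an even number of pixels into bytes."""
    if len(pixels) % 2:
        raise ValueError("row must hold an even number of pixels")
    return bytes(
        pack_pixels(pixels[i], pixels[i + 1])
        for i in range(0, len(pixels), 2)
    )


# One black pixel followed by three yellow pixels.
row = [BLACK, YELLOW, YELLOW, YELLOW]
print(pack_row(row).hex())  # 4999
```

Under that assumption, the value `$4999` in the listings is one black pixel followed by three yellow ones, and `$9994` is three yellow pixels followed by one black one.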

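Previously, I thought the fastest way to do compiled sprites was the following.  In the listings, the bracketed figure is each instruction’s cycle count (base plus indexing overhead) and the number after it is the running total: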

[4+1]   5 LEAU 5,X
[3]     8 LDD #$4999
[3]    11 LDX #$9999
[4+1]  16 STA -3,U
[5+1]  22 STX -2,U
[5+1]  28 STD -4+128,U
[5+1]  34 STX -2+128,U
[4+4]  42 LEAU 256,U
[4]    46 LDY #$9994
[5+1]  52 STX -4,U
[6+1]  59 STY -2,U
[4+1]  64 STB -4+128,U
[5+1]  70 STX -3+128,U
[4+4]  78 LEAU 256,U
[5+1]  84 STD -5,U
[6+1]  91 STY -3,U
[4+1]  96 STA -5+128,U
[5+1] 102 STX -4+128,U
[4+4] 110 LEAU 256,U
[4+1] 115 STA -5,U
[6+1] 122 STY -4,U
[4+1] 127 STA -5+128,U
[5+1] 133 STX -4+128,U
[4+4] 141 LEAU 256,U
[5+1] 147 STD -5,U
[6+1] 154 STY -3,U
[4+1] 159 STB -4+128,U
[5+1] 165 STX -3+128,U
[4+4] 173 LEAU 256+128,U
[5+4] 182 STX -4-128,U
[6+4] 192 STY -2-128,U
[5+1] 198 STD -4,U
[5+1] 204 STX -2,U
[4+1] 209 STA -3+128,U
[5+1] 215 STX -2+128,U
[5]   220 RTS

This method is still faster than stack blasting: it takes 220 CPU cycles to draw on screen and 106 bytes of RAM, which is pretty good since a full 16×16 sprite would take 128 bytes as bitmap data.  Another benefit of compiled sprites is that you aren’t stuck writing a fixed block size to the screen.  This sprite is really only 10 pixels wide by 13 rows.  As bitmap data for stack blasting it would still require 65 bytes of RAM, and you would need code to handle the different sprite sizes.
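The bitmap sizes quoted above follow directly from the two pixels per byte packing. A small Python sketch of the arithmetic (the function name `bitmap_bytes` is just for this illustration):

```python
PIXELS_PER_BYTE = 2  # 16-colour mode, two pixels per byte


def bitmap_bytes(width_px, height_rows):
    """Bytes needed to store a sprite as raw bitmap data."""
    bytes_per_row = -(-width_px // PIXELS_PER_BYTE)  # round up to whole bytes
    return bytes_per_row * height_rows


print(bitmap_bytes(16, 16))  # 128
print(bitmap_bytes(10, 13))  # 65
```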

The fastest method I have come up with is:

[4+4]   8 LEAU 5+128*12,X
[3]    11 LDD #$4999
[3]    14 LDX #$9999
[5+3]  22 PSHU A,X
[4+1]  27 LEAU -128+3,U
[5+4]  36 PSHU D,X
[4+1]  41 LEAU -128+4,U
[4]    45 LDY #$9994
[5+4]  54 PSHU X,Y
[4+1]  59 LEAU -128+3,U
[5+3]  67 PSHU B,X
[4+1]  72 LEAU -128+3,U
[5+4]  81 PSHU D,Y
[4+1]  86 LEAU -128+3,U
[5+3]  94 PSHU A,X
[4+1]  99 LEAU -128+3,U
[5+3] 107 PSHU A,Y
[4+1] 112 LEAU -128+3,U
[5+3] 120 PSHU A,X
[4+1] 125 LEAU -128+4,U
[5+4] 134 PSHU D,Y
[4+1] 139 LEAU -128+4,U
[5+3] 147 PSHU B,X
[4+1] 152 LEAU -128+4,U
[5+4] 161 PSHU X,Y
[4+1] 166 LEAU -128+4,U
[5+4] 175 PSHU D,X
[4+1] 180 LEAU -128+4,U
[5+3] 188 PSHU A,X
[5]   193 RTS

This is only 193 CPU cycles and 77 bytes of RAM.
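As a cross-check, the totals can be reproduced by tallying the instruction forms used in the listing. The cycle counts below are the bracketed values from the listing itself. The byte sizes are my reading of the 6809 instruction encodings and should be treated as an assumption, although they do add up to the 77 bytes quoted. The names are invented for this sketch.

```python
# (cycles, bytes) for each instruction form that appears in the PSHU listing.
LEAU_16BIT = (8, 4)  # LEAU with a 16-bit offset: [4+4]
LEAU_8BIT = (5, 3)   # LEAU with an 8-bit offset: [4+1]
LD_IMM16 = (3, 3)    # LDD #imm or LDX #imm
LDY_IMM16 = (4, 4)   # LDY #imm (one extra prefix byte)
RTS = (5, 1)


def pshu(n_bytes):
    """PSHU costs 5 cycles plus 1 per byte pushed; it is always 2 bytes long."""
    return (5 + n_bytes, 2)


# Bytes pushed per sprite row, bottom row first, read off the listing
# (for example PSHU A,X pushes 3 bytes and PSHU D,X pushes 4).
PUSH_SIZES = [3, 4, 4, 3, 4, 3, 3, 3, 4, 3, 4, 4, 3]


def tally():
    ops = [LEAU_16BIT, LD_IMM16, LD_IMM16, LDY_IMM16]
    ops += [pshu(n) for n in PUSH_SIZES]
    ops += [LEAU_8BIT] * (len(PUSH_SIZES) - 1)  # one LEAU between rows
    ops.append(RTS)
    cycles = sum(c for c, _ in ops)
    size = sum(b for _, b in ops)
    return cycles, size


print(tally())  # (193, 77)
```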

The code above uses PSHU instead of ST instructions and draws the sprite from the bottom row to the top (this suggestion was from Curtis L. Boyle).  Since PSHU automatically decrements U, the offset needed by the LEAU after each PSHU fits in a signed 8-bit value, so the LEAU instruction is shorter.  PSHU is also faster than multiple ST instructions.
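Here is a rough Python model of what PSHU does to U and to screen memory, covering only the first two pushes of the listing (the bottom two rows). It assumes a 128-byte screen row, as the offsets in the listing imply, and takes X = 0 as the sprite origin. The function names and the register dictionary are hypothetical.

```python
ROW_BYTES = 128  # bytes per screen row, implied by the 128 offsets in the listing

# PSHU leaves the registers in memory in this order, from low address to high.
PUSH_ORDER = ("A", "B", "X", "Y")


def pshu(mem, u, regs, names):
    """Model PSHU: decrement U by the pushed size, then store the registers."""
    wanted = set(names)
    if "D" in wanted:  # D is the A:B register pair
        wanted |= {"A", "B"}
    data = b"".join(regs[r] for r in PUSH_ORDER if r in wanted)
    u -= len(data)
    mem[u:u + len(data)] = data
    return u


def draw_bottom_two_rows():
    mem = bytearray(ROW_BYTES * 13)
    regs = {"A": b"\x49", "B": b"\x99", "X": b"\x99\x99", "Y": b"\x99\x94"}
    u = 5 + ROW_BYTES * 12              # LEAU 5+128*12,X  (with X = 0)
    u = pshu(mem, u, regs, ("A", "X"))  # PSHU A,X -> row 12, bytes 2..4
    u += -128 + 3                       # LEAU -128+3,U: up one row, offset fits in 8 bits
    u = pshu(mem, u, regs, ("D", "X"))  # PSHU D,X -> row 11, bytes 1..4
    return mem, u


mem, u = draw_bottom_two_rows()
print(mem[12 * ROW_BYTES + 2:12 * ROW_BYTES + 5].hex())  # 499999
print(mem[11 * ROW_BYTES + 1:11 * ROW_BYTES + 5].hex())  # 49999999
```

Because each PSHU has already moved U back by the width of the row just written, the LEAU that follows only has to cover the remaining distance to the row above. That distance is always around -125, well within the signed 8-bit range.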

This compiled/half stack blasted sprite technique should work especially well for CoCo 1 games too, since they have fewer colours, which means more repeating values in the sprites.

After doing the test video shown in my previous article, I’ve decided to use a different method of updating the sprites on screen.  I’m planning to use double buffering, since the quick and easy method of redrawing whatever the ghosts run over and ignoring what is behind Pac Man was becoming very difficult, and in the end it didn’t look perfect.  The main struggle was that when Pac Man goes around corners there are times when he moves more than 2 pixels in each direction, and if that isn’t accounted for properly some yellow pixels are left on the screen.  I’m hoping double buffering will make these troubles go away.  It might be a little slower, but I think that with the improved compiled sprite code using PSHU, speed won’t be an issue.

So here I go again, rewriting the graphics engine…  The graphics rendering is taking a lot longer than I thought, and it’s even more work than the actual Z80 to 6809 transcode tool!  But it’s been a great learning experience.

See you in the next post.
