Zilog Z80 to Motorola 6809 Transcode – Part 021 – Compiled Sprites are faster than Stack blasting!

Hello. After writing previous posts about how great stack blasting is, I recently found out about an even faster sprite rendering method called Compiled Sprites.  A few weeks ago I was watching a CoCo YouTube video that briefly mentioned using Compiled Sprites for game rendering.  I googled it and there wasn’t much information on the technique.  I guess this is because most computers had built-in hardware for sprites, so only a few computers like the CoCo had hi-res graphics but no sprite hardware.

So what are Compiled Sprites?  Instead of storing a sprite as data that a drawing routine copies to the screen, you write the sprite as assembly code that stores the picture data directly into video RAM.  Can you guess what this example is?

        LEAX    7,X
        LDD     #$9999
        STD     -5,X
        STD     122,X
        LDB     #$90
        STD     124,X
        STB     -3,X
        LEAX    256,X
        LDB     #$99
        STD     -6,X
        STD     122,X
        STA     -4,X
        LDA     #$90
        STA     124,X
        LDA     #$09
        STA     -7,X
        STA     121,X
        LEAX    256,X
        LDA     #$99
        STD     -7,X
        STD     121,X
        STA     -5,X
        LDA     #$90
        STA     123,X
        LEAX    256,X
        LDA     #$99
        STD     -7,X
        STD     121,X
        LDA     #$90
        STA     123,X
        LEAX    256,X
        LDA     #$99
        STD     -7,X
        STD     122,X
        STA     -5,X
        LDA     #$90
        STA     124,X
        LDA     #$09
        STA     121,X
        LEAX    256,X
        LDA     #$99
        STD     -6,X
        STD     122,X
        LDB     #$90
        STD     124,X
        STA     -4,X
        LDA     #$09
        STA     -7,X
        LEAX    256,X
        LDD     #$9999
        STD     -5,X
        LDA     #$90
        STA     -3,X
        RTS

The above code is actually a sprite of Pac Man just like the picture below.

[Image: screenshot of the Pac Man sprite]
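
To make the idea concrete, here is a minimal sketch of a sprite compiler in Python. It is not the tool used for this post; `compile_sprite`, `row_stride` and `transparent` are hypothetical names, and the 128-byte row stride is an assumption inferred from the listing above, where LEAX 256,X moves down two rows at a time. It only ever uses A, so it is less clever than the listing, but it shows the principle: load each value once, then store it everywhere it appears.

```python
from collections import defaultdict

def compile_sprite(rows, row_stride=128, transparent=0x00):
    """Emit 6809 assembly text that draws `rows` relative to X.

    Hypothetical helper: bytes equal to `transparent` are skipped, and each
    distinct byte value is loaded into A once and then stored at every
    offset where it occurs. The output is LDA/STA lines plus a final RTS.
    """
    offsets_by_value = defaultdict(list)
    for y, row in enumerate(rows):
        for x, value in enumerate(row):
            if value != transparent:
                offsets_by_value[value].append(y * row_stride + x)

    lines = []
    for value in sorted(offsets_by_value):
        lines.append(f"        LDA     #${value:02X}")
        for offset in offsets_by_value[value]:
            lines.append(f"        STA     {offset},X")
    lines.append("        RTS")
    return "\n".join(lines)

# A made-up 2-row, 3-byte-wide sprite (not the Pac Man data above).
sprite = [
    [0x09, 0x99, 0x90],
    [0x99, 0x99, 0x00],
]
print(compile_sprite(sprite))
```

Note that the skipping works per byte: a mixed byte such as $09 is still stored whole, so in this sketch its zero nibble overwrites whatever was on the screen.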

It turns out that rendering data to the screen as a series of LDD, LDA, LDB and STD, STA, STB instructions is faster than stack blasting!  It takes more RAM to store the sprites as code, but not a lot more.  In my tests, a stack-blasted 16×14 sprite takes 529 cycles.  Compiled Sprites vary in size and speed, since both depend on how much detail is in the 16×14 pixel sprite.  I’ve had some as low as 263 cycles, and the larger ones are still around 450 cycles.  These numbers are incredible considering how fast stack blasting already is.  There are also some extra benefits to Compiled Sprites besides speed.  You don’t have to worry about the stack pointers like you do with stack blasting.  You also get transparency, since only the bytes that contain sprite data are written to the screen.
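
For a sense of scale, the cycle counts quoted above can be turned into savings with a few lines of Python. This is only arithmetic on the figures from the post (529, 263 and 450 cycles); the function name `saving` is made up for the example.

```python
STACK_BLAST_CYCLES = 529   # stack-blasted 16x14 sprite, as quoted in the post
COMPILED_CYCLES = {"smallest": 263, "larger": 450}   # compiled sprites, as quoted

def saving(compiled, baseline=STACK_BLAST_CYCLES):
    """Return (cycles saved, percent saved, speed-up factor)."""
    saved = baseline - compiled
    return saved, 100.0 * saved / baseline, baseline / compiled

for label, cycles in COMPILED_CYCLES.items():
    saved, percent, factor = saving(cycles)
    print(f"{label}: {saved} cycles saved ({percent:.1f}%), {factor:.2f}x faster")
```

So the best case saves 266 cycles per sprite (about half), and even the larger sprites save 79 cycles (roughly 15%).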

I’m now in the process of converting my sprites to Compiled Sprites and then I’ll have to implement the new sprite handling into my Pac Man transcode.  This is going to be a lot of work, but in the end it will be that much better.

See you in the next post.


4 Responses to Zilog Z80 to Motorola 6809 Transcode – Part 021 – Compiled Sprites are faster than Stack blasting!

  1. John Kowalski says:

    If you’re up for revising your sprite compiler, you could make sprites even smaller and faster by avoiding LEAX 256,X (which is 11 cycles and 4 bytes). You could use ABX (3 cycles, 1 byte) whenever B happens to hold graphics data that are highish byte values. It won’t advance X by 256, but even if you have to use more ABX opcodes than LEAX 256,X opcodes, it’s still faster. Hell, even if you manually inserted a LDB #255 before each ABX, it’s still smaller and faster.

    • nowhereman999 says:

      Hi John,

      Using ABX is a great idea, thanks for sharing your method. According to LWTOOLS, LEAX 256,X is 8 cycles and 4 bytes; either way it’s still slower than your recommendation of ABX! Currently my program looks at two rows of the sprite together (Pac Man is using a 256 pixel screen width) and loads D with the most used value for the two rows, then emits a bunch of STD +-whatever,X instructions. It then does the same for the next most used value until it gets down to the least used, loading A or B when it needs to. Using your method, I might be able to have D or B loaded with the largest value right at the end of the two-line section (where I do the LEAX 256,X right now), then use the ABX command and a small LEAX value after it to position X at the start of the next two rows. I will definitely do some testing and learn how to incorporate your ideas into my compiled sprites. I just finished all my main sprite code last week! Now I’m going to have to start again, thanks for all the extra work you just gave me. 🙂

      I will also have to look through my Pac Man transcode to see if I can use ABX anywhere instead of a LEAX B,X (or swap out LEAX A,X’s for the B register and use an ABX). Very cool.

      Thanks again,
      Glen

      • ABX is a great command – used it a lot in NitrOS-9. You just have to be careful, in that LEAX B,X treats B as signed (-128 to +127) while ABX treats it as unsigned (0 to 255). If the last directly stored pixels on a line in your compiled sprite only need 8 bits, you can also cheat and do a LDD with the high byte (A) as your graphics data and the low byte (B) as the offset needed for the ABX. LDD # is faster/smaller than LDA #/LDB #.

      • nowhereman999 says:

        Hi Curtis,

        Thanks for the extra info and ideas for the ABX instruction and the LDD instruction. It’s all much appreciated.

        Cheers,
        Glen
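
The two points raised in this thread (ABX is unsigned while LEAX B,X is signed, and LDB #255 plus ABX is still cheaper than LEAX 256,X) can be checked with a small Python model. This is only a sketch: `abx`, `leax_b_x` and `COST` are names invented here, the LEAX 256,X and ABX costs are the ones quoted in the comments (using the 8-cycle LWTOOLS figure), and the LDB immediate cost of 2 cycles and 2 bytes is taken from the standard 6809 instruction tables.

```python
def abx(x, b):
    """Model of ABX: add B to X, treating B as unsigned (0..255)."""
    return (x + (b & 0xFF)) & 0xFFFF

def leax_b_x(x, b):
    """Model of LEAX B,X: add B to X, treating B as signed (-128..127)."""
    b &= 0xFF
    signed = b - 0x100 if b & 0x80 else b
    return (x + signed) & 0xFFFF

# (cycles, bytes) per instruction; see the assumptions in the text above.
COST = {"LEAX 256,X": (8, 4), "ABX": (3, 1), "LDB #255": (2, 2)}

pair_cycles = COST["LDB #255"][0] + COST["ABX"][0]
pair_bytes = COST["LDB #255"][1] + COST["ABX"][1]
leax_cycles, leax_bytes = COST["LEAX 256,X"]

print(f"LEAX 256,X     : {leax_cycles} cycles, {leax_bytes} bytes")
print(f"LDB #255 + ABX : {pair_cycles} cycles, {pair_bytes} bytes")

# With B = $99 (graphics data with the top bit set) the two instructions
# move X in opposite directions.
print(f"ABX      from $4000 with B=$99 -> ${abx(0x4000, 0x99):04X}")
print(f"LEAX B,X from $4000 with B=$99 -> ${leax_b_x(0x4000, 0x99):04X}")
```

The pair costs 5 cycles and 3 bytes against 8 cycles and 4 bytes, but it advances X by 255 rather than 256 and overwrites B, so the store offsets that follow would need adjusting.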
