How to make PMODE 4 CSM video files for the CoCo (TRS-80 Color Computer)

Hi All,

Using some awesome free tools, a bash script and a little C program I wrote, it’s possible to make PMODE 4 colour videos that play back at 23.3 fps with sound and have four colours: black, white, and the two artifact colours blue and red/orange.  Below I’m going to explain how this all works and how you can make your own.  As an example of what you can expect from this video conversion I’ve uploaded two YouTube videos showing the player on a CoCo 1.  The videos can be found here and here. Don’t mind the mess on the floor; I usually have my CoCo 3 hooked up but wanted to show this on an actual CoCo model 1 with a 6309 CPU and 64k of RAM.

The main tool is a fairly recent version of FFMPEG, which is the most amazing video and audio conversion tool there is.  It will take just about any video or audio file as input and convert it, with filters and effects, to many other formats.

The script first uses FFMPEG to set the frame rate to 23.3 frames per second, which the CSM player requires.  It also scales the input video so that it fits the 256×192 CoCo PMODE 4 screen while keeping its aspect ratio, automatically adding black bars if needed.  FFMPEG outputs the scaled video as individual still frames, numbered in the order they are produced.  The images are stored in a folder called pics as LZW-compressed TIFF images.
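To make the scaling step a bit more concrete, here is a rough Python sketch of the arithmetic behind fitting a source frame into 256×192 with black bars. It is not taken from makecsm.sh: the function name `fit_with_bars` and the rounding choices are hypothetical, and it assumes square pixels.

```python
def fit_with_bars(src_w, src_h, dst_w=256, dst_h=192):
    """Scale (src_w, src_h) to fit inside dst_w x dst_h, keeping the aspect ratio.

    Returns (new_w, new_h, pad_x, pad_y), where pad_x and pad_y are the sizes
    of the black bars on the left and on the top.
    Hypothetical helper for illustration only; assumes square pixels.
    """
    # Use the smaller of the two scale factors so nothing gets cropped.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))
    # Whatever space is left over is split evenly into black bars.
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


print(fit_with_bars(1920, 1080))  # a 16:9 source gets bars at top and bottom
```

A 4:3 source fills the whole screen, while a widescreen source ends up with bars above and below the picture.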

Next the script uses a picture processing tool called ImageMagick.  ImageMagick is used to do the following things:

  1. Turns yellow pixels to white (ivory)
  2. Resizes the pictures to 128×192 (necessary for making artifacts)
  3. Handles picture levels (black, white and gamma controls)
  4. Normalizes the pictures as the pictures need to be fairly bright or they are hard to make out on the CoCo screen
  5. Remaps the colours of the original image to the black, white (blue, red/orange) colours.  It does this by looking at the palette of a 4 colour GIF file that represents the colours the CoCo 1 can display on a PMODE 4 screen
  6. Dithers that image to make it look like there are more colours on the screen
  7. Flips the image vertically and saves it as an uncompressed 16 colour bitmap (BMP) file.  The BMP format stores its rows bottom-up, so I flip the image before saving it and the data in the file is ready to be used as it is.
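The remapping in step 5 can be pictured as a nearest-colour lookup. The sketch below is only a conceptual illustration in Python, not what ImageMagick does internally, and it ignores dithering. The RGB values in `PALETTE` are placeholders I made up for the example; they are not the values stored in the 4 colour GIF that the script uses.

```python
# Placeholder RGB values for the four PMODE 4 colours (assumed for this
# example, NOT read from the palette GIF that ships with the script).
PALETTE = {
    "black": (0, 0, 0),
    "orange": (255, 128, 0),
    "blue": (0, 128, 255),
    "white": (255, 255, 255),
}


def nearest_colour(rgb, palette=PALETTE):
    """Return the name of the palette entry closest to rgb.

    Closeness is the squared distance in RGB space, which is a simple
    stand-in for a real colour-matching algorithm.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(palette, key=lambda name: dist2(rgb, palette[name]))


print(nearest_colour((250, 250, 240)))  # ivory lands on white
```

With a lookup like this, a yellow pixel would land on orange or white depending on its exact shade, which is one reason for turning yellow into ivory first.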

Next we use FFMPEG again to process the audio of the source video.  At this point we use FFMPEG to convert the audio of the movie file to a 1 channel, 11932 samples/second, unsigned 8 bit audio file.  FFMPEG could have been used to produce the stills and the audio at the same time but I like to do it separately so that it’s easier on the hard drive; otherwise you would be reading the source video and writing both the audio and the stills at the same time.
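As a quick sanity check on those two numbers, this small Python snippet works out how many audio samples fall in one video frame. Only the sample rate and the frame rate come from the text above. Reading the result as "about 512 bytes of audio per frame" is my own interpretation, not something taken from the player's documentation.

```python
SAMPLE_RATE = 11932   # samples per second, from the text above
FPS = 23.3            # frames per second, from the text above

# One unsigned 8 bit mono sample is one byte, so this is also bytes per frame.
samples_per_frame = SAMPLE_RATE / FPS

print(f"{samples_per_frame:.2f} audio samples per video frame")
```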

At this point we use FFMPEG to take the still images and the 8 bit unsigned audio and convert them to a test.mp4 file that can be viewed on your regular computer.  This is a very good representation of how the video will look on the CoCo.  Watching the test.mp4 you can tell if you need to make the output video brighter or not.  This can be done using the -b option with the conversion script.

Next my little C program is executed.  It takes the BMP stills, goes through each pixel and converts it to a pair of bits: black (00), orange (01), blue (10) or white (11).  Four of these pairs are put together so it has a byte of data, which it stores in the buffer.  It does that for the entire picture and then muxes the audio and the new video data together in the format that Ed Snider’s player will accept.
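Here is a minimal Python sketch of that packing step, using the 2-bit codes given above. It is not the actual C program. The function name `pack_row` is hypothetical, and putting the leftmost pixel into the top two bits of each byte is an assumption about the bit order.

```python
# 2-bit codes as described in the text above.
CODES = {"black": 0b00, "orange": 0b01, "blue": 0b10, "white": 0b11}


def pack_row(pixels):
    """Pack a row of colour names into bytes, four 2-bit pixels per byte.

    The leftmost pixel of each group of four goes into the top two bits
    (assumed bit order, chosen for this illustration).
    """
    if len(pixels) % 4:
        raise ValueError("row length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(pixels), 4):
        byte = 0
        for name in pixels[i:i + 4]:
            byte = (byte << 2) | CODES[name]
        out.append(byte)
    return bytes(out)


row = pack_row(["black", "orange", "blue", "white"] * 32)  # 128 pixels
print(len(row), hex(row[0]))
```

A 128 pixel row packs into 32 bytes, and 192 rows of 32 bytes is 6144 bytes, which is the size of a PMODE 4 screen.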

The last step is to join the CSM header file with the CSM PMODE 4 player code and the muxed audio/video file into one new .CSM file that is ready to be copied to an SD card and played back on the CoCo.

How do I make my own CSM videos?

You must install FFMPEG and ImageMagick on your computer.  If you are on a Mac the easiest way to install the command line tools is using Homebrew, found here.  Once Homebrew is installed on your Mac, type brew install ffmpeg and brew install imagemagick to install them.  The BASH script and C program I wrote are ready to be used if you have a Mac.

It should work without issues on a Linux box and Cygwin/Mingw on a Windows box.  You will have to compile the program yourself and make sure to install FFMPEG and ImageMagick using your package manager.

Ron Klein is working on ready to go versions for Linux, RPI3 & Windows.  They will be available soon from the link below.

Ed Snider is hosting the files with his other CoCo SDC Media player software.  The Mac version is already available to use and can be found here:

https://www.mediafire.com/folder/20xt2l2k0160i/CoCo_SDC_Media_Player

The tools are in a folder called Tools for making CSM files.

Uncompress the .ZIP file and, using the command line in the makecsm folder, type:

./makecsm.sh -h

This will give you a summary of the options available for the conversion.  Below is the help you will see:

makecsm – CoCoSDC CSM Video Maker v 1.00
Usage: [-s hh:mm:ss] [-e hh:mm:ss] [-d seconds] [-n] [-h] [-t] [-b 0.00 to 10.00] inputfile [outputfile[.CSM]]

option: -s hh:mm:ss is the time in the video to start conversion
-e hh:mm:ss is the time in the video to end conversion
-d duration in seconds
-n means no artifact colour, make a black and white video
-b x.xx sets the brightness level of the video (1.00 is default)
-h Prints this help message
-t normally a test.mp4 file is created so you can see
the resulting movie on this computer before copying it to the CoCoSDC
this option will disable the creation of a test.mp4 movie

Example: To make a movie from the source movie called mymovie.mkv
Starting at 53 seconds into the video and ending at 1 minute and 30 seconds.
The duration of the video will be 37 seconds. Brighten the video a little
and use the output filename COCOVID.CSM
Command would be: makecsm.sh -s 00:00:53 -e 00:01:30 -b 1.1 mymovie.mkv COCOVID.CSM

If no start time or end time is given then the entire video will be converted
If only the start time is given then the conversion will start at the given time
and it will convert the video until the end of the video.
If only the end time is given then the conversion will start from the beginning
of the video up to the end time given.
If no output filename is given then an output file will be created in the
current folder with the extension .CSM

The output filename must be uppercase and be a maximum of 8 characters long,
with the extension .CSM or the CoCoSDC player will not recognize it.
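The filename rule at the end of the help text is easy to get wrong, so here is a small Python check for it. The function name `is_valid_csm_name` is hypothetical, and restricting the base name to the letters A-Z and the digits 0-9 is my assumption. The help text itself only says uppercase, at most 8 characters, with the extension .CSM.

```python
import re


def is_valid_csm_name(filename):
    """True if filename is 1 to 8 uppercase letters/digits followed by .CSM.

    Allowing only A-Z and 0-9 in the base name is an assumption; the help
    text only requires uppercase, a maximum of 8 characters and .CSM.
    """
    return re.fullmatch(r"[A-Z0-9]{1,8}\.CSM", filename) is not None


for name in ["COCOVID.CSM", "cocovid.csm", "TOOLONGNAME.CSM"]:
    print(name, is_valid_csm_name(name))
```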

A few little notes on the options: you can select -d 100 (or any number of seconds for the duration) without the -s hh:mm:ss option and it will create a video from the start of the source video lasting that number of seconds (100 seconds in this example).
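If you want to work out a -d value from a start and an end time, the arithmetic is simple. This Python sketch uses the times from the example in the help text. The helper names `to_seconds` and `duration` are hypothetical and are not part of makecsm.sh.

```python
def to_seconds(hhmmss):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def duration(start, end):
    """Seconds between two hh:mm:ss times."""
    return to_seconds(end) - to_seconds(start)


print(duration("00:00:53", "00:01:30"))  # the example from the help text
```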

You can make a black and white video without artifacts using the -n option.

One last thing to note

If your input or output video filenames have spaces then the script will probably fail.  It will be less troublesome if you move the source videos into the makecsm folder.

How to improve the video quality

The makecsm.sh script is pretty straightforward.  Other than FFMPEG creating the images at the correct size, all of the image processing is done with ImageMagick’s convert command.  If you look up help on ImageMagick there are tons of options.  Maybe going through these options you can find better settings to improve the output quality of the artifact colours.  It’s really hard to get yellows and greens on the CoCo screen so these should probably be converted to grey or white.  The current script converts yellow to ivory, which is better than converting it to red.  Feel free to tweak the command and if you come up with a really good setting please post it below in the comments.

Last little artifact problem to deal with

When the CoCo 1 or 2 is turned on there is no way to know if the even bits of a PMODE 4 screen make a blue colour or if it’s the odd bits that make the blue colour.  So I wrote a little BASIC program that I placed on Ed Snider’s PLAY.DSK image.  I called the program GO.BAS and when it is run it fills the screen with what should be the red/orange artifact colour.  The program then asks if the picture is blue; if so, hit reset and start the program again.  If the screen is orange/red, press a key and it starts Ed’s CSM player.  A copy of GO.BAS is included with the script.  It will need to be copied to Ed’s PLAY.DSK image with imgtool, toolshed or a similar utility.

10 CLS
20 PMODE4,1:PCLS:SCREEN1,1
30 FOR X=1 TO 255 STEP 2
40 LINE(X,0)-(X,191),PSET
50 NEXTX
60 PRINT"IF THE SCREEN TURNED BLUE THEN PRESS RESET AND RUN THE PROGRAM AGAIN":PRINT
70 PRINT"IF THE SCREEN TURNED ORANGE/RED THEN PRESS ANY KEY TO START THE SDCM PLAYER"
80 I$=INKEY$
90 I$=INKEY$:IF I$="" THEN 90
100 LOADM"SDCM":EXEC&H5800
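To see what GO.BAS actually puts into screen memory, here is a short Python sketch. It builds one PMODE 4 row, 1 bit per pixel with the leftmost pixel in the top bit of each byte, with only the odd columns lit, just like the LINE loop above. The function name `row_bytes` is hypothetical.

```python
def row_bytes(lit_columns, width=256):
    """Build one PMODE 4 style row: 1 bit per pixel, leftmost pixel in bit 7."""
    row = bytearray(width // 8)
    for x in lit_columns:
        row[x // 8] |= 0x80 >> (x % 8)
    return bytes(row)


odd = row_bytes(range(1, 256, 2))    # the columns GO.BAS draws
even = row_bytes(range(0, 256, 2))   # the opposite pattern, for comparison

print(f"odd columns:  {odd[0]:08b}")
print(f"even columns: {even[0]:08b}")
```

Every byte of the GO.BAS screen is a run of 01 bit pairs, the same code the converter uses for orange. Whether the TV shows that pattern as orange or as blue depends on how the machine powered up, which is what the program lets you check.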

Have Fun,

Glen


Zilog z80 to Motorola 6809 Transcode – Part 025 – My z80 to 6809 program

To end this series on transcoding z80 code to the 6809 I thought I should include my C program, called z80_to_6809_15_Pacman.c, which is what I used to help with the transcode.  It takes a z80 disassembly as input and outputs what it thinks is a compatible 6809 instruction in its place.  It keeps the z80 source code to the right, which makes it easier when you are manually going through the code.  It’s not a very complicated program but it gets the job done.  The text input file must be formatted with the correct spacing, which you will have to play with if you want to use the program for your own projects.

You can find it here.

I hope these posts were helpful for anyone interested in the CoCo 3 or transcoding.

Cheers,

Glen


Zilog z80 to Motorola 6809 Transcode – Part 024 – PAC MAN is finally complete, if you have a CoCo 3 with 512k give it a try…

Hello, well I’ve finally completed my translation from the z80 arcade version of PAC MAN to the 6809 for the CoCo 3.  If you want to play it download version 1.01 here (Update – this version no longer includes any executable files* See note below).  This newer version moved the palette changing routine that makes the power pills flash into the Vblank IRQ, which gets rid of a little glitch in the GIME chip that caused a little flickering while the game was running.  Thanks to Nickolas Marentes for letting me know about the GIME chip glitch and also how to get around it.

The upload includes a user guide with instructions on how to copy the PACMAN.5E ROM onto the .DSK image so you can legally use the game.  This is similar to using MAME games and needing the rights to use the ROMs and play the games.

The user guide also explains what all the settings are in the option screen.

The .zip also has the 6809 assembly language source code files so others can hopefully play with and learn from them.

Have fun,

Glen

* The updated version linked above previously included a compiled version of slz.c for both Windows and Mac.  SLZ is used when you assemble the source code into a binary file; it compresses the binary to a file size that will still fit on the CoCo disk.  I saw a post on Facebook that said the .ZIP contained a virus.  I don’t own a Windows PC so I really don’t know if there is a virus within the ZIP.  I had to use someone else’s Windows machine to compile the slz.exe file so it’s possible that it had a virus.

To play it safe I removed the Windows version and the Mac version.  The .ZIP no longer includes any executable files.  The original slz.c file is still included but you will have to compile it yourself to use it.  I hope this didn’t affect anyone.


CoCo (6809) Assembly on a modern computer

This article is a guide for anyone who is thinking about learning 6809 assembly language programming or wants to use newer tools for doing 6809 assembly for the Tandy Color Computer.  It’s not an assembly tutorial, it’s an explanation of how to use some of the modern tools to help write and debug assembly language programs for the CoCo.

I got back into assembly programming on the CoCo about nine months ago and I found the modern tools that are available today make it easier and faster to learn assembly language programming.  Back in the 1980’s it used to be a long, slow process of assembling your program with EDTASM+, saving it, running it and debugging it.  Using MAME and lwasm you can assemble your program in a second and view the assembled code with extra information about how many cycles each instruction is taking, which is vital when you want to optimize your code for the most speed or the smallest size.  LWTOOLS, which includes lwasm, is an amazing 6809/6309 assembler that is completely free.  It’s written by William Astle, who is on the CoCo mailing list.  LWTOOLS can be compiled for Mac, Linux and Windows.

My favourite emulator is MAME.  It’s been around for a long time emulating arcade machines and the CPU emulation has been tested in many different scenarios.  There was a branch of MAME called MESS that took the same code and used it only for computer emulation, but for a few years now MESS has been merged back into the main MAME code, so MAME includes both the arcade emulation and the computer emulation.  MAME is cross platform and is still being heavily developed.  MAME also has a special debug mode that lets you step through your program and see how it is running step by step, which is a fantastic testing and learning tool. MAME can output the code it is executing to a text file using the trace command. MAME also has something special called watchpoints, which allow you to set up locations in memory that will halt your program if the locations are written to or read from, or even only when they are changed to a specific value! Super useful for debugging… Anyways, enough of a sales pitch. I figure you are reading this because you want to set up your assembly environment.

First you need to install LWTOOLS and MAME on your computer. This is already done for you if you want to use a Raspberry Pi 3 with Ron Klein’s excellent SD image. You just need to add the CoCo roms and you’re ready to go…

You can compile both MAME and LWTOOLS yourself or download ready-to-use versions. I use a Mac myself, and the quickest and easiest way to get MAME, lwtools and tons of other utilities installed is by using Homebrew.

Once brew is installed on your system it’s as simple as typing in these two commands:

$ brew install mame

$ brew install lwtools

You can probably find similar easy installs of both programs on Linux using apt-get or similar. I’m sure there are tons of ways to get these programs on Windows machines. Also, for Windows you might want to use Cygwin, which gives you a Unix-like environment.

Once they are both installed you should create a directory where you will keep all your 6809 assembly source files. Let’s call it CoCoAssembly. This same folder will be where you assemble your program and run MAME from. In this directory you will need a subfolder called roms with the CoCo roms of the different CoCos you want to emulate/test your code on.

Here is a list of all the CoCo roms that can/should go into the roms folder (don’t ask me where to get them):

bas12.rom
bas13.rom
coco2b.zip
coco2.zip
coco3dw1.zip
coco3_hdb1.zip
coco3h.zip
coco3p.zip
coco3.zip
cocoe.zip
coco_fdc_v11.zip
coco_fdc.zip
coco.zip
disk10.rom
disk11.rom
extbas11.rom
hdbdw3bc3.rom
hdbdw3bck.rom
hdbdw3becker.rom
hdbdw3cc2.rom
HDBSDC.ROM
mc10.zip
RGBDOS2HD.ROM
yados.rom

In your CoCoAssembly folder you should see a sub folder called roms where you have the above roms copied. From this CoCoAssembly folder type the following to test if your MAME is installed properly.

mame coco3 -window

Or to set the uimodekey to F12 use this to start MAME:

mame coco3 -window -uimodekey F12

To exit the MAME emulator, hit the keyboard emulation mode key. On a Mac it’s the delete key (left of the end key). Some laptops don’t have the other delete key, so use the F12 command line shown above.  On Linux and Windows the key is ScrLk. You want to set this mode to partial, then press the Esc key to exit.

The complete installation of MAME includes some nice tools. The most useful for us is called imgtool, which is used to create and manipulate our CoCo disk image files or .dsk files. If you don’t like imgtool you can use another image handling tool called toolshed. I will be using imgtool below.

Create a new blank .DSK disk image that we can copy our program to with the following command:

imgtool create coco_jvc_rsdos Disk1.dsk

Now that we have a blank disk, let’s assemble a program and copy it onto this image file. In your favourite editor type in the following short 6809 assembly program:

        ORG $4000
Start:
        PSHS   A,B
        LDA    #'H
        LDB    #'I
        STD    $500
        PULS   A,B,PC
        END    Start

Save the program as mycode.asm

This is the command I use from LWTOOLS to assemble my 6809 source code:

lwasm -9bl -p cd -oNEW.BIN mycode.asm

You should see:

            ( mycode.asm):00001 ORG $4000
4000        ( mycode.asm):00002 Start:
4000 3406   ( mycode.asm):00003 [5+2] PSHS A,B
4002 8648   ( mycode.asm):00004 [2] LDA #'H
4004 C649   ( mycode.asm):00005 [2] LDB #'I
4006 FD0500 ( mycode.asm):00006 [6] STD $500
4009 3586   ( mycode.asm):00007 [5+4] PULS A,B,PC
            ( mycode.asm):00008 END Start

In the lwasm output there are numbers in the square brackets, these numbers are the CPU cycles used for each line of code.  This can be helpful if you want to figure out how to optimize your assembly code for the max speed or best size as there are many tricks to speeding up code at the cost of size and vice versa.
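If you want a total for a stretch of code, you can add up the bracketed numbers yourself or let a few lines of Python do it. This is just a sketch that works on the listing format shown above. The function name `total_cycles` is hypothetical, and it simply sums every bracketed count it finds, treating an entry like 5+2 as 7 cycles.

```python
import re

# The instruction lines from the listing above.
LISTING = """\
4000 3406   ( mycode.asm):00003 [5+2] PSHS A,B
4002 8648   ( mycode.asm):00004 [2] LDA #'H
4004 C649   ( mycode.asm):00005 [2] LDB #'I
4006 FD0500 ( mycode.asm):00006 [6] STD $500
4009 3586   ( mycode.asm):00007 [5+4] PULS A,B,PC
"""


def total_cycles(listing):
    """Sum every bracketed cycle count, e.g. [5+2] counts as 7."""
    total = 0
    for match in re.finditer(r"\[(\d+(?:\+\d+)*)\]", listing):
        total += sum(int(n) for n in match.group(1).split("+"))
    return total


print(total_cycles(LISTING))  # 26
```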

If you want to capture the assembly output to a file called listing.txt use this command. It’s useful to keep the output file to refer back to when debugging your code, since it will have the memory addresses of your instructions…

lwasm -9bl -p cd -oNEW.BIN mycode.asm > listing.txt

The above command options tell lwasm to generate our output code as an RSDOS “LOADM” compatible 6809 program.

Once your program has assembled OK as NEW.BIN you have to transfer it to the .DSK image so it can be run with the emulator. We use imgtool for this.

imgtool put coco_jvc_rsdos Disk1.dsk NEW.BIN TEST.BIN

The above command tells imgtool to put (copy) the file NEW.BIN into the disk image file called Disk1.dsk, using the CoCo RSDOS format coco_jvc_rsdos, and to save the file on the disk with the name TEST.BIN.

Another useful feature of imgtool is deleting files from an image. To delete the file TEST.BIN from the .dsk file use the following:

imgtool del coco_jvc_rsdos Disk1.dsk TEST.BIN

Imgtool has many more features. Type imgtool without any options to see them all.

Let’s test and debug our program using MAME:

mame coco3 -window -debug -flop1 Disk1.dsk

This starts MAME up in its debugger mode and you will see the following window, or similar on Windows and Linux.

It’s important to note that the mame debugger always uses hex values for its input and output as a default.

You can see the pink highlighted line on address 8C1B. This is the line that is about to be executed, and it is where the CoCo 3 first starts when you power on your computer. On the top left is cycles (counts the CPU cycles) and beamx, which is where the beam of the picture tube is currently being drawn in the x direction. Beamy is which row is being drawn on the screen at this moment in time. Flags shows the flags that are currently set in the CC (condition code) register of the 6809 CPU.

  • PC is the program counter and shows us the address where your instructions will be executed next.
  • S is the current stack pointer location
  • CC again is the condition code register but this time shown as a hex number.
  • DP is the Direct Page value
  • A is the A accumulator’s value
  • B is the B accumulator’s value
  • D is the A & B accumulators’ values together as a 16 bit value
  • X is the X register’s value
  • Y is the Y register’s value
  • U is the U register’s value

At the bottom of this window is a command line area where you can type commands for the debugger to execute, such as watchpoints or the trace function, another cool feature of the debugger. You can get a lot of help from the debugger itself by typing the word help. Let’s set up a watchpoint at $4000, which is the address where our little test program is going to be loaded and executed. In the command line type the following:

wp 4000,1,w

This command sets a watchpoint at address $4000 that is 1 byte long and will stop the execution of the processor when there is a write operation at this address. We could have made the watchpoint look at many bytes and check for write and read with the wr option or just read with the r option.

Now press F5 to make the debugger continue with execution. You’re thinking why did it stop? Disk Basic hasn’t even started yet? The reason it stopped is because Disk Basic is setting up the memory and it did a write instruction at address $4000. In the debug window it shows a message Stopped at watchpoint 1 writing byte to 00004000 (PC=C033) (data=32)

This is telling us that code at address C033 (Disk ROM address) wrote the byte 32 to address $4000 and since our watchpoint is set to stop code at this point it stopped so that we can now look at the code. We don’t really want to get into all the things RSDOS is doing as it boots up so let’s hit F5 again.

Now you should see the familiar RSDOS OK prompt. So let’s make sure our disk image is being used by mame. Type the DIR command in RSDOS and you should see the TEST.BIN as per the picture below. If you got this far things are looking good.

Let’s load our program. Type LOADM"TEST" and hit Enter.

Our debugger stopped the code again as RSDOS loaded your program into memory at address $4000. That’s good, hit F5 once again to let it finish the loadm command.

The next thing we want to do is setup a break point which will stop program execution when the program counter gets to a certain address. In our case our program is going to be executed at address $4000 so in the debug command line type the following command:

bp 4000

Next press F5 and in the RSDOS window type:

EXEC

This is where the fun begins. The breakpoint stops execution and you can now step through the code line by line, watching the registers each step of the way. You can also pull up a memory window with command d, or probably control d on Linux/Windows, or go to the Debug menu option at the top of the screen and select Memory window. In the Memory Window type 400, which is $400. This will show us a hex view of the text screen of the CoCo.

Our program is going to write the word “HI” in the middle of the screen and you can see this in the Memory window as we step through the code. Click on the debug window and hit Enter to step forward one line. As you do, the S stack pointer will decrease by two bytes as it stores the A and B values in the stack memory space. Hit Enter again and the LDA #$48 instruction is executed and the A accumulator will change to show the value 48. Hit Enter again and the LDB #$49 will load the B accumulator with the value 49. You can now see D’s value is 4849. Hit Enter again and you can see the value at address 0500 in the memory window has changed to 48 49. The RSDOS screen hasn’t changed yet since time is frozen when we are debugging, and the beam that refreshes our screen hasn’t moved much at all in the time it takes for the 6809 to execute the few instructions we have in our program. We can now press F5 again and the PULS A,B,PC instruction will restore our accumulators back to what they were before execution and return our program execution back to RSDOS. You should then see the screen refresh and show the HI on the left side of the middle of the screen. I hope you get the idea how the debugger works.
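In case you are wondering why $500 ends up on the left side of the middle of the screen, here is the arithmetic as a small Python sketch. It assumes the standard CoCo text screen of 32 columns by 16 rows starting at $400. The function name `screen_position` is hypothetical.

```python
TEXT_BASE = 0x400   # start of the text screen
COLUMNS = 32        # characters per row
ROWS = 16           # rows on the text screen


def screen_position(address):
    """Return (row, column), both counted from 0, for a text screen address."""
    offset = address - TEXT_BASE
    if not 0 <= offset < COLUMNS * ROWS:
        raise ValueError("address is outside the 32x16 text screen")
    return divmod(offset, COLUMNS)


print(screen_position(0x500))  # where the H goes
print(screen_position(0x501))  # where the I goes
```

Address $500 is 256 bytes past $400, which is exactly 8 rows of 32 characters, so the H lands in the first column of the ninth row.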

Another powerful feature of the watchpoint command is you can get it to stop execution only if the value of a certain RAM location changes to a specific value. For example (shrunk to fit on one line):

wp ffa0,10,w,{wpdata==0x0b},{printf "write to MMU %04X, value %02X @ %02X\n",wpaddr,wpdata,pc; g}

This command tells the debugger to watch $FFA0 to $FFAF (the length of 10 is hex, so 16 bytes) for a write operation. If one occurs it checks whether the value is $0B, and if so it writes the message to the debug window and continues execution.

An example of the output might be the following, where 200D is the program counter address when the value 0B was written to FFA2:

write to MMU FFA2, value 0B @ 200D
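To make the condition a little clearer, this Python sketch mimics the test that the watchpoint above performs on each write. It is only a model of the logic, not anything that MAME runs. The names `WP_START`, `WP_LENGTH`, `WP_VALUE` and `triggers` are hypothetical.

```python
WP_START = 0xFFA0   # first address being watched
WP_LENGTH = 0x10    # the debugger reads "10" as hex, so 16 bytes
WP_VALUE = 0x0B     # the value the condition is looking for


def triggers(address, value):
    """True if a write of value to address would satisfy the watchpoint."""
    in_range = WP_START <= address < WP_START + WP_LENGTH
    return in_range and value == WP_VALUE


print(triggers(0xFFA2, 0x0B))  # the write from the example output
print(triggers(0xFFA2, 0x0C))  # right address, wrong value
```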

One last cool feature I want to show is the trace feature. The trace command follows the execution of the program and saves the disassembled instructions to a text file to be analyzed. You can set it up so it also saves the register data at those points in your file. Here is how I use it. From the debug window we still have our watchpoint activated, so I’ll show you how to deactivate the watchpoints and breakpoints first. From the Debug window click the down arrow beside the command line bar and click on Break. This stops execution and allows you to use the debug features again, just like when a breakpoint or watchpoint has been triggered. From the Debug menu select New (Break|Watch)points Window.

The window defaults to show the breakpoints but if you click on the top bar you can select ALL Breakpoints or ALL Watchpoints as below:

Select each view and click on the lines and you will see the X on the left turn into a red 0 to indicate it is disabled.

In this example I’m going to set a breakpoint at $4000 again manually and another breakpoint at the end of our program. This is so the trace output will be short, as these files can get huge if you let them run for a few seconds, depending on the speed of your computer.

From the debug window command line type the following two lines to setup the two new breakpoints

bp 4000

bp 4009

Hit F5 and go to the RSDOS window and type EXEC again. After the breakpoint stops execution and the debugger shows line 4000, we turn on the trace function with the following command (shrunk to fit on one line):

trace output.tr.txt,0,,{tracelog "A=%02X,B=%02X,X=%02X,Y=%02X,U=%02X,S=%02X,CC=%02X ",a,b,x,y,u,s,cc}

Then hit F5 to continue the program execution, which will stop at $4009 where our last breakpoint was set. Turn off the trace function with the command in the debug window

trace off

Hit F5 again to get the RSDOS prompt back. When you want to close MAME, once again hit the keyboard emulation mode key and the Esc key.

Once you are out of MAME you can view the trace file in a text editor and you should see the following:

A=00,B=44,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=84 4002: LDA #$48
A=48,B=44,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4004: LDB #$49
A=48,B=49,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4006: STD $0500
A=48,B=49,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4009: PULS A,B,PC

This shows the values of the accumulators and registers on each line of code.
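Once you have a trace file in this shape it is easy to pull apart with a script. Here is a rough Python sketch that parses the lines shown above into dictionaries. It assumes the exact format produced by the tracelog string used here, and the function name `parse_trace_line` is hypothetical.

```python
import re

# The trace lines from the example above.
TRACE = """\
A=00,B=44,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=84 4002: LDA #$48
A=48,B=44,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4004: LDB #$49
A=48,B=49,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4006: STD $0500
A=48,B=49,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4009: PULS A,B,PC
"""

# Register block, then the address and a colon, then the instruction text.
LINE_RE = re.compile(r"^(?P<regs>\S+)\s+(?P<pc>[0-9A-Fa-f]+):\s+(?P<insn>.+)$")


def parse_trace_line(line):
    """Split one trace line into its address, instruction and register values."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    regs = {}
    for pair in m.group("regs").split(","):
        name, value = pair.split("=")
        regs[name] = int(value, 16)
    return {"pc": int(m.group("pc"), 16), "insn": m.group("insn"), "regs": regs}


rows = [parse_trace_line(line) for line in TRACE.splitlines()]
for row in rows:
    print(f"{row['pc']:04X}  A={row['regs']['A']:02X}  {row['insn']}")
```

From there you could, for example, search a long trace for the first line where a register takes on an unexpected value.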

Another feature of the debugger is that you can get it to run until an IRQ is triggered by hitting F7.

Another helpful thing you can do while using the MAME debug mode is change the contents of any accumulator/register.

For example when you stop execution you can type in the debug window’s command line:

pc=1000

This changes the program counter (pc) to address $1000, and the program will continue from address $1000 if you hit F5 or step through the code.

a=94

This changes the A accumulator’s value to $94.  You get the idea…

If you want to see the cycle counts in your code listing you can add these lines to your assembly source code:

        opt     c
        opt     ct
        opt     cd
        opt     cc

The code listing will output the cycle counts from the place you inserted the above special options.  Anytime you want to reset the counts you can just insert the following:

        opt     cd
        opt     cc

Here is a little output code so you can see how to use it in your source code and the actual cycle counts in the listing, just to the left of the 6809 instructions.  This is some example code showing different ways to clear data in memory (or the screen).  It’s from another article I’m working on about assembly optimization.

                      (       mycode.asm):00001                         opt     c
                      (       mycode.asm):00002                         opt     ct
                      (       mycode.asm):00003                 
                      (       mycode.asm):00004                         ORG     $4000
4000                  (       mycode.asm):00005                 Start:
                      (       mycode.asm):00006                         opt     cd
                      (       mycode.asm):00007                         opt     cc
                      (       mycode.asm):00008                 * Slow way
4000 8E4000           (       mycode.asm):00009 [3]     3               LDX     #$4000
4003 CE0000           (       mycode.asm):00010 [3]     6               LDU     #$0000
                      (       mycode.asm):00011                         opt     cd
                      (       mycode.asm):00012                         opt     cc
                      (       mycode.asm):00013                 * This loop is 15 cycles to update two bytes
                      (       mycode.asm):00014                 * We have to do this loop $2000 / 2 bytes each pass = $1000 times
                      (       mycode.asm):00015                 * 15 cycles * $1000 or 4096 = 61,440 cpu cycles
4006 EF81             (       mycode.asm):00016 [5+3]   8       !       STU     ,X++
4008 8C6000           (       mycode.asm):00017 [4]     12              CMPX    #$4000+$2000
400B 26F9             (       mycode.asm):00018 [3]     15              BNE     <
                      (       mycode.asm):00019                 
                      (       mycode.asm):00020                         opt     cd
                      (       mycode.asm):00021                         opt     cc
                      (       mycode.asm):00022                 * Faster way
400D 8E4000           (       mycode.asm):00023 [3]     3               LDX     #$4000
4010 CE0000           (       mycode.asm):00024 [3]     6               LDU     #$0000
4013 CC2000           (       mycode.asm):00025 [3]     9               LDD     #$2000
                      (       mycode.asm):00026                         opt     cd
                      (       mycode.asm):00027                         opt     cc
                      (       mycode.asm):00028                 * This loop is mostly 13 cycles sometimes 18 cycles every 256 bytes
                      (       mycode.asm):00029                 * $2000 / $100 = $20
                      (       mycode.asm):00030                 * $20 / 2 = $10  (half because we write 2 bytes per cycle)
                      (       mycode.asm):00031                 * $2000 - $20 = $1FE0
                      (       mycode.asm):00032                 * $1FE0 / 2 = $FF0  (half because we write 2 bytes per cycle)
                      (       mycode.asm):00033                 * 13 cycles * $FF0 + 18 cycles * $10 = $CF30 + $120 = $D050 = 53,328 cpu cycles
4016 EF81             (       mycode.asm):00034 [5+3]   8       !       STU     ,X++
4018 5A               (       mycode.asm):00035 [2]     10              DECB
4019 26FB             (       mycode.asm):00036 [3]     13              BNE     <
401B 4A               (       mycode.asm):00037 [2]     15              DECA
401C 26F8             (       mycode.asm):00038 [3]     18              BNE     <
                      (       mycode.asm):00098
                      (       mycode.asm):00099                 * Fastest method is to use unfolded loops
                      (       mycode.asm):00100                 * and use the U Stack pointer instead of a ST instruction
4073 CC0000           (       mycode.asm):00101 [3]     136             LDD     #$0000
4076 8E0000           (       mycode.asm):00102 [3]     139             LDX     #$0000
4079 3184             (       mycode.asm):00103 [4+0]   143             LEAY    ,X
407B CE6000           (       mycode.asm):00104 [3]     146             LDU     #$4000+$2000
                      (       mycode.asm):00105                         opt     cd
                      (       mycode.asm):00106                         opt     cc
                      (       mycode.asm):00107                 * This loop is 70 cycles to write 32 bytes
                      (       mycode.asm):00108                 * We cycle through the loop 256 times so the calculation is
                      (       mycode.asm):00109                 * 256 * 70 = 17,920 CPU Cycles
407E 3636             (       mycode.asm):00110 [5+6]   11      !       PSHU    D,X,Y
4080 3636             (       mycode.asm):00111 [5+6]   22              PSHU    D,X,Y
4082 3636             (       mycode.asm):00112 [5+6]   33              PSHU    D,X,Y
4084 3636             (       mycode.asm):00113 [5+6]   44              PSHU    D,X,Y
4086 3636             (       mycode.asm):00114 [5+6]   55              PSHU    D,X,Y
4088 3606             (       mycode.asm):00115 [5+2]   62              PSHU    D
408A 11834000         (       mycode.asm):00116 [5]     67              CMPU    #$4000
408E 22EE             (       mycode.asm):00117 [3]     70              BHI     <
                      (       mycode.asm):00118                 
                      (       mycode.asm):00119                         END     Start

A nice feature of lwasm worth pointing out is its anonymous branch targets, written with greater than (>) and less than (<).  You don’t need a label for every branch instruction.  In the listing above, “BHI    <” tells the assembler to branch if higher, backwards in the source to the nearest “!”.  You can also branch forward: “BNE    >” tells the assembler to branch if not equal to the next “!” found below in your source code.
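Going back to the cycle counts in the listing's comments, the arithmetic for the unrolled PSHU loop can be checked with a few lines of Python. This is only a sketch of the counting: the helper name `pshu_cycles` is made up for illustration, and the per-instruction costs are the ones shown in the listing's cycle column.

```python
# Cycle arithmetic for the unrolled PSHU loop, using the costs from the listing.
# pshu_cycles is a hypothetical helper name, not part of any tool.

def pshu_cycles(bytes_pushed):
    # PSHU costs 5 cycles plus 1 per byte pushed ([5+6] and [5+2] in the listing)
    return 5 + bytes_pushed

# One pass = five "PSHU D,X,Y" (6 bytes each) and one "PSHU D" (2 bytes)
pushes = [6, 6, 6, 6, 6, 2]
bytes_per_pass = sum(pushes)

# Add CMPU #$4000 (5 cycles) and BHI (3 cycles) to close the loop
cycles_per_pass = sum(pshu_cycles(n) for n in pushes) + 5 + 3

passes = 0x2000 // bytes_per_pass        # clearing $2000 bytes of screen
total_cycles = passes * cycles_per_pass

print(bytes_per_pass, cycles_per_pass, passes, total_cycles)
# prints: 32 70 256 17920
```

That is roughly a third of the 53,328 cycles quoted in the comments for the STU loop.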

There is also a special version of MAME on GitHub with some enhancements for the CoCo that might come in handy.  You can read up about it and its features here.

I hope this info helps others get the most out of using MAME to learn assembly language programming.


Zilog z80 to Motorola 6809 Transcode – Part 023 – Optimized sprite rendering, combining Compiled sprites with Stack Blasting

After talking with other CoCo users about optimizing sprite rendering, I’ve figured out the best way to render the sprites for Pac Man on the CoCo 3.  This article summarizes what I’ve learned and then explains what I will use for Pac Man.

I got some great tips from Richard Goedeken’s game engine for the CoCo 3 called “Dynosprite”, which is available on GitHub.  Going through his source code I found that it doesn’t just emit straight LDA, LDB and LDD instructions; it also checks whether one of the following instructions can be used on the A or B accumulator to produce the needed value:

CLR
COM
NEG
INC
DEC

This saves another byte over a straight LD instruction, but the speed is the same.  Every byte counts!
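To make the idea concrete, here is a minimal Python sketch of how a sprite compiler might pick the shortest way to get a value into A. It is not Dynosprite’s actual code; the function name `load_a` and its return format are made up for illustration, and it only looks at instruction size, ignoring condition codes.

```python
# Hypothetical sketch: choose the smallest instruction that puts `wanted` into A,
# given the value already in A (`current`, or None if unknown).

def load_a(current, wanted):
    """Return (instruction_text, size_in_bytes)."""
    if current is not None and wanted == current:
        return (None, 0)                      # A already holds the value
    if wanted == 0:
        return ("CLRA", 1)                    # clear works from any state
    if current is not None:
        if wanted == (~current) & 0xFF:
            return ("COMA", 1)                # one's complement
        if wanted == (-current) & 0xFF:
            return ("NEGA", 1)                # two's complement
        if wanted == (current + 1) & 0xFF:
            return ("INCA", 1)
        if wanted == (current - 1) & 0xFF:
            return ("DECA", 1)
    return ("LDA #$%02X" % wanted, 2)         # fall back to an immediate load

print(load_a(0x99, 0x66))   # A=$99, want $66: COMA does it in one byte
print(load_a(None, 0x99))   # nothing known about A: needs the 2 byte LDA
```

The same check could be repeated for the B accumulator with CLRB, COMB, NEGB, INCB and DECB.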

For the examples below I’m using a Pac Man sprite facing the right with his mouth wide open.  The current palette uses 9 as yellow and 4 as black, two pixels per byte.

Previously, the fastest method I knew for doing compiled sprites was the following:

[4+1]   5 LEAU 5,X
[3]     8 LDD #$4999
[3]    11 LDX #$9999
[4+1]  16 STA -3,U
[5+1]  22 STX -2,U
[5+1]  28 STD -4+128,U
[5+1]  34 STX -2+128,U
[4+4]  42 LEAU 256,U
[4]    46 LDY #$9994
[5+1]  52 STX -4,U
[6+1]  59 STY -2,U
[4+1]  64 STB -4+128,U
[5+1]  70 STX -3+128,U
[4+4]  78 LEAU 256,U
[5+1]  84 STD -5,U
[6+1]  91 STY -3,U
[4+1]  96 STA -5+128,U
[5+1] 102 STX -4+128,U
[4+4] 110 LEAU 256,U
[4+1] 115 STA -5,U
[6+1] 122 STY -4,U
[4+1] 127 STA -5+128,U
[5+1] 133 STX -4+128,U
[4+4] 141 LEAU 256,U
[5+1] 147 STD -5,U
[6+1] 154 STY -3,U
[4+1] 159 STB -4+128,U
[5+1] 165 STX -3+128,U
[4+4] 173 LEAU 256+128,U
[5+4] 182 STX -4-128,U
[6+4] 192 STY -2-128,U
[5+1] 198 STD -4,U
[5+1] 204 STX -2,U
[4+1] 209 STA -3+128,U
[5+1] 215 STX -2+128,U
[5]   220 RTS

This method is still faster than stack blasting: it takes 220 CPU cycles to draw on screen and 106 bytes of RAM, which is pretty good since a full 16×16 sprite would take 128 bytes as bitmap data.  Another benefit of compiled sprites is that you aren’t stuck writing a fixed block size to the screen.  This actual sprite is really only 10 pixels x 13 rows.  As bitmap data for stack blasting it would still require 65 bytes of RAM (5 bytes per row x 13 rows), and you would need extra code to handle different sprite sizes.
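The bitmap sizes quoted above come straight from the two pixels per byte layout. A quick Python check (the function name `bitmap_bytes` is just for illustration):

```python
# Size of a sprite stored as raw bitmap data at two pixels per byte.
# bitmap_bytes is a hypothetical helper name used only for this example.

def bitmap_bytes(width_px, height_rows, pixels_per_byte=2):
    # Round the row width up to a whole number of bytes
    bytes_per_row = -(-width_px // pixels_per_byte)
    return bytes_per_row * height_rows

print(bitmap_bytes(16, 16))   # full 16x16 sprite
print(bitmap_bytes(10, 13))   # the Pac Man sprite that is really 10x13
```

So the 106 byte compiled sprite sits between the 65 bytes of a tightly cropped bitmap and the 128 bytes of a full 16×16 block.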

The fastest method I have come up with is:

[4+4]   8 LEAU 5+128*12,X
[3]    11 LDD #$4999
[3]    14 LDX #$9999
[5+3]  22 PSHU A,X
[4+1]  27 LEAU -128+3,U
[5+4]  36 PSHU D,X
[4+1]  41 LEAU -128+4,U
[4]    45 LDY #$9994
[5+4]  54 PSHU D,Y
[4+1]  59 LEAU -128+3,U
[5+3]  67 PSHU B,X
[4+1]  72 LEAU -128+3,U
[5+4]  81 PSHU D,Y
[4+1]  86 LEAU -128+3,U
[5+3]  94 PSHU A,X
[4+1]  99 LEAU -128+3,U
[5+3] 107 PSHU A,Y
[4+1] 112 LEAU -128+3,U
[5+3] 120 PSHU A,X
[4+1] 125 LEAU -128+4,U
[5+4] 134 PSHU D,Y
[4+1] 139 LEAU -128+4,U
[5+3] 147 PSHU B,X
[4+1] 152 LEAU -128+4,U
[5+4] 161 PSHU X,Y
[4+1] 166 LEAU -128+4,U
[5+4] 175 PSHU D,X
[4+1] 180 LEAU -128+4,U
[5+3] 188 PSHU A,X
[5]   193 RTS

This is only 193 CPU cycles and 77 bytes of RAM.

The code above uses PSHU instead of ST instructions and draws the sprite from the bottom row to the top (this suggestion was from Curtis L. Boyle).  Since each PSHU automatically moves U down, the LEAU that follows only needs an offset that fits in a signed 8-bit value, which makes the instruction shorter and faster than the LEAU 256,U used before.  A single PSHU is also faster than multiple ST instructions.

This compiled/half stack blasted sprite technique should work especially well for CoCo 1 games too, since fewer colours means more repeating values in the sprites.

After doing the test video shown in my previous article, I’ve decided to use a different method of updating the sprites on screen.  I’m planning on using double buffering, since the quick and easy method of redrawing whatever the ghosts run over and ignoring what is behind Pac Man was becoming very difficult, and in the end it didn’t look perfect.  The main struggle was that when Pac Man goes around corners there are times when he moves more than 2 pixels in each direction, and if that isn’t accounted for properly some yellow pixels are left on the screen.  I’m hoping double buffering will make these troubles go away.  It might be a little slower, but I think that with the improved compiled sprite code using PSHU, speed won’t be an issue.

So here I go again, re-writing the graphics engine…  The graphics rendering is taking a lot longer than I thought, even more work than the actual z80 to 6809 transcode tool!  But it’s been a great learning experience.
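To see where the savings come from, here is a rough Python comparison of the first two sprite rows in both listings. The cycle costs are the ones printed in the listings above (8-bit offset stores, PSHU at 5 cycles plus 1 per byte); the table and function names are made up for illustration.

```python
# Rough cost model for two sprite rows, using cycle counts from the listings.
# ST_CYCLES, REG_BYTES, st_row and pshu_row are hypothetical names.

ST_CYCLES = {"A": 5, "B": 5, "D": 6, "X": 6, "Y": 7}   # STx n,U with 8-bit offset
REG_BYTES = {"A": 1, "B": 1, "D": 2, "X": 2, "Y": 2}
LEAU_SHORT = 5    # LEAU -128+3,U (8-bit offset)
LEAU_LONG = 8     # LEAU 256,U    (16-bit offset)

def st_row(regs):
    # One ST instruction per register written
    return sum(ST_CYCLES[r] for r in regs)

def pshu_row(regs):
    # One PSHU writes every listed register: 5 cycles + 1 per byte
    return 5 + sum(REG_BYTES[r] for r in regs)

rows = [("A", "X"), ("D", "X")]        # first two rows of the sprite

# Old method: four stores, then one LEAU 256,U for the pair of rows
st_total = sum(st_row(r) for r in rows) + LEAU_LONG

# New method: one PSHU and one short LEAU per row
pshu_total = sum(pshu_row(r) + LEAU_SHORT for r in rows)

print(st_total, pshu_total)   # prints: 31 27
```

The gap per row is small, but it adds up across the 13 rows of the sprite, and the PSHU version is also much shorter in bytes.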

See you in the next post.


Zilog z80 to Motorola 6809 Transcode – Part 022 – Quick and dirty Speed Test – Five Sprites and 2 audio samples at the same time

I decided to put the Pac Man sprite code and audio code together and see how fast the Pac Man transcode could run while drawing five compiled sprites and playing back two separate audio samples at the same time.  There’s still a lot of work to do at this point.  The graphics are still leaving junk on the screen and the cut scenes still need to be done.  There are still some audio problems…

But I’m very happy with the speed at which the game is currently playing.  I’m quite sure I can speed the sprites up a little more; whether that will speed up the game or not I don’t know, since it’s all tied to the IRQ firing 60 times a second.

I thought I’d share a short 2 minute video showing Pac Man running on the CoCo 3, so anyone who has been following this blog can see where I’m at.  The quality of my CoCo 3 monitor is pretty bad, but you can at least see the speed the game is playing at this point in time.  I thought it would be best to show it on a real CoCo 3 monitor rather than from MAME.

The video can be found on youtube here: https://youtu.be/zbmk08UkA3E

See you in the next post…

Glen


Zilog z80 to Motorola 6809 Transcode – Part 021 – Compiled Sprites are faster than Stack blasting!

Hello, after writing previous posts on how great stack blasting is, I’ve recently found out about an even faster sprite rendering method called Compiled Sprites.  A few weeks ago I was watching a CoCo YouTube video that briefly mentioned using Compiled Sprites for game rendering.  So I googled it, and there wasn’t much information on the technique.  I guess this is because most computers had built-in sprite hardware, and only a few, like the CoCo, had hi-res graphics but no sprite hardware.  So what are Compiled Sprites?  It is a method of writing the data that needs to be put on the screen as assembly code that stores the picture data directly to video RAM.  Can you guess what this example is?

        LEAX    7,X
        LDD     #$9999
        STD     -5,X
        STD     122,X
        LDB     #$90
        STD     124,X
        STB     -3,X
        LEAX    256,X
        LDB     #$99
        STD     -6,X
        STD     122,X
        STA     -4,X
        LDA     #$90
        STA     124,X
        LDA     #$09
        STA     -7,X
        STA     121,X
        LEAX    256,X
        LDA     #$99
        STD     -7,X
        STD     121,X
        STA     -5,X
        LDA     #$90
        STA     123,X
        LEAX    256,X
        LDA     #$99
        STD     -7,X
        STD     121,X
        LDA     #$90
        STA     123,X
        LEAX    256,X
        LDA     #$99
        STD     -7,X
        STD     122,X
        STA     -5,X
        LDA     #$90
        STA     124,X
        LDA     #$09
        STA     121,X
        LEAX    256,X
        LDA     #$99
        STD     -6,X
        STD     122,X
        LDB     #$90
        STD     124,X
        STA     -4,X
        LDA     #$09
        STA     -7,X
        LEAX    256,X
        LDD     #$9999
        STD     -5,X
        LDA     #$90
        STA     -3,X
        RTS

The above code is actually a sprite of Pac Man just like the picture below.

[Screenshot: the Pac Man sprite]

It turns out that rendering data to the screen as a bunch of LDD, LDA, LDB and STD, STA, STB instructions is faster than stack blasting!  It takes more RAM to store the sprites as code, but not a lot more.  In my tests, stack blasting a 16×14 sprite takes 529 cycles.  Compiled Sprites vary in size and speed, since both depend on how much detail is in the 16×14 pixel sprite.  I’ve had some as low as 263 cycles, and the larger ones are still around 450 cycles.  These numbers are incredible considering how fast stack blasting already is.  There are also some extra benefits to Compiled Sprites besides speed.  You don’t have to worry about the stack pointers like you do with stack blasting.  You also get transparency, since you only write the bits that you need to the screen.
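For anyone wondering how a sprite gets turned into code like this, here is a very simplified Python sketch of a sprite compiler. It is not the tool I use: the function name `compile_sprite`, the use of `None` for transparent bytes and the 128 byte row stride are all assumptions for the example, and it only emits 8-bit LDA/STA pairs, where a real compiled sprite would also use D, X and Y stores and LEAX to keep the offsets short.

```python
# Simplified sketch of a sprite compiler: turns bitmap bytes into 6809 source.
# compile_sprite is a hypothetical name; None marks a transparent byte.

def compile_sprite(rows, stride=128):
    """rows: list of rows, each a list of byte values or None (not drawn)."""
    lines = []
    a = None                                  # value currently known to be in A
    for y, row in enumerate(rows):
        for x, value in enumerate(row):
            if value is None:
                continue                      # transparent: emit nothing at all
            if value != a:
                lines.append("        LDA     #$%02X" % value)
                a = value                     # reuse A while the value repeats
            lines.append("        STA     %d,X" % (y * stride + x))
    lines.append("        RTS")
    return lines

sprite = [
    [None, 0x99, 0x99],
    [0x09, 0x99, None],
]
print("\n".join(compile_sprite(sprite)))
```

Transparent bytes cost nothing because no instruction is generated for them, and repeated values only need one load, which is why sprites with less detail compile to fewer cycles.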

I’m now in the process of converting my sprites to Compiled Sprites and then I’ll have to implement the new sprite handling into my Pac Man transcode.  This is going to be a lot of work, but in the end it will be that much better.

See you in the next post.
