CoCo (6809) Assembly on a modern computer

This article is a guide for anyone who is thinking about learning 6809 assembly language programming or wants to use newer tools for doing 6809 assembly for the Tandy Color Computer.  It’s not an assembly tutorial, it’s an explanation of how to use some of the modern tools to help write and debug assembly language programs for the CoCo.

I’ve gotten back into assembly programming on the CoCo about nine months ago and I found the modern tools that are available today make it easier and faster to learn assembly language programming.  It used to be a long slow process of assembling your program with EDTASM+ and saving it running it a debugging it back in the 1980’s.  Using MAME and lwasm you can assemble your program in a second and view the assembled code withe the extra information about how many cycles each process is taking.  Which is vital when you want to optimize your code for the most speed or the smallest size.  LWTOOLs which includes lwasm is an amazing 6809/6309 assembler that is completely free.  It’s written by William Astle, who is on the CoCo mailing list LWTOOLS can be compiled for MAC/Linux and Windows.

My favourite emulator is MAME it’s been around for a long time emulating arcade machines and the CPU emulation has been tested in many different scenarios.  There was a branch of MAME called MESS that took the same code and used it only for computer emulation.  But for a few years now MESS is joined together with the main MAME code and now MAME includes both the Arcade emulation and the computer Emulation.  MAME is cross platform and is still being heavily developed.  MAME also has a special debug mode that let’s you step through your program and see how it is running step by step, which is a fantastic testing and learning tool. MAME can output the code it is executing to a text file when using a the trace command. MAME also has something special called watch points which allows you to setup locations in memory that will halt your program if the locations are written to or read from and even setup if they are changed to a specific value! Super useful for debugging… Anyways enough of a sales pitch, I figure you are reading this because you want to setup your assembly environment.

First you need to install LWTOOLS and MAME on your computer. Also this is already done for you if you want to use a Raspberry Pi 3 with Ron Klein’s excellent SD image. You just need to add the CoCo roms and you’re ready to go…

You can compile both MAME and LWTOOLS yourself or download ready use versions. I use a Mac myself and the quick and easiest way to get MAME and lwtools and tons of other utilities installed are by using Homebrew

Once brew is installed on your system it’s as simple as typing in these two commands:

$ brew install mame

$ brew install lwtools

You can probably find similar easy installs of both programs on linux using apt-get or similar. I’m sure there are tons of ways to get these programs on windows machines. Also for windows you might want to use cygwin which gives you a unix like environment

Once they are both installed you should create a directory where you will keep all your 6809 assembly source files. Let’s call it CoCoAssembly this same folder will be where you will assemble your program and run mame from. In this directory you will need a subfolder called roms with the coco roms of the different cocos you want to emulate/test your code on.

Here is a list of all the CoCo roms that can/should go into the roms folder (don’t ask me where to get them):

bas12.rom
bas13.rom
coco2b.zip
coco2.zip
coco3dw1.zip
coco3_hdb1.zip
coco3h.zip
coco3p.zip
coco3.zip
cocoe.zip
coco_fdc_v11.zip
coco_fdc.zip
coco.zip
disk10.rom
disk11.rom
extbas11.rom
hdbdw3bc3.rom
hdbdw3bck.rom
hdbdw3becker.rom
hdbdw3cc2.rom
HDBSDC.ROM
mc10.zip
RGBDOS2HD.ROM
yados.rom

In your CoCoAssembly folder you should see a sub folder called roms where you have the above roms copied. From this CoCoAssembly folder type the following to test if your MAME is installed properly.

mame coco3 -window

Or to set the uimodekey to F12 use this to start MAME:

mame coco3 -window -uimodekey F12

To exit the MAME emulator hit the keyboard emulation mode key on Mac it’s the delete key (left of the end key). Some laptops don’t have the other delete key so use the F12 command line shown above.  On Linux and windows the key is ScrLk. You want to set this mode to partial then press the Esc key to exit.

The complete installation of MAME includes some nice tools. The most useful for us is called imgtool which is used to create and manipulate our CoCo disk image files or .dsk files. If you don’t like imgtool you can use another image handling tool called toolshed I will be using imgtool below.

Create a new blank .DSK disk image that we can copy our program to with the following command:

imgtool create coco_jvc_rsdos Disk1.dsk

Now that we have a blank disk let’s assemble a program and copy it on this image file, in your favourite editor type in the following short 6809 assembly program:

        ORG $4000
Start:
        PSHS   A,B
        LDA    #'H
        LDB    #'I
        STD    $500
        PULS   A,B,PC
        END    Start

Save the program as mycode.asm

This is the command I use to from LWTOOLS to assemble my 6809 source code:

lwasm -9bl -p cd -oNEW.BIN mycode.asm

You should see:

            ( mycode.asm):00001 ORG $4000
4000        ( mycode.asm):00002 Start:
4000 3406   ( mycode.asm):00003 [5+2] PSHS A,B
4002 8648   ( mycode.asm):00004 [2] LDA #'H
4004 C649   ( mycode.asm):00005 [2] LDB #'I
4006 FD0500 ( mycode.asm):00006 [6] STD $500
4009 3586   ( mycode.asm):00007 [5+4] PULS A,B,PC
            ( mycode.asm):00008 END Start

In the lwasm output there are numbers in the square brackets, these numbers are the CPU cycles used for each line of code.  This can be helpful if you want to figure out how to optimize your assembly code for the max speed or best size as there are many tricks to speeding up code at the cost of size and vice versa.

If you want to capture the assembly output to a file called listing.txt use this command. It’s useful to keep the output file to refer back to when debugging your code since it will have the addresses of the code in memory of the instructions…

lwasm -9bl -p cd -oNEW.BIN mycode.asm > listing.txt

The above command options tells lwasm to generate our output code as an RSDOS “LOADM” compatible 6809 program.

Once you have your program assembled OK as NEW.BIN you have to transfer it to the .DSK image so it can be run with the emulator we use imgtool for this.

imgtool put coco_jvc_rsdos Disk1.dsk NEW.BIN TEST.BIN

The above command tells imgtool to put or copy the file NEW.BIN into the disk image file called Disk1.dsk use the CoCo RSDOS format of coco_jvc_rsdos and save the file on the disk with the name TEST.BIN

Another useful feature of imgtool is to delete files from an image to delete the file TEST.BIN on the .dsk file use the following:

imgtool del coco_jvc_rsdos Disk1.dsk TEST.BIN

Imgtool also has many more features, type the imgtool without any options to see all the features.

Let’s test and debug our program using MAME:

mame coco3 -window -debug -flop1 Disk1.dsk

This starts mame up in it’s debugger mode and you will see the following window, or similar with windows and linux.

It’s important to note that the mame debugger always uses hex values for its input and output as a default.

You can see the pink highlighted line on address 8C1B, this is the line that is about to be executed. This is where the CoCo 3 first starts when you power on your computer. On the top left is the cycles (counts the CPU cycles), beamx which is where the beam of the picture tube is currently being drawn in the x direction. Beamy is which row is being drawn on the screen at this moment it time. Flags shows the flags that are currently set in the CC (condition code) register of the 6809 CPU.

  • PC is the program counter and shows us the address where your instructions will be executed next.
  • S is the current stack pointer location
  • CC again is the condition code register but this time shown as a hex number.
  • DP is the Direct Page value
  • A is the A accumulators value
  • B is the B accumulators value
  • D is the A & B accumulators value together as a 16 bit value
  • X is the X registers value
  • Y is the Y registers value
  • U is the U registers value

At the bottom of this window is a command line area where you can type commands for the debugger to execute such as watchpoints or using the trace function another cool feature of the debugger. You can get a lot of help from the debugger itself by typing the word help. Let’s setup a breakpoint at $4000 which is the address where our little test program is going to be loaded and executed. In the line type the following:

wp 4000,1,w

This command sets a watchpoint at address $4000 that is 1 byte long and will stop the the execution of processor when there is a write operation at this address. We could have made the watchpoint look at many bytes and check for write and read with the wr option or just read with the r option.

Now press F5 to make the debugger continue with execution. You’re thinking why did it stop? Disk Basic hasn’t even started yet? The reason it stopped is because Disk Basic is setting up the memory and it did a write instruction at address $4000. In the debug window it shows a message Stopped at watchpoint 1 writing byte to 00004000 (PC=C033) (data=32)

This is telling us that code at address C033 (Disk ROM address) wrote the byte 32 to address $4000 and since our watchpoint is set to stop code at this point it stopped so that we can now look at the code. We don’t really want to get into all the things RSDOS is doing as it boots up so let’s hit F5 again.

Now you should see the familiar RSDOS OK prompt. So let’s make sure our disk image is being used by mame. Type the DIR command in RSDOS and you should see the TEST.BIN as per the picture below. If you got this far things are looking good.

Let’s load our program type LOADM”TEST” and hit Enter

Our debugger stopped the code again as RSDOS loaded your program into memory at address $4000. That’s good, hit F5 once again to let it finish the loadm command.

The next thing we want to do is setup a break point which will stop program execution when the program counter gets to a certain address. In our case our program is going to be executed at address $4000 so in the debug command line type the following command:

bp 4000

Next press F5 and in the RSDOS window type:

EXEC

This is where the fun begins, the breakpoint stops execution and you can now step through the code line by line watching the registers each step of the way. You can also pull up a memory window with command d or probably control d from linux/windows. Or goto the Debug menu option at the top of the screen and select Memory window. In the Memory Window type 400, which is $400. This will show us a hex view of the text window for the CoCo.

Our program is going to write the word “HI” in the middle of the screen and you can see this in the Memory window as we step through the code. Click on the debug window and hit enter to step forward one line, as you do the S stack pointer will decrease by two bytes as it stores the A and B values in the stack memory space. Hit Enter again and the LDA #$48 instruction is executed and the A accumulator will change to show the value 48. Hit enter again and the LDB #$49 will load the B accumulator with the value 49. You can now see D’s value is now 4849. Hit enter again and you can see the value at address 0500 in the memory window has changed to 48 49. The RSDOS screen hasn’t changed yet since time is frozen when we are debugging and the beam that refreshes our screen hasn’t moved much at all in the time it takes for the 6809 to execute the few instructions we have in our program. We can now press F5 again and the PULS A,B,PC command will restore our accumulaotrs back to what they were before execution and return our program execution back to RSDOS and you should then see the screen refresh and show the HI in the left side of the middle of the screen. I hope you get the idea how the debugger works.

Another powerful feature of the watchpoint command is you can get it to stop execution only if the value of a certain RAM location changes to a specific value. For example (shrunk to fit on one line):

wp ffa0,10,w,{wpdata==0x0b},{printf “write to MMU %04X, value %02X @ %02X\n”,wpaddr,wpdata,pc; g}

This command tells the debugger to watch $FFa0 to FFB0 for a write operation. If one occurs check if the value is a $0b and if so write to the debug window the message.

An example of the output might be where 200D is the program counter address when it made FFA2 the value 0B

write to MMU FFA2, value of 0B @ 200D

One last cool feature I want to show is the trace feature. The trace command follows the execution of the program and saves the disassembled instructions as a text file to be analyzed. You can set it up so it also saves the register data at those points in your file too. Here is how I use it, from the debug window we still have our watchpoint activated. But I’ll show you how to deactivate the watchpoints and breakpoints first. From the Debug window click the bottom down arrow beside the command line bar and click on Break this stops execution and allows you to use the debug features again just like when a breakpoint or watchpoint has been triggered. From the Debug menu select New (Break|Watch)points Window

The window defaults to show the breakpoints but if you click on the top bar you can select ALL Breakpoints or ALL Watchpoints as below:

Select each view and click on the lines and you will see the X on the left turn into a red 0 to indicate it is disabled.

In this example I’m going to set a breakpoint at $4000 again manually and another breakpoint at the end of our program. This is so the trace output will be short, as these files can get huge if you let them run for a few seconds. Depending on the speed of your computer.

From the debug window command line type the following two lines to setup the two new breakpoints

bp 4000

bp 4009

Hit F5 and go to the RSDOS window and type EXEC again, after the BP stops and the debug shows line 4000 we will turn on the trace function by using the following command all on one line (shrunk to fit on one line):

trace output.tr.txt,0,,{tracelog “A=%02X,B=%02X,X=%02X,Y=%02X,U=%02X,S=%02X,CC=%02X “,a,b,x,y,u,s,cc}

Then hit F5 to continue the program execution, which will stop at $4009 where our last breakpoint was set. Turn off the trace function with the command in the debug window

trace off

Hit F5 again to get the RSDOS prompt again. When you want to close MAME once again hit the Emulation key mode key and the Esc key.

Once you are out of MAME you can view the trace file in a text editor and you should see the following:

A=00,B=44,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=84 4002: LDA #$48
A=48,B=44,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4004: LDB #$49
A=48,B=49,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4006: STD $0500
A=48,B=49,X=ABAB,Y=AAF1,U=2E0,S=7F32,CC=80 4009: PULS A,B,PC

This shows the values of the accumulators and registers on each line of code.

Another feature of the debugger you can also get it to run until an IRQ is triggered by hitting F7 as shown here:

Another helpful thing you can do while using the MAME debug mode is you can change the contents of any accumulator/register

For example when you stop execution you can type in the debug window’s command line:

pc=1000

would change program counter (pc) will be changed to address $1000 and the program would continue from address $1000 if you hit F5 or step through the code.

a=94

Changes the A accumulators value to $94.  You get the idea…

If you want to see the cycle counts in your code listing you can add these lines to your assembly source code:

        opt     c
        opt     ct
        opt     cd
        opt     cc

The code listing will output the cycle counts from the place you inserted the above special options.  Anytime you want to reset the counts you can just insert the following:

        opt     cd
        opt     cc

Here is a little output code so you can see how to use it in your source code and the actual cycle counts in the listing, just to the left of the 6809 instructions.  This is some example code showing different ways to clear data in memory (or the screen).  It’s from another article I’m working on about assembly optimization.

                      (       mycode.asm):00001                         opt     c
                      (       mycode.asm):00002                         opt     ct
                      (       mycode.asm):00003                 
                      (       mycode.asm):00004                         ORG     $4000
4000                  (       mycode.asm):00005                 Start:
                      (       mycode.asm):00006                         opt     cd
                      (       mycode.asm):00007                         opt     cc
                      (       mycode.asm):00008                 * Slow way
4000 8E4000           (       mycode.asm):00009 [3]     3               LDX     #$4000
4003 CE0000           (       mycode.asm):00010 [3]     6               LDU     #$0000
                      (       mycode.asm):00011                         opt     cd
                      (       mycode.asm):00012                         opt     cc
                      (       mycode.asm):00013                 * This loop is 15 cycles to update two bytes
                      (       mycode.asm):00014                 * We have to do this loop $2000 / 2 bytes each pass = $1000 times
                      (       mycode.asm):00015                 * 15 cycles * $1000 or 4096 = 61,440 cpu cycles
4006 EF81             (       mycode.asm):00016 [5+3]   8       !       STU     ,X++
4008 8C6000           (       mycode.asm):00017 [4]     12              CMPX    #$4000+$2000
400B 26F9             (       mycode.asm):00018 [3]     15              BNE     <
                      (       mycode.asm):00019                 
                      (       mycode.asm):00020                         opt     cd
                      (       mycode.asm):00021                         opt     cc
                      (       mycode.asm):00022                 * Faster way
400D 8E4000           (       mycode.asm):00023 [3]     3               LDX     #$4000
4010 CE0000           (       mycode.asm):00024 [3]     6               LDU     #$0000
4013 CC2000           (       mycode.asm):00025 [3]     9               LDD     #$2000
                      (       mycode.asm):00026                         opt     cd
                      (       mycode.asm):00027                         opt     cc
                      (       mycode.asm):00028                 * This loop is mostly 13 cycles sometimes 18 cycles every 256 bytes
                      (       mycode.asm):00029                 * $2000 / $100 = $20
                      (       mycode.asm):00030                 * $20 / 2 = $10  (half because we write 2 bytes per cycle)
                      (       mycode.asm):00031                 * $2000 - $20 = $1FE0
                      (       mycode.asm):00032                 * $1FE0 / 2 = $FF0  (half because we write 2 bytes per cycle)
                      (       mycode.asm):00033                 * 13 cycles * $FF0 + 18 cycles * $10 = $CF30 + $120 = $D050 = 53,328 cpu cycles
4016 EF81             (       mycode.asm):00034 [5+3]   8       !       STU     ,X++
4018 5A               (       mycode.asm):00035 [2]     10              DECB
4019 26FB             (       mycode.asm):00036 [3]     13              BNE     <
401B 4A               (       mycode.asm):00037 [2]     15              DECA
401C 26F8             (       mycode.asm):00038 [3]     18              BNE     <
                      (       mycode.asm):00098
                      (       mycode.asm):00099                 * Fastest method is to use unfolded loops
                      (       mycode.asm):00100                 * and use the U Stack pointer instead of a ST instruction
4073 CC0000           (       mycode.asm):00101 [3]     136             LDD     #$0000
4076 8E0000           (       mycode.asm):00102 [3]     139             LDX     #$0000
4079 3184             (       mycode.asm):00103 [4+0]   143             LEAY    ,X
407B CE6000           (       mycode.asm):00104 [3]     146             LDU     #$4000+$2000
                      (       mycode.asm):00105                         opt     cd
                      (       mycode.asm):00106                         opt     cc
                      (       mycode.asm):00107                 * This loop is 70 cycles to write 32 bytes
                      (       mycode.asm):00108                 * We cycle through the loop 256 times so the calculation is
                      (       mycode.asm):00109                 * 256 * 70 = 17,920 CPU Cycles
407E 3636             (       mycode.asm):00110 [5+6]   11      !       PSHU    D,X,Y
4080 3636             (       mycode.asm):00111 [5+6]   22              PSHU    D,X,Y
4082 3636             (       mycode.asm):00112 [5+6]   33              PSHU    D,X,Y
4084 3636             (       mycode.asm):00113 [5+6]   44              PSHU    D,X,Y
4086 3636             (       mycode.asm):00114 [5+6]   55              PSHU    D,X,Y
4088 3606             (       mycode.asm):00115 [5+2]   62              PSHU    D
408A 11834000         (       mycode.asm):00116 [5]     67              CMPU    #$4000
408E 22EE             (       mycode.asm):00117 [3]     70              BHI     <
                      (       mycode.asm):00118                 
                      (       mycode.asm):00119                         END     Start

I should also point out a nice feature of lwasm is the use of greater than > and less than < pointers.  You don’t need a label for every branch instruction.  In the listing above you can see the use of  “BHI    <” that tells the assembler to branch if higher back in the source until the first “!” is found.  You can also branch forward with a command like “BNE    >” which will tell the assembler to branch if not equal to the next “!” found below in your source code.

I should also point out there is a special version of MAME on GitHub that has some special enhancements for the CoCo that might come in handy.  You can read up about it and it’s features here.

I hope this info helps others to get the most out using MAME to learn assembly language programming.

Advertisements
This entry was posted in CoCo Programming, Emulation. Bookmark the permalink.

One Response to CoCo (6809) Assembly on a modern computer

  1. Pingback: TRS-80 CoCo (6809) Assembly on a modern computer – Vintage is the New Old

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s