Zilog z80 to Motorola 6809 Transcode – Part 014 – MAME -debug and how fast is the code

Hi all, I thought I’d talk a little bit about one of the most powerful tools to use while transcoding, and that tool is MAME, specifically MAME with the -debug option.  With the debugger turned on you can do all kinds of great things with the emulated system.  It is a great way to compare how the original system runs against how your transcoded version runs, right down to the same location in the code.  At the same time you can see the register values: the program counter, the stack pointers and the accumulators.  You can also pull up memory view windows to see blocks of memory in real time, or stop the process and step through the code one instruction at a time.

Another great feature is that MAME will show you the beamx and beamy values and the frame #, all of which have to do with the video rendering.  beamx is the horizontal position, beamy is the vertical line that is being drawn, and frame is the count of frames that have been displayed since the emulation started.

On top of all this you can set up break points for where in memory you would like the program to stop.  You can also set up watch points, which tell MAME to watch certain areas of RAM and, when they are read or written (or both), to break and let you know where in the code execution was when the read or write occurred.

I finally got my Pac Man code running to the point where the attract screen animation is happening, and I thought it looked pretty good.  Then I ran it beside the real Pac Man and noticed that it looked the same but my code was running way slower than the original.  So I decided to investigate where the slowness was coming from, and this is where MAME -debug comes in handy big time!

You start MAME in debug mode by passing the -debug option on the command line, for example:

mame -window -natural -nojoy -debug coco3 -flop1 Disk1.dsk

Once MAME starts I’ll set a break point from its debugger’s command line with the command:

bp XXXX

Where XXXX is the hex address in RAM where I want the execution to stop.  I look through the output listing from lwasm, which gives me all the code and its addresses, so I know exactly where I need to stop to examine how long some code is taking to run.  For example, I know my sprite routines currently start at 260A in memory and end at 27B5.

So I would set two break points in MAME’s command line (use the help command to see all kinds of help on the other instructions the MAME debugger has):

bp 260A
bp 27B5

Then press F5 to run the code, and when it gets to address 260A MAME will stop and give you the command prompt again.  At this point you can record the beamy and frame #.  Press F5 again and it will run through the sprite code and stop at address 27B5.  Now you can record the new beamy and frame #.

What can you do with the beamy and frame #?  Well, if you know how fast the CPU is, you can use this information to see how many cycles have passed between the first break point and the second break point.

It works like this for the CoCo 3 in high speed mode.  The 6809 is running at 1.78 MHz, or 1,780,000 cycles per second.  So we take that number and divide it by 60, which is the number of screen updates per second, or frames per second.  This gives us 29,666 cycles per frame.  MAME displays the CoCo 3 with 240 lines of resolution, so we can divide 29,666 by 240 to get the number of cycles the 6809 can do per line.  This turns out to be about 124 cycles per line.
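As a quick check of that arithmetic, here is a minimal Python sketch.  The clock speed, frame rate and line count are the figures used in this post; the constant names are just ones I picked for illustration.

```python
# Figures from the post; the constant names are illustrative only.
CPU_HZ = 1_780_000        # 6809 in CoCo 3 high speed mode, cycles per second
FRAMES_PER_SECOND = 60    # screen updates per second
LINES_PER_FRAME = 240     # lines per frame as counted in the post

# Whole cycles available in one frame.
cycles_per_frame = CPU_HZ // FRAMES_PER_SECOND

# Cycles available while one line is drawn (not a whole number).
cycles_per_line = cycles_per_frame / LINES_PER_FRAME

print(cycles_per_frame)          # 29666
print(round(cycles_per_line))    # 124
```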

So now we take the end beamy value, divide it by 240 and multiply it by 29,666, then add that to the end frame # multiplied by 29,666.  That gives a running cycle count for the end break point.  We do the same with the start beamy and frame # and subtract the result from the end value, and blamo, we have the number of cycles that were used between the break points.  This will be very useful for testing the code to see which parts are taking the most CPU cycles and require the most optimization.  I’ve done a lot of this testing today with my current code and the numbers don’t look too good so far.  I’m running at nearly 1/4 the speed of the original Pac Man 😦
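The break point calculation can be sketched in Python as follows.  This is only a rough sketch using the 29,666 cycles per frame and 240 lines per frame from this post: the function names and the sample readings are made up for illustration, and the result can only be as precise as one scan line (about 124 cycles), since beamy is all that gets recorded.

```python
CYCLES_PER_FRAME = 29_666   # from the post: 1,780,000 / 60
LINES_PER_FRAME = 240       # from the post

def cycle_stamp(frame, beamy):
    """Approximate cycles since emulation start for one (frame, beamy) reading."""
    return frame * CYCLES_PER_FRAME + beamy * CYCLES_PER_FRAME / LINES_PER_FRAME

def elapsed_cycles(start, end):
    """Approximate cycles between two (frame, beamy) readings."""
    return round(cycle_stamp(*end) - cycle_stamp(*start))

# Hypothetical readings noted at the two break points (not real measurements).
start = (1200, 30)    # frame #, beamy at the first break point
end = (1200, 150)     # frame #, beamy at the second break point

print(elapsed_cycles(start, end))   # 14833, i.e. half a frame
```

Because the frame # is folded into the running count, the same function also works when the second break point is hit in a later frame than the first.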

I’ve reached out to the CoCo community on the CoCoList for Color Computer Enthusiasts, and they have been super helpful and have already offered to help me optimize the 6809 code.  They are a great bunch of guys and I’m sure I’ll need their help in the future.  They have encouraged me to keep working on the code, and they assured me that with lots more work I should be able to get the 6809 version of Pac Man to run as fast as the real Pac Man!

I hope they’re right,  see you in the next post…


2 Responses to Zilog z80 to Motorola 6809 Transcode – Part 014 – MAME -debug and how fast is the code

  1. When I was writing Knight Lore I needed a profiler that listed all my routines and let me know how many cycles it was spending in each. For this I found VCC had the lowest barrier to entry. I was hoping to develop a (quick’n’dirty) generic code profiler but soon realised that wasn’t feasible with jump tables, code that jumped to an absolute address and re-initialised the stack, and other tricks like discarding return addresses to return further up the call stack.

    However, all was not lost, and after developing a generic (albeit simple) profiler core I simply added some special cases based on known addresses and I was able to bash it into shape. The end result was some rather nice and useful output, complete with subroutine names extracted from my ASM listing. For an example output, see my Retroports blog, September 2017.

    I would highly recommend you consider this approach; it’s a lot less painful than setting breakpoints in MAME and breaking out the old HP calculator! Also less error prone. And one word of warning; I’m not sure the beam values are kosher… for instance I was getting a different BEAMY value at the beginning of every VBLANK IRQ (it was +1 from the last each time).

    If you’re interested I can send you my VCC source with in-built profiler. The hooks into VCC itself are – by design – minimal. I was going to add this to the official project, but since it’s impossible to write a generic profiler for reasons outlined above, or at least without more complexity to be able to take into account ASM tricks, there was little point.

  2. nowhereman999 says:

    Hi Mark, thanks for the offer of the modified VCC source. Maybe down the road I’ll reach out to you for it when I get serious about optimizing the code. I think I’m going to finish Pac Man so that it runs perfectly but slow and release the source code like that. Then work on optimizing it and when I’m done I’ll release that code too. I think it would be a helpful learning tool to see the before and after. At least that’s my plan…
