May 9, 2024 | #125 | Programming Theory

Hybrid BASIC/ASM Programs

Here is a blog post about some general programming theory and practice on the C64, that I personally find useful and hopefully you'll find it useful too.

I recently updated a tool that I use to help me with C64 OS development from being written in BASIC to being written as a hybrid of BASIC and 6502 Assembly. I'm sure there are many books and many magazine articles that have been published on this subject, but there can never be too much information on programming tips and ideas for the Commodore 64.

I should set the stage by explaining my development environment. I decided from the beginning that I wanted to develop C64 OS using a C64 (or C128), and not by cross development on a Mac or PC. I know that isn't everyone's cup of tea, but since I would be spending years working on the project, I wanted to spend those years becoming intimately comfortable again with the C64's keyboard, and the storage devices, and all the various commands in BASIC and in the DOSes of the drives. I think that time spent has paid off, as I now have a pretty comprehensive understanding about how things work.

A tricky thing that I found when doing native development is a lack of a standardized set of tools. How do you convert numbers from int to hex and back? How do you run a checksum on a file? How do you split an existing file into two pieces at an arbitrary place? How do you add a header to a file? How do you convert an ASCII file to a PETSCII file? And on and on and on. There are tons of small tasks that have to be performed in order to do the dirty work of creating software.

I have been collecting many small and useful tools, and these are then stored for standard access in a directory called "c64tools" found in the root of the C64 OS system directory. What tools I couldn't find, I wrote for myself. BASIC is usually the easiest way to sit down and plunk out the first revision of some tool.

BASIC is super convenient, because you just print out a couple of lines with the name of the program, and then use an input to get the name of a file from the user. Once you have the name of a file, you can open that file with one line and immediately start reading data in from that file, and it's all very easy. There is just one problem. As soon as you try to do something more than the very trivial, it is also very slow.

Let's talk about a real world example of a program that converts ASCII to PETSCII. We'll call it a tool, because that's the word I prefer to use for simple programs that are designed to be run from the READY prompt. I avoid using the word utility, because C64 OS has "Utilities" with a capital "U", and those are a very specific kind of program. So, tools.

Here's the basic idea: take a filename for input and a filename for output, then open both files, the first for read, the second for write as a SEQ-type file. Next, we'll read a byte from the input file, and check to see if its numeric value falls within a block that should be mapped to a different block.

In order to do this, we should refer to a PETSCII table such as the one that I've provided on c64os.com here: https://www.c64os.com/post/c64petsciicodes And compare it to an ASCII table such as can be found here: https://www.asciitable.com

ASCII is technically a 7-bit code, so it only has 4 blocks of 32 characters each. PETSCII is an 8-bit code, so it has 8 blocks of 32 characters each, although two of those 8 blocks are undefined. The PETSCII chart linked above numbers the blocks from 1 to 8. We can see that in both PETSCII and ASCII control codes are found in block 1, therefore these require no conversion. In both, numbers and symbols are found in block 2, so they don't need conversion either. However, in ASCII, uppercase characters are in block 3, but these are in block 7 of PETSCII. And lastly, in ASCII, lowercase characters are in block 4, but these are in block 3 of PETSCII.

Therefore, if any character's byte value falls from 64 to 95 (block 3) we will add 128 to it, to move it to 192 to 223 (block 7). And, if any character's byte value falls from 96 to 127 (block 4), we will subtract 32 to move it to 64 to 95 (block 3). Then we'll write the newly mapped byte to the output file. After reading each byte we'll read the status byte into a variable. And after writing each byte, if that read-status variable is zero then there are more bytes to fetch, and so we'll repeat. When the read-status finally comes back as something not-zero, then after writing out the final byte we'll close both files and the task is complete.

Now let's write that in BASIC.

10 print"ascii to petscii converter"
20 print"copyright to whoever wrote it"
30 print
40 input"source file";sf$
50 input"destination file";df$
60 dv=peek(186)
70 open2,dv,2,sf$+",s,r"
80 open3,dv,3,df$+",s,w"
85 rem --------------------------------
90 get#2,a$:s=status
100 a=asc(a$+chr$(0))
110 if a<64 then140
120 if a<96 then a=a+128:goto140
130 a=a-32
140 print#3,chr$(a);
150 if s=0 then 90
155 rem -------------------------------
160 close2:close3

It's so short and simple, right? And you can just type it up straight from the READY prompt, save it, and boom you've got yourself an ASCII to PETSCII conversion tool. Let's go through it line by line so we're clear about what it's doing.

The first three lines print out the name of the program, a copyright if you want it, and a blank line.

Lines 40 and 50 ask for two filenames, which it saves as sf$ and df$.

Line 60 is what every C64 programmer should do, but which many don't. The value at 186 is the device number of the last accessed device. Simply by reading this into dv and now using dv instead of hardcoding the number 8 we have just added support for multiple storage devices or storage devices on some dev number other than 8. Hooray!

Lines 70 and 80 open two files. Logical file 2, uses data channel 2, on "dv" device number, and it explicitly asks for a SEQ-type input file, for read. If you copy files from a PC via an SD Card, say, and your ".txt" ASCII files appear as PRG-type files, it would be fine to just remove the trailing +",s,r" from line 70. It opens for read by default, and not specifying the file type would allow it to open either SEQ or PRG (or USR.) Though, technically an ASCII text file ought to be a SEQ-type file.

Regardless of the file type of the source file, line 80 opens on logical file 3, using data channel 3, on "dv" device number, and it opens the file with ,s,w to make it create a SEQ-type file for output.

Between the dashed REM lines is the meat and potatoes of the ASCII/PETSCII conversion.

Line 90 gets one byte from #2 (the input file), and it also reads the status from that operation into a variable "s". Why do we do this? Because the write operation is going to have an effect on the status, but we later need to refer back to what the status was after the read.

In order to compare numbers and perform mathematical operations, the single-character string a$ needs to be converted into its numeric value. Line 100 does this with the ASC() function, and saves the result to "a". There is a slight peculiarity in BASIC. If the byte read from the file has an integer value of 0, then a$ ends up as an empty string. ASC() only converts to a number the value of the first character in the string. Thus, by adding to the string a chr$(0), if the original string is blank, ASC() grabs and converts the chr$(0) into an integer zero, like it should be.
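The quirk is easy to see from the READY prompt. Here is a minimal sketch in direct mode (the variable name z$ is arbitrary, and the empty string stands in for a zero byte read from a file). On stock BASIC V2 the first line should stop with ?ILLEGAL QUANTITY ERROR, while the second prints 0:

```basic
z$="":print asc(z$)
z$="":print asc(z$+chr$(0))
```

Appending chr$(0) costs nothing when the string is not empty, because ASC() only ever looks at the first character.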

Now we can compare numbers. Line 110, if a<64, then it is in block 1 or 2, nothing to convert and so go to line 140 for output.

Line 120 checks if the value is less than 96. But we've already eliminated values less than 64, so this really means, any value from 64 to 95. If so, a = a + 128, that maps the value to block 7, and then we go to line 140 for output.

Lastly, if we get to line 130, we know the value is 96 or above, and we assert that since ASCII is a 7-bit code that it won't have values greater than 127. Very easy to handle this, just subtract 32, a = a - 32, and then fall through to line 140 to output the remapped character.

Line 140 prints to logical file 3, a single-byte string which is converted back from the numeric value "a" using the chr$() function. Pay attention to the semi-colon at the end of the print# command in line 140. Without the semi-colon there, print# would output a carriage return too. We definitely don't want that.

That's one byte successfully converted and output. At line 150, we check the "s" which is the read-status we saved. If it's zero, jump back to line 90 and repeat. When we've finally read the last byte, "s" will no longer be zero. The code at line 150 will not loop but fall through to 160, which closes the two files and the program ends.

This program will work! It's very logical, the implementation in BASIC of the steps that have to be taken is sound. There is just one problem. It's bloody slow. If you have to convert a file that's 50 blocks (50 / 4 = 12.5 KB) you will be waiting many minutes for it to finish. That's a bit long to convert just a few pages of text.

Implement it in assembly

Everyone knows that assembly is a kajillion times faster than BASIC. So the answer is easy, just implement it in assembly. The problem here is that assembly language, on its own, provides you with almost nothing to work with. Thank god for the KERNAL ROM (and some BASIC ROM routines) or you'd have literally nothing. But, from a UI perspective, there isn't much. This is, of course, one of the main reasons for the existence of an operating system such as C64 OS. The clipboard, and memory management, and Toolkit UI widgets, and mouse pointer and menus... the assembly language program just draws together the various elements that the operating system provides.

However, there are times when you want to write something very simple that can be run just from the READY prompt, a small or simple tool, like an ASCII to PETSCII text file converter. The problem is, somehow you have to output strings to the screen, and then you have to prompt the user and get input for the filenames, and then you have to open the files before you get to the hard work.

Anyone who has ever optimized code will know that in a given program, the computer may spend 1% of its time in 99% of the code, and then spend 99% of its time in just 1% of the code. As the optimizer what you want to do is figure out what's the 1% where all the time is being spent? (Modern tools in modern languages identify these things for you, but we're coding like it's 1983, so we're doing the analysis by hand.) Fortunately for us, it's easy to see that lines 90 to 150 are where all the time is spent, which is why I split them out with the dashed REM lines.

BASIC and assembly in one program

Printing lines, getting user input strings, opening files based upon those strings, that stuff is a pain in assembly. But it's also the part that takes almost no time to execute. So let's do in BASIC that short part that's annoying to implement in assembly, and then do in assembly just the part that takes too long to run in BASIC.

But how do we get BASIC and assembly into a single program?

Many programs written fully in assembly will include a small BASIC header. When the program is loaded, and listed, it will have just a single line, something like:

64 sys2061

The line number doesn't matter, but making it line #64 is a fun wink-wink. The only thing the line does is SYS2061, which is an address somewhere just past the end of the BASIC segment.

How does one get this BASIC prelude or header into the assembly program? It turns out, you can just embed the byte codes that represent the entire BASIC program right from your assembly program. Different people have different tricks, but in C64 OS, there is a file //os/s/:basic.s that you can include in your assembly program. It looks like this:

;----[ basic.s ]------------------------
;HOW TO USE:
;
; .include this file as first code.
; Omit the *= $xxxx from your code.

        *= $0801

        .word end    ;Next Line Ptr
        .word 64     ;Line #64 ;)
        .byte $9e    ;SYS
        .null "2061" ;$080d
end     .word $00    ;End of Basic

It includes the *=$0801 for you, so the whole shebang will get loaded right into memory where a BASIC program normally starts. And lo and behold, the first 12 bytes ARE a BASIC program. Memory address 2061 is $080d in hexadecimal, which is exactly 12 bytes more than $0801. Therefore, the SYS tells BASIC to begin executing assembled 6502 code starting on the first byte following the BASIC program.
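To see those 12 bytes for yourself, one option is to peek them from the READY prompt. This is only a sketch, and it assumes a program consisting of the single line 64 sys2061 is in memory at the default BASIC start of 2049 ($0801):

```basic
for i=2049 to 2060:print peek(i);:next
```

If the header is laid out as described, this should print 11 8 64 0 158 50 48 54 49 0 0 0. That is the next-line pointer $080b (low byte first), the line number 64, the SYS token $9e, the four digits of "2061" in PETSCII, the zero that ends the line, and the two zero bytes that end the program.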

This is great, but only if you want the entire program to be in assembly save for this one SYS line. What we want is to have a fairly complex chunk of BASIC, followed by some assembly. But we definitely don't want to write the BASIC program by manually encoding all the byte tokens like in the short example above. How do we get the best of both worlds, and make them work together?

What we want to do is write the BASIC part the way we write any BASIC program and save it to a file like normal. Then we want to write an assembly language program that implements just the bit we care about, and assemble it to a file. And then, as a final step, we want to merge the two files into a single file, that can be distributed, loaded and run like any other program would be.

So let's see how we can do that.

Implement the short part in BASIC

Let's start with the BASIC part. Here is our new stripped down BASIC program:
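The listing is not reproduced at this point in the text. Going by the description that follows (the slow loop in lines 90 to 150 is removed, and line 90 becomes a SYS2000 placeholder), a reconstruction would look something like the sketch below. It assumes everything outside the loop is carried over unchanged from the first version, including the dashed REM lines.

```basic
10 print"ascii to petscii converter"
20 print"copyright to whoever wrote it"
30 print
40 input"source file";sf$
50 input"destination file";df$
60 dv=peek(186)
70 open2,dv,2,sf$+",s,r"
80 open3,dv,3,df$+",s,w"
85 rem --------------------------------
90 sys2000
155 rem -------------------------------
160 close2:close3
```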

Pretty straightforward, right? We've just chopped out the long running part and replaced it with SYS2000. But, will the assembly part REALLY be at memory address 2000? We don't know this yet. So how do we find out?

Enter this program, and then save it to disk with a filename ending in .bsc (.bas or whatever is your convention.) The extension tells you that this is not the final program, but is just the BASIC component of the program. Let's say we call it "asc2pet.bsc"

Once saved, you need a program that will open the BASIC program file, read it in, and count the byte length of the file. Well, how do you do THAT??! You need another tool to do that. You could write that other tool in BASIC... but it would be slow. Boy, wouldn't it be nice to have a hybrid tool that is both BASIC and assembled 6502 to analyze the size of a BASIC or any other program? Yes, yes it would. C64 OS includes just such a tool called fileinfo found at //os/c64tools/:fileinfo

[Screenshot: the fileinfo tool in use. The numbers in the image do not reflect the examples shown in this blog post.]

Of course, fileinfo was originally implemented just in BASIC, but then at some point I upgraded it using the technique I'm describing in this very blog post to make it faster. The important thing to know about calculating the size of a PRG-type program file (whether it's BASIC or even an assembled code file) is that the first two bytes are the program's load address. They are in the file, but they don't take up any memory when the file is loaded in.

I'm not going to go into the implementation of fileinfo here, but if you have a copy of C64 OS, you can use the fileinfo tool that's provided for you. It asks for a filename. It opens the file, reads in the first two bytes and prints out the address where the program starts. It then continues to read and count the remaining bytes in the file. Then it prints out the size of the file's content (the whole file minus 2), and for convenience it tells you what is the last memory address that the program uses.
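For readers without a copy of C64 OS, the behaviour just described can be sketched in plain BASIC. This is not the actual fileinfo: the variable names are made up for illustration, it assumes a PRG-type file with at least one byte after the load address, and like any byte-at-a-time BASIC loop it is slow.

```basic
10 input"filename";f$
20 dv=peek(186)
30 open2,dv,2,f$+",p,r"
40 get#2,a$:lo=asc(a$+chr$(0))
50 get#2,a$:hi=asc(a$+chr$(0))
60 sa=lo+256*hi:n=0
70 rem count the bytes after the load address
80 get#2,a$:s=st:n=n+1
90 if s=0 then 80
100 close2
110 print"load address:";sa
120 print"content size:";n
130 print"last address:";sa+n-1
```

Lines 40 and 50 read the two-byte load address, low byte first, and keep it out of the count. The last address is the load address plus the content size minus one, all printed in decimal because that is what SYS will need.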

You can use this tool to analyze the last address used by asc2pet.bsc. But, we'll come back to this to see what to do with it.

Implement the long part in assembly

Next we have to create the assembly language part. Open up TurboMacroPro and we can create a program like this:

        .include "//os/h/:modules.h"
        #inc_k "io"
        #inc_s "file"

        *=2000

loop    ldx #2
        jsr chkin

        jsr chrin
        ldy status

        cmp #64
        bcc output

        cmp #96
        bcs notblk3

        ;clc ... carry is already clear
        adc #128
        bne output

notblk3
        ;sec ... carry is already set
        sbc #32

output  pha
        jsr clrchn

        ldx #3
        jsr chkout

        pla
        jsr chrout
        jsr clrchn

        cpy #0
        beq loop

        rts

Let's briefly understand what this assembly program is doing before getting back to where it is located in memory.

We've already opened the file for input on logical file #2, and the file for output on logical file #3. Therefore, very conveniently, we don't care where those files are found, nor what their names are. All we have to do is input on LFN 2 and output on LFN 3.

At line 7, put 2 in the X register and call chkin. chkin is in the KERNAL ROM, and the address is made available to us by that #inc_k "io". (These header include macros and header files are a standard part of C64 OS.) If you wanted to do this without the benefit of C64 OS's header include system, you could just define these:

chkin = $FFC6
chrin = $FFCF
chkout = $FFC9
chrout = $FFD2
clrchn = $FFCC
status = $90

The fixed addresses of these you can look up in a book, such as the C64 Programmer's Reference Guide, or from my reference blog post, C64 KERNAL ROM: Making Sense.

This prepares the drive to let us read in a byte. Call chrin and we get that byte in the accumulator. On line 11, we read the status byte into the Y register to check it later. status is in zero page, it's actually at address $90, but the C64 OS header //os/s/:file.s defines that for us.

Next we perform the comparisons just as we did in BASIC. If the value is less than 64 it branches straight to output. If the value is less than 96 it continues at line 19. Before adding #128 we should clear the carry, but since the immediately preceding branch-if-carry-set was not taken, we can be certain that the carry here is already clear. Add #128 to the accumulator then branch to output. It's a branch if not equal to zero, but we know that this value will never be zero, because nothing between 64 and 95 will be zero after 128 is added to it, so it's functionally a branch always.

Lastly, at line 23, if the value was greater than or equal to 96 we just subtract #32 like we did in BASIC, assuming that in 7-bit ASCII all values are from 0 to 127. We should set the carry before the subtraction, but again, we only get to this block of code via a BCS. Therefore we know the carry is already set and can skip a step. Then fall through to output.

The call to clrchn, at line 28, affects the accumulator. Therefore we'll back up the byte, which was read and converted from ASCII to PETSCII, by pushing it to the stack. Clrchn sends untalk to the drive, then we chkout on logical file number 3. This, like the chkin file, has already been opened for us. The assembly code doesn't need to worry about what device it is or what filename it has.

Pull the byte of PETSCII from the stack and output it with chrout at line 34. Clrchn is called once more to send unlisten to the device and prepare to loop for the next byte. The loop continues as long as the status after the read is zero. Chrin itself affects the Y register, but after reading the status into Y the calls to clrchn, chkout and chrout do not affect that register.

When the loop is done, there is no need to close the channels; just return to BASIC with RTS. The heavy lifting has been done by the assembly routine. The only remaining thing that BASIC does is call close2 and close3, and end.

Link together separate object files

Both parts are saved to separate files. The BASIC part is saved as asc2pet.bsc, and the assembly part has both a source code file, say, asc2pet.a, and the assembled object file, say, asc2pet.o.

We want the final program asc2pet (with no extension) to contain both the BASIC and the assembled part. Joining the two files can be done using the concatenate-on-copy feature which has been a part of CBM DOS going back to v1.0 in the 2040 from 1979. (See: Page 26 of the 2040 User's Manual.)

The copy command takes the name of the new file first, then the equals sign, then a comma delimited list of the files to copy and concatenate together into the new file. For us it will look like this:

Without JiffyDOS:

open15,8,15,"c:asc2pet=asc2pet.bsc,asc2pet.o":close15

With JiffyDOS:

@c:asc2pet=asc2pet.bsc,asc2pet.o
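Either way, it is worth confirming that the drive accepted the command. Here is a small sketch, assuming the drive reports through the usual command channel 15 and that the variable names are arbitrary. It sends the same copy command to whichever device was used last, then reads back the drive's status:

```basic
10 dv=peek(186)
20 open15,dv,15,"c:asc2pet=asc2pet.bsc,asc2pet.o"
30 input#15,e,e$,t,s
40 print e;e$;t;s
50 close15
```

On success the drive should report error number 0 with the message ok. A non-zero number means the copy did not happen; for example, a file named asc2pet may already exist. It is written as a short program because input# is not allowed in direct mode.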

I don't know if Commodore expected this feature to be used for concatenating binary (e.g., program) files together. Although the 2040 and 1541 user's manuals don't specifically mention it, the examples they give show filenames that give the impression of holding textual data. The CMD HD User's Manual goes a step further and says:

"Up to five files may be combined into a single file by using this command, though it is important to note that copying a number of files into a single file is only effective with text files."
(CMD HD User's Manual, section 9-28. Emphasis mine.)

It's typically only effective with text files, because any random human-readable text concatenated with any other random human-readable text is just one big lump of human-readable text. Whereas, if you took two programs, like, say, two BASIC programs, and concatenated them together the result would not be useful. The first program would probably still work but the concatenated second program would be in the wrong place in memory.

Our task is to make sure that the concatenated assembly part ends up in the right place in memory. After saving the BASIC program, and running fileinfo on the file asc2pet.bsc, it will tell us that it starts at $0801, and that it contains some number of bytes, like, say, 260 bytes. The exact number of bytes will vary, of course, depending on exactly what is in your BASIC program. And the number will change every time you modify the BASIC portion. So, every time you open, modify and then save the BASIC part of the program, you have to run fileinfo on it to see how big it is.

But let's just suppose it's 260 bytes, that's the approximate size of the asc2pet.bsc program given earlier. In hexadecimal that's $0104. The program starts at $0801, so the last address of memory used by the program is $0801 + $0104 - 1 = $0904. Convert that back to decimal, and the last memory address used by the program is 2308.

Now, you would think that that means our assembly language program should begin at address 2309, right? But there is one gotcha. When we assemble asc2pet.a into asc2pet.o that .o object file starts with its two byte load address. Normally, if we just loaded that file in on its own, the KERNAL would read those first two bytes in but not put them in memory. Depending on whether the load command included the extra ",1" it would either ignore those two bytes or use them to determine where in memory the rest of the file should go. But we're not loading this file directly; we're concatenating the file onto the end of the BASIC program. And the concatenate process doesn't care what those first two bytes mean, it just concatenates them during the copy.

What that means is that when the final file is loaded, the BASIC component will load into memory from $0801 all the way up to $0904 (2308 in decimal,) and then the two byte load address from the assembly component's object file will go into memory at $0905 and $0906 (or 2309 and 2310) and then the first byte of the assembly code will go into memory at address $0907 (or 2311 in decimal.)
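The same arithmetic can be left to BASIC. Here is a sketch in direct mode, where sa is the load address and n is the content size reported by fileinfo. Both variable names are arbitrary, and 260 is only the example figure used here, so substitute your own:

```basic
sa=2049:n=260
print sa+n-1
print sa+n+2
```

The first number printed (2308) is the last address used by the BASIC part. The second (2311) is where the first byte of assembled code lands, which is two bytes further along than the first free address because the object file's load address gets copied in too.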

When we first wrote and saved our BASIC program, we didn't actually know how big it was going to be. But now we do know. So we have to load asc2pet.bsc in again, and very carefully change the SYS2000 (on line 90) to SYS2311. Then resave the file to asc2pet.bsc. Why so careful? Because if the BASIC program becomes even one byte either longer or shorter, then it would be necessary to re-run fileinfo, get the new length, and compute the new starting memory address for the assembled component.

There is another gotcha to pay attention to. Notice how we plugged in a temporary address of SYS2000, and then later updated that to SYS2311. Both addresses are exactly 4 digits long, and each digit takes up one byte in the stored program, so there is no change in the length of the file. If your BASIC program was quite large, and you filled in SYS2000 as the placeholder, you might find you have to update that to, say, SYS10029. Oops, now you've made the program one byte longer, by necessity. You could fudge it a bit by removing one character somewhere else, like by removing one dash from one of the REM ----------- lines. But the safe thing to do is save it again, run fileinfo again, recalculate the start address of the assembly program, load the BASIC program again, fill in the exact number again this time without it needing an extra byte, and save it once more.

Now we know that our BASIC program is going to jump to 2311. Open the asc2pet.a assembly source code file in TurboMacroPro, and set the initial load address such as follows:

        .include "//os/h/:modules.h"
        #inc_k "io"
        #inc_s "file"

        *=2311
        
        ... and the rest of the program...

Notice how in BASIC you must use a decimal number with the SYS command. There is no option to SYS$0907. In order to keep things simple, there is no need to do the conversion back to hexadecimal for the assembly part. An assembler like TurboMacroPro is perfectly happy to assemble to a decimal number. I plugged in *=2311 above so the number "2311" is used in both parts of the program.

Now, when you assemble the file to asc2pet.o, the bytes $07 and $09 (little endian for $0907) are in the file. After concatenation and then loading in the final program file, the $07 will get put in memory at 2309 (decimal) and the $09 will get put in memory at 2310, and lo and behold the first byte of the actual code, assembled to 2311 will load into memory at 2311, and BASIC will SYS to 2311, and it all works!
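A quick sanity check is possible from the READY prompt once the finished asc2pet has been loaded. This sketch assumes the example figures used above (a 260-byte BASIC part and code assembled to 2311), and the variable names are arbitrary. The first line splits 2311 into its low and high bytes; the second peeks the two bytes that the concatenated load address left behind in memory:

```basic
a=2311:h=int(a/256):print a-h*256;h
print peek(2309);peek(2310)
```

Both lines should print 7 and 9. If the second line shows anything else, the BASIC part is probably not the length you calculated for, and the SYS address needs to be worked out again.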

You can now distribute asc2pet with both parts concatenated together as a single file. The user can load "asc2pet" as a single file and run it, and it runs with 6502-assembly speed.

Final Thoughts

Now that we have a chunk of assembly to do the heavy lifting, we can spend as much time as we want to optimize how the assembly part actually works. In the example above, I more or less just mimicked how the BASIC code did it. It will still be faster to have done it in assembly, but it is almost certainly not efficient to call chkin and chkout for every byte. Optimizing the assembly language, however, is out of scope for this blog post.

Over the years I've written probably 10 to 20 small tools that assist me when developing C64 OS and other software natively on my C64 or C128. For most of these I began by writing the tool in BASIC. Something quick and dirty that gets the job done, even if its runtime is kinda slow.

But in the interest of optimizing my own time, each tool that I've found myself returning to again and again (like fileinfo), and then waiting around for it to finish, has inspired me to rewrite it as a hybrid BASIC/assembly program, using the technique given above. And every time, I'm blown away by how much faster the hybrid program is.

The latest example is the tool "relocator" used for helping to create relocatable binaries for C64 OS. It now runs in a small fraction of the time its pure BASIC predecessor needed. This updated version of "relocator" will be included, in the //os/c64tools/ subdirectory, in the software update to C64 OS v1.07, which will be a free update for all licensed C64 OS users.

If you don't know about C64 OS software updates, you can check that out on the official C64 OS System Updates page.

Greg Naçu — C64OS.com