NEWS, EDITORIALS, REFERENCE
A Hybrid Memory Manager
There are many great articles on the web about how standard C libraries implement memory management.1 But these articles always assume you've got a PC with lots of RAM, from many megabytes to many gigabytes. The implementation of these memory managers take up lots of memory and so on a C64 we need a solution that suits our needs and has a small lightweight implementation.
Typically in a modern OS, on modern hardware, memory is not only managed but it is also virtualized. So your program is working with numeric values that represent addresses, but those numeric values are not the actual hardware memory addresses. The C64 lacks hardware support for this sort of virtualization and so it is entirely impractical. It's also less useful since C64 OS is single-tasking anyway. In these virtualized memory systems, the entire memory range available to the process is allocated by the same mechanism. If you want a 1 megabyte buffer, for any purpose, you ask for a 1 megabyte sized block of memory the same way you ask for a 20 byte block of memory to hold a small structure. Depending on order in which memory is requested large chunks and small chunks may end up getting interleaved. And as memory is released this can cause fragmentation. It's not a super big deal when you have a lot of memory to work with. And it can be mitigated by more sophisticated memory manager implementations that are able to analyze and coallesce many small blocks into larger blocks.
I've changed my mind a couple of times thinking about what sort of memory manager would be most useful, but I've pretty much settled on a hybrid model. The first model is the simplest and is about 99% complete and the second model will piggy back on the first. So, what are these models, and how do they work?
The first model is a very simple block allocator. The 6510 can address exactly 256 pages of memory, where each page is 256 bytes big. Very conveniently the number of pages fits within a single 8-bit value. So, a single 8-bit number can be used to refer to any page. In order for the memory manager to know which pages are available and which are used, it keeps track of availability with a 32 byte map. 32 bytes times 8-bits is 256. A bit value of 1 means the page is available, a bit value of 0 means the page is spoken for. The availability map functions a lot like the BAM (Block Availability Map) of a floppy disk.
There are several jump table calls for the programmer to use to free and allocate memory. You set the number of contiguous pages in the .A register and call the pagealloc2 routine. The routine searches the map for that many 1's in a row. Sets those 1's to 0's and returns in the accumulator the page number of the first in the sequence. The carry is used to indicate success or failure. A failure occurs when it cannot find as many consequtive free pages as were requested.
There are a couple of tricks, but it really is almost this basic. The allocator checks to see if the KERNAL is patched in, or the BASIC ROM or I/O are patched in and will automatically not give you blocks that fall under those areas. If you wanted blocks under the KERNAL, you'd have to patch the KERNAL out prior to making the request. C64 OS automatically marks the memory that it itself uses as unavailable, as well as the first 2 pages, which are used by the system for stack, screen, and KERNAL working space.
The page blocks returned are 256 bytes big, and if you requested 3 pages, you might get back a value like $09. That means the allocator is telling you that $0900 - $0BFF are yours to do with, however you like. No other calls to the allocator will return page numbers $09, $0A or $0B until you free those pages. Freeing pages works exactly the same way. You could call pagefree three times, once for each of the pages. Or you could pass the first page and specify how many pages in sequence to free. The allocator just sets those bits in the map back to 1's.
The allocated pages are completely unstructured. To help with this C64 OS provides a few routines that operate on a whole page at a time. One routine fills a page with a given value. So you if you want to fill the page with zeros that's easy. Another routine will copy the contents of one page to another. And lastly, there are shift routines. The shift routine will shift the contents of a page left or right by an arbitrary number of bytes, and fill the space created by the shift with a byte you pass in. The goal is for these routines to be the foundation of some library routines for efficiently editing very long strings. Otherwise, though, the pages returned to you can be used for anything. To hold a chunk of sound data, or to buffer incoming data from an internet service, or from a file being loaded in, or whatever.
The second part of the memory manager is a more sophisticated arbitrary length allocator. Suppose you want to create a linked list of structures. The structures are all 22 bytes big, but you want to create an arbitrary number of them, and you want to be able to create them, move them, and remove them from the list. How do you do that? In a modern OS, when you call malloc() you can pass in an arbitrary byte size that is exactly what you need for your structure. In this case, 22 bytes to hold one more structure. You can't do this with only a page allocator.3 But nor do I want the entire C64's 64K of memory to be managed with the structure necessary to make this possible. Mostly I don't want to do that because it's too easy for memory to become fragmented, and the logic required for compensating for that is too complicated.
Instead, here's what I've opted for. First you decide how many such structures is a reasonable number you might want to support. 150? 200? Okay, 200 times 22 bytes is a little over 17 pages, plus you need a bit of additional space for the allocator's headers. So, we ask the page allocator for 17 pages, and we get the page number to the start of 17 consecutive pages that are completely unstructured. Next, since we want to use this ~4 kilobytes of space for our linked list of 22-byte structures, we call a memory initialization routine on the first page, specifying how many pages we've allocated. This sets up the initial header at the beginning of the first page which records the total size of available space in the pool, and zeros the rest of the entire pool of 17 pages.
When you want your first 22-byte (22 bytes is just some size I picked for this example) structure size, you call the malloc routine and pass in 3 bytes. The first is the page byte that is the first page of the 17 pages allocated and initialized above. This essentially informs malloc which "pool" of memory it's allocating out of. Because, you could very easily have more than one pool going on at the same time for various purposes. The next two bytes represent the exact number of bytes you want allocated for your structure. In our case we'd pass $00 and $16 ($16 == 22 decimal). Malloc starts at the initialized header at the beginning of the pool to see how much total space is available in the pool. Then it starts looking through the pool for the first available space that's at least 22-bytes big.
The way this works is very standard for the memory allocation used by modern OSes. So this will not be news to anyone familiar with memory management on PCs or Macs. When you ask for a 22-byte block, the allocator returns you (from within the pool) the exact address of the start of where you can safely place 22-bytes. But the couple of bytes just before that address are a tiny header4 that indicates how long this block is and if it's used or not. When you want to free this block you call free and pass in the address that was returned by malloc. Free jumps back 2 bytes to find this block's header, and sets its available bit high. And that's all that free has to do.
The magic is in malloc. When you call malloc asking for some space, it starts at the beginning of the pool, and moves to the first block which starts immediately after the pool's header. Then it looks at that block's header to see how big it is and whether it's available or used. If it's available, and it's the same size or bigger than the size you want, it simply sets that block's availability bit low and returns you the address of that block. But if that block is used, it skips ahead by the size that this block's header says it is. That's how it finds the header of the next block. And then it repeats by checking to see if that block is available and how big it is. Eventually one of three things will happen. It might find there is no more available space in the pool and return a error (via the carry.) Or it may find a previously allocated block that has been freed that it can mark as used and return to you. Or it may jump to just past the end of the last allocated block. Because the pool is zeroed upon initialization it will find what looks like a header of $00 $00. Zero length indicates that this block has never been allocated. It sets the size according to how much space was requested, sets the availability bit low, and returns the address immediately following this freshly configured block header. Magic.
There is a lot of extra stuff that malloc could do. For example, it could see that you want a space that's 40-bytes long and notice that there are 2 free blocks in a row that are both 22-bytes long. And then it could join these two free blocks by changing the length of the header of the first block to be 46 bytes. (22 * 2, plus the 2 byte header of the second block that gets subsumed.) Another thing it could do is split a big available block into smaller blocks. Malloc could notice that you're asking for a 22-byte block, but it finds a free 68-byte block. It could change the size of the block's header to just 22, and create a new header 22 bytes in that is marks as free a new block that's 44 bytes long. (68 - 22, minus 2 bytes for the new block's header.) It could do all these things. But all this logic takes up precious memory and might not be all that useful for what C64 programmers realistically want to do.
The advantage to using a hybrid memory allocation model is that fragmentation that occurs within a pool of consecutive pages is limited to that one pool. When you're done with that pool you can just abandon the entire malloc/free structure and headers inside the pool and use the page allocator to deallocate the entire pool at the page level. Those are my thoughts on how memory management will work in C64 OS.
- Here's an article by IBM talking about how to implement a simple memory manager in a Linux environment. [↩]
- The names of the routines are still in flux. The name pagealloc will probably change to something else. [↩]
- Not unless you want to allocate a whole new page, but that would mean wasting 256 - 22, or 234 bytes! Clearly that's not acceptable. [↩]
This header has to be as small as possible. 2 bytes is enough to represent any
length up to 64K. However, it's impossible to allocate a 64K block and still have
space for the OS, the allocator's headers, your application code, etc.
If instead I use bit 7 of the second byte as a used/available flag, that leaves 15 bits for the size of the block, for a maximum block length of 32K. A very reasonable limit. [↩]