C64 OS TECHNICAL DOCS
Last modified: Dec 10, 2018
There are three general types of memory allocation: Static, Automatic and Dynamic.
Static Memory Allocation
Static memory allocation is when space available for storing data is specified explicitly in the source code and compiled or assembled into the application binary. This space is therefore allocated alongside and as part of the memory allocated for the application binary.
Static allocations have a fixed size, a fixed location in memory, and remain allocated for as long as the whole application binary is in memory. Statically allocated memory cannot survive an application being quit and a new one being loaded in, because the new binary will overwrite the space where the old application binary and its static allocations were located.
Automatic Memory Allocation
An application may use stack space to allow a routine to be reentrant and to be called recursively; the stack is used to preserve the state of the current instance of the routine while another instance runs.
In C, every function is reentrant automatically and can be called recursively unless it makes explicit use of static memory. This is because all arguments in C are passed on the stack. And all local variables are automatically instantiated on the stack. This is referred to as automatic memory allocation.
The C language's pervasive and automatic use of the stack for argument passing and local variable storage is, however, problematic and ill–suited for use on the C64. The 6502/6510 has only three registers, and only one of them can receive the stack pointer for stack–relative addressing. The stack pointer can only be transferred to the X register, destroying X's previous contents. The X register can be transferred to the Y register, but only by passing it through the A register, destroying those contents along the way.
Stack–relative addressing can then be accomplished with absolute–indexed addressing, using an offset from the start of the hardware stack page, after transferring the current stack pointer to either of the two index registers. For example, like this:
TSX
LDA $0102,X
...
Or like this:
TSX
TXA
TAY
LDA $0102,Y
...
However, only a subset of the instructions that have an absolute addressing mode can use absolute-indexed addressing with each index register. And there are some egregious omissions, such as being able to load X or Y absolute-indexed, but not being able to store X or Y that way.
Here's a table of all the instructions that use absolute addressing, and checkmarks to indicate if they have an indexed mode with the X and Y registers:
|Absolute||X Indexed||Y Indexed|
|ADC||✓||✓|
|AND||✓||✓|
|ASL||✓|| |
|BIT|| || |
|CMP||✓||✓|
|CPX|| || |
|CPY|| || |
|DEC||✓|| |
|EOR||✓||✓|
|INC||✓|| |
|JMP|| || |
|JSR|| || |
|LDA||✓||✓|
|LDX|| ||✓|
|LDY||✓|| |
|LSR||✓|| |
|ORA||✓||✓|
|ROL||✓|| |
|ROR||✓|| |
|SBC||✓||✓|
|STA||✓||✓|
|STX|| || |
|STY|| || |
Meanwhile, using these instructions stack–relatively means occupying one of only three registers. This makes doing stack-relative processing within a loop much less efficient. Almost everything, in fact, becomes less efficient, because variables have to be juggled to and from the stack rather than just being temporarily held in a spare register.
There are still two more problems. Indirect addressing cannot be done stack-relatively. That is to say, a pointer cannot be on the stack and simultaneously be read through, because the only indirect-indexed instructions require the pointer to be in zeropage, which is not the stack. And lastly, the stack has a fixed size and is quite small: just 256 bytes.
Because of these hardware limitations, C64 OS routines, like those in most C64 programs, do not use the stack by default to transfer arguments and automatically allocate local variables. This means that C64 OS routines are by default not reentrant. This doesn't cause a big problem, because C64 OS is not multitasking.
Routines in C64 OS, therefore, use statically allocated local variables. If a routine needs to be reentered, it must back up its static variables to the stack before calling code that will or could reenter it. Upon return, it must restore its static variables from the stack. This has some advantages. Every routine that does not need to be reentrant benefits by running much faster.
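The save/restore discipline described above can be sketched in Python (the routine and its variables are hypothetical; in C64 OS this is done in 6502 assembly using the hardware stack):

```python
# Sketch of C64 OS-style reentrancy: a routine keeps its "locals" in
# static storage (fast to access), and only backs them up to a stack
# when it knowingly calls code that may reenter it.

stack = []   # stands in for the hardware stack

# Statically allocated "local variables" of the routine.
statics = {"count": 0, "accum": 0}

def countdown(n):
    # Use static storage for locals; no per-call allocation.
    statics["count"] = n
    statics["accum"] = n * 10

    if n > 0:
        # About to call code that reenters this routine:
        # back up the statics to the stack first.
        stack.append((statics["count"], statics["accum"]))
        countdown(n - 1)
        # On return, restore the statics.
        statics["count"], statics["accum"] = stack.pop()

    return statics["accum"]

print(countdown(3))   # prints 30: the statics survive the recursive calls
```

A routine that never needs to reenter simply skips the push and pop, and pays no stack overhead at all.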
Dynamic Memory Allocation
Dynamic memory allocation is a means for applications, utilities and system code to coordinate their memory requirements at runtime and to carve up and lay claim to heap memory. Dynamic heap allocations can last longer than the lifetime of the application, although doing this imprudently can lead to the system eventually having no free memory, and no application that knows how to safely unallocate that memory.
Dynamic memory allocation is what sets C64 OS apart from most common C64 programs. Applications that run within C64 OS have the benefit of being able to use its memory allocation system to perform sophisticated memory manipulating tasks with relative ease.
The rest of this article is devoted to describing dynamic memory allocation.
One page of memory is 256 bytes. The C64 can directly address 256 pages of memory. These 256 pages are grouped into five main sections:
The 6510's Perspective

+--------------------------------------------+
| $E000 - $FFFF - KERNAL ROM             8K  |
|               - BITMAP SCREEN BUFFER       |
|               - UTILITY BINARY AND VARS    |
|               - UNSTRUCTURED FREE SPACE    |
+--------------------------------------------+
| $D000 - $DFFF - I/O, CHAR SET          4K  |
|               - COLOR BUFFER, SCRN MEMORY  |
+--------------------------------------------+
| $B000 - $CFFF - C64 OS KERNAL          8K  |
+--------------------------------------------+
| $0900 - $AFFF - PAGE ALLOCATED SPACE ~42K  |
+--------------------------------------------+
| $0000 - $08FF - SYSTEM, STACK, PG MMAP ~2K |
|               - SCREEN BUFFER              |
+--------------------------------------------+

See weblog post: Rethinking the Memory Map
$E000 to $FFFF is a special region of memory that C64 OS shares with the KERNAL ROM. It is not affected by nor included in C64 OS's paged memory management. There are system flags which denote how the whole area is currently being used. The possible uses consist of: bitmap memory, the Utility binary and its static allocations, or free/unstructured space. The entire area can only be used for one of these purposes at a time. If a utility is loaded in, it expects to have access to the first 8000 bytes of that region. If image data is going to be loaded into that region, any running utility will automatically be quit first. The final 192 bytes of this block are always reserved as system workspace memory.
$D000 to $DFFF is always 100% used by C64 OS. 1K is Screen Matrix Memory, the memory the VIC-II uses to display the text UI, or for color data in certain bitmap modes, or some combination of text and color data if the screen mode is split. Another 1K is used as a color memory buffer. And 2K are used to store the custom character set. This whole region is internally managed by C64 OS, including screen matrix memory, which should not be written to manually.
$B000 to $CFFF is the C64 OS KERNAL code. It cannot be used for anything other than this purpose, although there is some dynamism in how applications figure out how the contents of this memory are laid out.
$0000 to $08FF, very low in memory, is referred to as system workspace memory and is statically managed. It is used by the KERNAL ROM, the C64 OS KERNAL, application workspace, the hardware stack, the text UI screen buffer, and a paged memory map, which we'll return to in a moment.
The bulk of the memory, around 42K, resides in the middle. This area is composed of 167 pages of free memory, page numbers $09 to $AF. The paged memory map in workspace memory uses one byte for each of these pages to indicate the usage status of that page. Several values are defined for paged memory:
- MAPFREE - Free
- MAPSYS - System
- MAPAPP - Application
- MAPUTIL - Utility
The C64 OS memory module provides 4 routines that work directly with the paged memory map.
|pgalloc||Allocate a range of memory pages|
|pgmark||Mark a specific set of pages as allocated|
|pgfree||Free an allocated range of pages|
|memfree||Count the number of free pages|
pgalloc
Takes an allocation type and a number of consecutive pages. It scans the paged memory map, starting at the high end, looking for a block of that many consecutive free pages. It marks each page with the allocation type passed in, and returns the page number of the start of the range of pages, the lowest page number.
Additionally, it zeros out all the memory in these pages, and initializes them as a valid memory pool for malloc. More about this in the discussion of dynamic memory allocation below. The Carry will be set on return if a block of consecutive pages this size could not be found.
pgmark
Takes a first page number and a last page number, and marks the complete range as allocated with the MAPAPP type. This is used by the application loader. It prevents the page allocator from allocating from the area that is used by the application's binary. The application's binary shares memory space with the standard page allocator. However, the application is loaded into the lowest area of this space, while the page allocator allocates starting at the highest end of this area, working down.
It is possible that the application being loaded could require using space that is already allocated. There is no way around this problem. However, it should be very rare. Before an application is loaded, requiring its binary space to be marked as allocated, any open utility is quit first, and its memory is unallocated, then the current application is quit and its memory is unallocated. Then the new application binary is loaded in, and the space it occupies is marked as MAPAPP allocated with pgmark. The only way this space would not be available is if an application with a small binary footprint allocates all of the available pages, AND allocates them with a custom allocation type designed to keep them allocated beyond the end of the app's own lifecycle. Then the next app that is loaded in has a footprint bigger than the previous app.
Typically an application will allocate pages for itself using the MAPAPP type. All memory of this type is unallocated automatically when the application is quit. An application can also request pages to be allocated with an undefined allocation type. These will not be unallocated when the app is quit. This allows an application to install a routine that stays resident even after the app is no longer running. The app could leave a timer, interrupt or event loop wedge in place that calls code left behind. One has to be very careful about these, as they can lead to memory leaks, or other unintended system destabilization. However, the ability to do this at all enables many cool system hacks, and helps maintain a true C64 feel.
pgfree
Takes a starting page number, and a number of pages to free. It marks these pages as free in the page allocation map. The memory is not cleared. If the pages in the indicated range are all allocated, it returns with the Carry clear to indicate no errors. If one or more of the pages were already unallocated, the Carry will be set.
memfree
Takes no input parameters. Reads through the page allocation map, counts up and returns the number of free pages. This does not tell you how the pages are fragmented. It is possible to get a large number from memfree, such as 50, and for pgalloc to still fail when you try to allocate a block of more than one page.
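The four routines can be sketched in Python as operations on a byte-per-page map. This is a simplification under stated assumptions: the real pgalloc also zeroes the allocated pages and initializes them as a malloc pool, and the real routines signal errors via the Carry flag rather than return values.

```python
# Sketch of the paged memory map and the four memory module routines.
# Names mirror C64 OS; the logic is a Python simplification.

MAPFREE, MAPSYS, MAPAPP, MAPUTIL = 0, 1, 2, 3

FIRST_PAGE, LAST_PAGE = 0x09, 0xAF        # the ~42K page-allocated space
pagemap = {p: MAPFREE for p in range(FIRST_PAGE, LAST_PAGE + 1)}

def pgalloc(alloc_type, count):
    # Scan from the high end, looking for `count` consecutive free pages.
    run = 0
    for page in range(LAST_PAGE, FIRST_PAGE - 1, -1):
        run = run + 1 if pagemap[page] == MAPFREE else 0
        if run == count:
            for p in range(page, page + count):
                pagemap[p] = alloc_type
            return page       # lowest page number of the allocated range
    return None               # ~ Carry set: no block of this size found

def pgmark(first, last):
    # Mark a specific range as MAPAPP (used by the application loader).
    for p in range(first, last + 1):
        pagemap[p] = MAPAPP

def pgfree(page, count):
    ok = all(pagemap[p] != MAPFREE for p in range(page, page + count))
    for p in range(page, page + count):
        pagemap[p] = MAPFREE
    return ok                 # False ~ Carry set: some pages already free

def memfree():
    return sum(1 for status in pagemap.values() if status == MAPFREE)
```

With an empty map, pgalloc(MAPAPP, 4) returns $AC here: the scan finds $AF, $AE, $AD, $AC free and returns the lowest page of the range.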
The Memory Utility can be used to get a visual overview of the page allocation map.
In C, one typically expects to allocate memory in arbitrarily sized chunks. C64 OS can do this too. This can be used when getting chunks of memory in even increments of whole pages is inappropriate.
|malloc||Allocate a 16-bit sized block of memory|
|free||Free a block of memory previously allocated by malloc|
When it is possible to use paged memory management, it is much more efficient. The menu system in C64 OS, for example, which is managed by the C64 OS Menu Module, uses only paged memory management. It does this by dividing each page into 8 evenly spaced, page–aligned, 32–byte structures. Pointers within the structures double as a means of identifying the pages that were allocated to hold the menu data. There is very little overhead, no bytes are wasted, and the memory can be fragmented; in other words, the linked menu structures can span multiple, non–consecutive pages.
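The menu module's scheme can be sketched in Python. The page number and helper names here are hypothetical; the real structures live in 6502 memory.

```python
# Sketch: one 256-byte page carved into 8 page-aligned 32-byte structures.
# Any pointer into the page identifies the page via its high byte, which
# is how the menu structures double as a record of their own allocation.

PAGE = 0x14                      # hypothetical page allocated for menu data

def slot_address(page, slot):
    # Address of one of the 8 aligned 32-byte structures in a page.
    assert 0 <= slot < 8
    return page * 256 + slot * 32

def page_of(address):
    # The high byte of any pointer into the page recovers the page number.
    return address >> 8

addrs = [slot_address(PAGE, s) for s in range(8)]
```

Because every slot address shares the same high byte, freeing the menu data is just a matter of collecting the distinct high bytes of the structure pointers and handing those pages back to pgfree.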
However, this is not always possible or convenient. In these cases we can use malloc. Malloc makes memory allocations from a specified memory pool. A memory pool can be any memory that has been allocated by pgalloc and is properly initialized. A memory pool can be as small as 1 page, or as large as all of the available and consecutive pages. A range of pages is automatically initialized by pgalloc to serve as a memory pool for malloc, and remains suitable for use as long as the initial 4–byte structure is not overwritten.
malloc
Takes a 1–byte page number, the start of the memory pool to allocate from, and two bytes representing a 16–bit length to allocate. If no free memory of that size or bigger is available, the Carry is set to indicate an error. Otherwise, the Carry is clear to indicate success, and a 2–byte pointer to the start of the allocated memory is returned.
free
Takes a 2–byte pointer to a block of memory previously allocated with malloc. Frees the memory block back into the pool so it can be used again.
A memory pool is initialized as follows. The first byte of the first page of a pgalloc allocation is set to the number of consecutive pages in this block. Therefore, 255 bytes are available for use by malloc in the first page, and 256 bytes are available in all subsequent pages of the pool.
Blocks of memory within a pool consist of a 3–byte header followed by a number of storage bytes. The header consists of a 1–byte free/allocated flag, and a 2–byte length. When a memory pool is first initialized, the entire range of available memory is initialized as a single block that fills the entire memory pool, that is marked as free.
For example, if you pgalloc 3 pages, it is initialized such that the first byte of the first page is set to 3. The pool has (255 + 256 + 256) 767 bytes available. The next three bytes are initialized as: $00, $FC, $02. $00 to indicate this block is available. $FC and $02 is little endian for $02FC, or 764 bytes. This is the size of the memory block's available storage. Which is the full 767 minus the 3–byte header.
Above is a visualization of an example layout of memory after some malloc's and free's have happened. It is ellipsed in the middle to save space. The leftmost byte, colored green, contains a 3. This indicates that when pgalloc was called, a range of 3 consecutive pages was requested. Immediately following this are three bytes, colored yellow, which are a malloc header. The first byte is 1, which indicates this block is allocated. The next two bytes, little endian, least significant byte first, are 07 00. This was allocated by malloc, with a requested size of 7 bytes. I've colored the next 7 bytes red to indicate they're allocated.
Immediately following that is another three bytes, yellow, for another malloc header. The first byte of which is 0 to indicate that this block is free. Its length is 3, seen as the three following bytes, this time colored blue for free. One more malloc header follows that, three more bytes in yellow. The first byte is 0 to indicate it is free. And its size is EC 02, but that's little endian for $02EC, or 748 bytes. If you add up the size of all three blocks: 748 + 3 + 7 = 758, plus 3 malloc headers: 3 + 3 + 3 = 9, plus the pgalloc header: 1, you get 758 + 9 + 1, or 768, which is the exact size of the three pages, 256 + 256 + 256 = 768.
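The pool-initialization arithmetic above can be checked with a short Python sketch. init_pool is a hypothetical helper modeling just the bytes pgalloc writes at the start of a pool; the byte values match the 3-page worked example.

```python
# Sketch of pool initialization after pgalloc of `pages` consecutive pages:
# byte 0 of the pool holds the page count, followed by one free-block
# header: a flag byte (0 = free) and a little-endian 16-bit size.

def init_pool(pages):
    available = pages * 256 - 1          # the pool header takes 1 byte
    block_size = available - 3           # the first block header takes 3 bytes
    lo, hi = block_size & 0xFF, block_size >> 8
    return [pages, 0x00, lo, hi]         # page count, free flag, size lo, size hi

print(init_pool(3))   # [3, 0, 252, 2]: size $02FC = 764 bytes, as above
```

So a 3-page pool starts out as one free block of 764 bytes: 767 available bytes minus the 3-byte block header.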
When malloc is called, it starts at the beginning of a memory pool, just past the 1–byte pool header. The first three bytes are always a malloc header for a block of memory. If that block is available and bigger than needed, malloc subdivides this first big block into two smaller blocks.
Dividing a block in two is relatively straightforward. As one big block it has a single 3–byte header, but as two smaller blocks it will need two 3–byte headers. When the block is split into two, we know the first block will be the size that is being requested, and the size of the second block needs to be calculated. To calculate this size, you take the size of the total block, subtract 3 for the second block's header, then subtract the requested size for the first block.
This new size can be temporarily set aside. Write the requested size into the header of the first block; this effectively shrinks it to the size requested. Back up the pointer to this block, and then use the usual routine to advance the pointer to the next block. This will cause the pointer to point into the middle of the original block, but it's pointed right to where the second block's header needs to be written. Write a zero to indicate that it's free, and then fill in its size from the calculated size we set aside earlier. The only thing left to do is to restore the pointer back to the first block and return it.
Following this method, large blocks automatically get subdivided into smaller blocks at the time that malloc is called. The only exception is if a block is less than 4 bytes bigger than requested. In this case, there is not enough room in a second block for its header and a minimum size of 1 byte. A pointer to the slightly oversized (by up to 3 bytes) block is returned.
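The splitting step can be sketched in Python, modeling the pool as a list of bytes. The helper name is hypothetical; the real routine manipulates a pointer in 6502 assembly.

```python
# Sketch of splitting one free block into an allocated block of the
# requested size plus a smaller free trailing block. Each header is
# 3 bytes: a flag (0 free / 1 allocated) and a little-endian 16-bit size.

HDR = 3

def split_block(pool, offset, request):
    size = pool[offset + 1] | (pool[offset + 2] << 8)
    if size < request + HDR + 1:
        # Less than 4 bytes of slack: no room for a second block's header
        # plus at least 1 byte, so hand out the whole, slightly oversized block.
        pool[offset] = 1
        return offset
    remainder = size - HDR - request
    # Shrink the first block to the requested size and mark it allocated.
    pool[offset] = 1
    pool[offset + 1] = request & 0xFF
    pool[offset + 2] = request >> 8
    # The second block's header lands right after the first block's data.
    second = offset + HDR + request
    pool[second] = 0
    pool[second + 1] = remainder & 0xFF
    pool[second + 2] = remainder >> 8
    return offset
```

Splitting the initial 764-byte free block of a 3-page pool with a 7-byte request leaves an allocated 7-byte block followed by a free block of 764 - 3 - 7 = 754 bytes.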
As blocks are malloc'd and free'd, and larger blocks are divided and initialized into smaller blocks it will eventually be the case that a malloc will start to search for a block and will find one that is free but that is smaller than requested. It could even be the case that the entire memory pool is divided into many free but small blocks, such that even though in total there is enough free consecutive memory, no single free block is big enough for the request. Free blocks need to be merged together to form larger blocks.
Malloc starts at the beginning of a memory pool and looks for the first free block. If it is too small, it checks whether the next block is also free. If the next block is allocated, then it cannot be merged into this block. This block is simply too small and has to be skipped.
Malloc continues searching for the next free block, and repeats. Every time it finds a free block that is too small, it checks if the next block is free. If the next block is free malloc will automatically merge the next block into the current one. It merges the two blocks regardless of whether the new combined size will still be too small, or if it will end up being too big. It just merges the two blocks.
Merging two blocks is even easier than splitting one block into two. The second block's header will disappear, becoming available space in the new bigger block. The second block's size, plus 3 for its header, are added to the first block's size. And that's it.
Malloc now branches back to the phase where it is comparing a found block's size to the requested size, but the found block is the newly merged, larger–than–before, block. If the newly merged block is now too big, malloc automatically splits it again following the process described above. You get two blocks again, but the size of the first is the size requested.
If on the other hand the merged block is still too small it simply repeats the process of checking if the next block is free and merging it in. It could be the case that several blocks get merged together, only to still be too small.
In this way, many small available consecutive blocks are automatically merged, one by one, as they are encountered, until they combine to a size big enough for the request.
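The merge step can be sketched the same way, with the pool as a list of bytes and hypothetical helper names:

```python
# Sketch of the merge step: when a free block is too small and the next
# block is also free, absorb the next block, 3-byte header included,
# into the current one.

HDR = 3

def block_size(pool, off):
    # Little-endian 16-bit size from a block header at `off`.
    return pool[off + 1] | (pool[off + 2] << 8)

def merge_with_next(pool, off):
    nxt = off + HDR + block_size(pool, off)
    assert pool[off] == 0 and pool[nxt] == 0, "both blocks must be free"
    # The second block's header disappears into the combined block's storage.
    new_size = block_size(pool, off) + HDR + block_size(pool, nxt)
    pool[off + 1] = new_size & 0xFF
    pool[off + 2] = new_size >> 8
    return new_size
```

Merging a free 5-byte block with a free 10-byte neighbor yields one free block of 5 + 3 + 10 = 18 bytes; malloc then re-tests this larger block against the requested size, merging again or splitting as needed.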
Splitting and merging blocks automatically doesn't entirely solve the problem of fragmentation. However, this problem can be mitigated somewhat if you know in advance the general profile of the memory allocations, and split memory allocations with different profiles into different memory pools.
For example, if you know you will be allocating many small blocks, many of which will be long lived, you can pgalloc a memory pool just for those. But if you will also be allocating large blocks that are relatively short lived, you can pgalloc a different memory pool for those. This will prevent a small but long lived allocation from fragmenting a space that a larger allocation needs to use.