NEWS, EDITORIALS, REFERENCE
Timers, One More Time
I love learning. And let me tell you, working on a big project that pushes your boundaries is just one learning experience after another. I've got quite a bit of code in C64 OS now, and so after I've been working on some areas for a while and neglecting others, I sometimes go back to code I wrote months ago, look it over and say:
What the hell was I thinking back then? Greg Naçu — 2018
I'll give you a quick example.
I probably “finished” writing the majority of the code in the screen module what feels like a year ago. The screen module implements the main event loop, fetches events from the event queues and distributes them to screen layers. The code also tracks screen layers and lets you push them and pop them to and from the screen layer stack, among other things. I looked over that code, and realized the way I was doing the pushing and popping, the way I was tracking the screen layers and accessing them, it was almost insane. In the intervening months I've just learned and improved so much that when I go back and look at my old variable naming conventions, code formatting, and comment style, they look horrible.
Not to mention the way it actually works. I've done lots of silly amateur stuff, like, loading the accumulator and then immediately comparing it to zero with CMP before doing a branch on equal to zero. In case you're unaware, that's pretty dumb. Loading the accumulator accordingly sets the Zero flag automatically. Comparing to 0, in this case, is just busy work.
Another example can be found in technique. There are two ways you can make a table of pointers. You can make a single array of 2–byte elements. But if you do this, your byte–index into the table has to increment and decrement by two to track the array offset. And to get the high byte of one of the pointers, for example, you have to set the index to: (array offset)*2 + 1. Alternatively, you can split the pointers into two arrays of 1–byte elements, a table for lo bytes and a table for hi bytes. To move through these tables you just increment and decrement the index by one. To get the high byte, (if the X index register is already set,) it's just LDA HIBYTES,X. And to get the low byte the index stays exactly the same, and it's just LDA LOBYTES,X. And so on and so forth.
Live and learn, right? Learning the practicals of programming is like learning to play pool. You can know all about geometry, but if you don't have that muscle memory and instinctive experience, your pool game is gonna suck.
That prelude was to set the stage for another post about "timers." The last time I wrote about timers was about a year ago, the end of March 2017. That post is called, Making Course Corrections, Timers and if you're feeling bold and want to know how things have developed go ahead and read it. It's an interesting piece because I wrote an entire post detailing my plans for a timer system, and before I had the chance to polish and publish it, I had a complete turn about in how I thought timers should work. So I wrote a second post, which consisted of quoting the first piece and commenting on the problems it would lead to and then finished off with a description of my new approach.
My first hope was to have a well structured and efficient system that could be extended to support multiple simultaneous timers, ideally by using the jiffyclock. At the same time, I was reading the Official GEOS Programmer's Reference Guide. Several issues arose in my head while I thought about how it would all work, and I was unable to resolve those issues. The way timers work in GEOS gave me some ideas, but also discouraged me and I ultimately backed down and decided to implement a more open–ended system.
In a nutshell, I took a restrictive timer structure that wasn't going to work, and replaced it by ditching support for timers altogether and allowing free hooks to be made via vectors at key places in the IRQ service routine and main event loop, that could in theory be used to implement a timer system.
Okay, so let's dig into this.
How Do Timers Work in GEOS?
I don't want to bemoan GEOS all the time, it was written a long time ago, and it had different goals than C64 OS has. However, it is an OS for the Commodore 64, and so it's a natural move to compare the two.
What are generally referred to as "Timers," are instead referred to as "Processes" in GEOS. But they are for all intents and purposes the same. GEOS is single tasking, and so these timer–based processes should not be confused for processes in a multi–tasking OS. Here's the first page of Chapter 9, Process Support.
The general idea is that in your app, you build a static table which consists of 4 bytes per entry. A 2–byte pointer to a routine to call, and a 2–byte (16–bit) counter. You can have as many timers as you want, up to 255 maximum, although it's possible the limit is somewhat smaller.
At some point, presumably early in the application's lifecycle, you call initProcesses, passing in a pointer to the start of the table and the number of processes in the table. InitProcesses then maintains some hidden variables to keep track of the internal state of each of the timers.
You interact with the timers and their hidden state via a set of system calls. Each system call works the same way, the only argument is the timer number (the index into the original table of timers). These calls consist of:
- EnableProcess, and the odd one out
Sleep is not like the others. They crammed it into this chapter on Process Support, but it doesn't seem to have much to do with the initialized table of timers. You call it in the middle of a routine, it backs up the return address, to be called later, and takes a 16–bit countdown. When the countdown expires it jumps back to that return address. It's asynchronous with the rest of the code and the context of the routine after the sleep is totally different than it would have been if you never called sleep. Let's just leave this one aside.
Once you've initialized the processes, they are largely out of your control. The IRQ service routine counts the counters down every 60th of a second and when one of them reaches 0 a flag is set internally on that timer. The MainLoop checks the status of all the timers and if any of them has timed out, its callback routine is called.
The various system calls allow you to (of any given timer) pause the countdown, reset the countdown, or allow the countdown to keep counting but skip actually calling the callback when it hits 0. And that is about it. It's pretty simple.
The only thing I'm not sure about is whether it's possible to call initProcesses again on a new table in the middle of the application's lifecycle. It might be possible, but even if it is, it would mean reinitializing every timer at once. Also, when you initProcesses, the static table does not hold any status flags, so every timer must start in the same condition. If you want to initialize a timer but immediately block it, you'd have to make a separate BlockProcess call on each timer.
I'm not sure how long support for timers has been around, but something tells me they were pretty early. There are 4 essential functions, although two of them are the same and can be used interchangeably.
- clearInterval (an alias to clearTimeout)
You can read about setTimeout here.
You can call clearTimeout and pass in the timer reference number. This will cancel that timer and prevent it from calling the callback. After a timer expires and its callback is called, or after you manually clearTimeout, the timer is dequeued.
setInterval is very similar to setTimeout with one small difference. After its timer expires, it does not dequeue automatically. The timer is reset to the original value and begins counting down again. The rhythm of firing is not perfectly regular or reliable, because when it fires can be delayed by stuff going on that prevents the thread from returning to the main event loop.1
Let's Talk About Timers
Let's take a step back for a moment and revisit where I left off a year ago at the end of the previous post about timers.
My first approach was to only support a single timer. My proposed "solution" for applications that need multiple timers was to have some sort of timer multiplexor that could be implemented in application space (rather than being permenantly resident in the C64 OS system code). My second approach stuck to the idea of providing only one timer, but abandoned the idea of using the JiffyClock as the basis.
I also originally began with this rather zany queued offsets idea. I was fixated on the need for efficiency. What if you have a hundred timers? Imagine you know how they're spaced out. Offsets in seconds of: 10, 18, 25, 32, 42, 45, etc. You don't have to count down all one hundred timers and wait for them each to reach 0. You can count down the first one only. When it hits zero you can subtract its value from the second one, so 18 has 10 subtracted from it, so it ends up at 8, and starts counting down from there. When that expires, you subtract the total of this from the next one. So 25 has 18 subtracted from it, and starts counting down from 7. Etc. At any time you are only ever ticking down just one timer.
It makes a lot of sense. It really does. And it sounds like an efficient way to handle a million timers. Just worry about the current one, the one that is to go off next. And when it goes off, do a bit of math to figure out when the next one will go off and then switch to worrying about only it. It's brilliant, I'm sure I read about that somewhere.
There are two main issues. C64 OS doesn't need to support a hundred timers, let alone a million! (But it needs more than 1, as we'll see shortly.) And things get real complicated, real fast, when you try to insert a timer into a queue of pre–existing ones. Or, if you try to remove a timer from the queue, especially when timers can be paused. The extra code required to make such a system work, considering the fact that in practice any given app may only actually need two or three timers makes such extreme efforts towards efficiency completely insane.2
What GEOS does is just counts down each timer, using two bytes per timer to track them individually. When any one of them hits zero, that timer fires, and has its own two bytes reset back to their original values to begin counting down again. I quickly realized that from a code complexity / speed efficiency tradeoff perspective, the GEOS way is a much better way to go.
Why Are Timer Events Useful?
I have returned to the original idea of having well defined timer structures and a real timer system that is always resident, and that manages a lot of the work for you. Why have I gone back to that?
Over the last couple of months I've put a huge amount of effort into getting the menu system working the way I want it to. And I've got planned a full post that goes into the nitty details of menus. Suffice to say, I'm very pleased with how they work. They load dynamically from a human–editable menu definitions file, they're hierarchical, they've got realtime rollovers, they stay open even when you roll off them, they've got delegated state for enabled/disabled (aka greying out) and active/inactive (aka a checkmark beside an entry). They have full support for keyboard shortcuts, with modifier–key combinations and custom characters to symbolize the modifier keys. And they're fast. What else is there to ask for?
Well, there is something else to ask for. On macOS when you trigger a menu item with a keyboard shortcut the root menu item (in the menu bar) blinks. This is a usability feature that provides feedback to the user to let him or her know that the keyboard shortcut actually triggered something that can be found somewhere under that menu, even though that menu is closed. The menus in Windows on PC, at least in my testing, do not do this. As a Mac guy, I find it mildly distressing and somewhat disorienting to use a keyboard shortcut, have something happen, but never see the feedback of the menu blink. The classic example is saving a document. On Windows you press CTRL+S and the document saves, but the menu doesn't blink. On the Mac, you hit Command–S and when you see that File menu blink you get this warm comforting feeling that you know it worked.
I implemented this usability feature from the Mac into the menu system in C64 OS, and I swear to you it took about 20 bytes of assembled code to do it. For just 20 bytes, that feels like a big win. But, it requires a timer to do it. The menu system already has recursive code that searches the menu hierarchy looking for a matching keyboard shortcut. As it moves from root menu to root menu it stashes a pointer to the current root menu being searched. If it gets to the end and never finds a match, that variable is nulled out. When it comes time to send the menu entry action code to the app, it makes one extra system call to mark itself as needing a clean (efficient, only affecting itself) redraw. Whenever the root menu's draw code is called, if the current root menu it's drawing matches the pointer in that variable, it changes the background color to the "open" color, but then clears that variable and sets a 0.2 second timer to make a system call for another clean redraw. When the timer expires, the menubar is marked to draw itself again. But this time, the variable has been unset and so it redraws it with the standard background color. Net result? When you activate a menu with a keyboard shortcut, the root menu entry highlights, and then unhighlights automatically a 5th of a second later. It works perfectly, it looks great, the usability is (in my opinion as a Mac guy) brought way up. And it literally took around 20 bytes to implement. Plus two extra clean redraws are necessary, but they're negligible because they're so fast.
But, this only works if you have access to timers. This is easy only if it's easy to make a timer. Here's the thing though, if C64 OS only provides one timer, then, how would this work? The menus need a timer, but they're not even part of the application. They're part of the system. What happens if the application is already using the "one timer" provided by the system? The menus would then be stuck without access to a timer. It's not good enough that the application be able to implement its own timer multiplexor. The menu system isn't going to know about the app's custom multi–timer solution.
GEOS's ability to have multiple timers is hampered by their central registration. How can a random desk accessory use timers and the app use timers, and system code use timers if the timer definitions all have to come from the same place? No, we need the ability to independently queue and manage timers from anywhere.
So now let's see how we get the best of all of these requirements.
As I said in the introduction to this post, I've learned a lot. I went back to the screen module recently and decided to do some refactoring. The screen compositor supports 4 layers, the uppermost layer is permenantly occupied by system menus. The lowest layer is always occupied by the primary UI, which gets pushed onto the compositor stack when the application is initialized. There are then two more layers in between. This could be for a modal dialog, or also for a service app, as previously touched on. Plus you've got one more layer, because, who knows, maybe a modal dialog opens a service app, or maybe a service app needs to open its own modal dialog. But that's it. That's the limit, 4 compositing screen layers.
Each layer consists of a structure of 8 bytes, four 2–byte pointers. So, how shall the screen module manage these layers? Originally, like an idiot, I made a 32–byte buffer, to be used as an array big enough to hold all the bytes of all the layers. Pushing a layer consisted of copying 8 bytes from app space into the appropriate place in the buffer and incrementing a pointer by 8 bytes so it points to the next empty spot in the buffer.
This is actually not that great an idea.
A C64's memory space is unprotected. The operating system is single tasking. There is no need to copy memory around. It is ultimately just slower, wasteful of space, and less flexible.
What's the alternative? As a general principle this applies to screen layer structures, timer structures, event queues and more, if you have the structure in memory and pass a pointer to it to queue it, just queue the pointer. The pointer is only 2 bytes, and these can be split and stored in two 1–byte arrays for even easier pushing and popping. When you need to read the data from the structure, copy the pointer from the queue to a zero page address, and bingo, you're using indirect indexed addressing to read and manipulate the data in the structure.
But it's even better than that, queuing the pointer to the original structure has even more advantages because the structure itself remains back in the memory space of the code that produced it. The application code can modify the original structure… while it's queued! Very powerful. Plus it saves memory because you're not copying it anywhere. In the case of the screen layers, as long as the application has direct access to the original structure, it could redirect the handler for printable key events, on the fly. Or shift the draw routine between a splash screen and a Toolkit–based UI, or between two different Toolkit–based UIs. The options are endless, incredibly powerful and flexible.
So, how does this work for timers?
Above is a C64 OS timer struct, along with the status flag definitions. The structure consists of a 3–byte countdown, a byte for status flags, a 2–byte pointer to the callback routine (called the trigger routine), and a 3–byte countdown reset value.
The countdown itself is 3 bytes, little endian (least significant byte first), measured in jiffies (60ths of a second). In GEOS a countdown is only 2 bytes for a maximum of 18 minutes (without doing extra work). The 3 bytes in C64 OS provides a countdown of ~3.2 days.3 3.2 days isn't merely more than 18 minutes, I feel like it crosses a critical threshold of usability. For example, I almost never leave my Commodore powered on for more than a full day. So under normal circumstances I'll never reach the maximum capacity of a 3–byte countdown. But with a limit of 18 minutes you can't even set a timer for 1 hour, even though I frequently sit in front of my Commodore for many hours at a time. The advantage you get by having 3 bytes rather than 2, is much greater than any advantage you would get by having 4 bytes rather than 3. Ergo, 3 feels like the magic number.
The status byte holds all the control and state flags for the timer. There are no hidden properties to timers in C64 OS. We'll return to describe status flags in a moment.
The 2–byte pointer to the callback routine is pretty straightforward. It just points to the routine to call when the timer expires.
Lastly there is a 3–byte reset value. This comes last because depending on how the timer is setup it can be optional, so you can save 3 bytes by just leaving it off. If a timer is configured as an interval timer, it needs to have its countdown reset so it can countdown again. This reset value is copied back to the main timer value.
Returning the status flags now. These flag definitions are subject to change, of course, as I'm still working on them. But at the moment the following are defined:
- Paused: The timer will not countdown, even though it remains in the queue.
- Interval: When the timer expires, it will not be dequeued, it will remain in the queue to fire again. The timer will be reset automatically.
- Cancel: The timer will be dequeued at the earliest opportunity.
- Reset: The timer will be reset to the reset value.
- Expired: The expired flag gets set automatically when the timer counts down to zero. The count itself gets reset to the reset value, and countdown will continue. The Main Event Loop uses the expired flag to know that it should call the trigger routine. The expired flag is automatically unset after the trigger routine is called.
The Timer module has 3 exported routines, but only one of those ever needs to be called by application code.
In an application, to use a timer, you define a 9–byte timer structure (optionally only 6 bytes if the reset value is not needed). Then do a JSR to timeque with the pointer to the timer in .X and .Y. Here's how that looks in code:
We start with a simple routine: borderchg. It increments register $20, the border color register, of the VIC-II and then returns. Next we have a timer struct, labelled: bordertmr. The first three bytes are the timer, 60 jiffies for a one second countdown. The status byte is set to tintrvl so this timer will not dequeue after expiring, but continually reset and countdown again. Next is the 2–byte pointer to the trigger routine. In this case that pointer is to the borderchg routine we just saw. And finally the timer's reset value which is 60,0,0 again, so subsequent iterations of the timer continue to reset to 60 jiffies.
Lastly, at the routine labelled starttmr, we use the macro #ldxy to load a pointer to the timer struct into the .X and .Y registers. And then jmp to timeque.4 In the code screenshot above, you can see a bit of the Timer module's timer queue. It consists of two 1–byte arrays: timerlo and timerhi, plus a variable timeridx that is the array index. timeque merely adds the pointer to our struct to timerlo and timerhi at timeridx and then increments timeridx. If the array is full, the carry is set to indicate an overflow error. Otherwise the carry is clear to indicate the timer was successfully queued.
From the application's standpoint, there is nothing more to do. Every second the borderchg routine will be called. But let's see what happens internally.
The reason the timer module exports two other routines, timedwn and timeevt, is because the IRQ service routine is provided by the service module and the Main Event Loop is provided by the screen module. The IRQ service routine calls timedwn, thus timedwn is called 60 times a second. It decrements the 3–byte timer of all queued timers, as long as their pause flag is unset. When the timer reaches 0, the expired flag is set. If the interval flag is set then the reset value is copied to the timer value. And the timer struct is not dequeued by timedwn regardless.
The Main Event Loop calls timeevt.
Aside: The Main Event Loop is effectively an infinite loop that just cycles repeatedly as fast as it possibly can. (The 6502/6510 CPU doesn't have a wait feature the way modern power–conserving CPUs have.) This may seem inefficient because, the event loop only ever branches out if there is an event to deliver, whether keyboard, mouse or timer–based. However, events can only possibly be generated by the IRQ service routine, which itself only runs 60 times a second. There are a ~million cycles per second, which means there are ~16,666 cycles between each IRQ. The IRQ service routine itself will use a handful of these cycles couting down timers, scanning the keyboard, updating the mouse, etc. However, if a given instance of the IRQ service routine doesn't produce any events, then the remaining 16,000 some–odd–cycles will be spent pointlessly cycling through the Main Event Loop. I say pointless because, until the next IRQ there is no possible way for an event to be generated. After the Main Event Loop checks for events the first time and doesn't find any (or finds some but they get handled quickly) it seems stupid to keep cycling. But, that's just how it is, because there is nothing else for the CPU to do. Modern CPUs, at this point, would enter a low–power wait state until the next IRQ, which wakes them up again.5
So the Main Event Loop calls timeevt, which checks to see if any timer's expired flag is set. If it is, timeevt handles calling the trigger routine. After calling the trigger routine, if the interval flag is low, then the timer gets removed from the queue. It has to be timeevt that handles dequeuing the timer and not timedwn. Because, an expired non–interval timer has to remain queued long enough for the Main Event Loop to see its expired flag and call its trigger routine.
Realtime Timer Interaction
In GEOS you create a table of timers as a 2–byte routine pointer and 2–byte countdown. After you've initialized them, you can call system routines to change internal status flags on those timers. But all of the state is internal, so you are limited in what you can modify by the system calls that are provided. You can pause a timer, or block or freeze a timer. Or you can undo those, you can unpause, unblock and unfreeze. And there is no such thing as dequeuing, if you don't want a timer to be fired anymore you can just leave it paused indefinitely.
But, what if you want to just slow down a timer? What if you want to change its callback routine? What if you want to check its progress to see how long you have until it expires? You basically can't.
You have a long list of items with a search box at the top. As the user types in the search box you want to filter the items in the list to only those matching what's typed in the box. You want to do it nearly in realtime, you don't want to make the user press a search button explicitly. But, it's too computationally expensive to do the filtering on every keystroke. Instead, at the first keystroke a search timer is begun, say, a 5th of a second long. On each subsequent keystroke you want to reset the timer. As long as the user types quickly, each new key resets the timer and prevents the search from beginning. But at the end of typing, an imperceptibly brief time after the last keystroke, the search begins automatically.
In C64 OS, the timer's state is exposed to the code that created it. The Timer module itself doesn't keep any secret details about the timer at all, just a pointer to the original struct.
If you want to pause a timer, you can do so by setting the pause flag on the status byte.6 If you want to cancel and dequeue a timer, you can set the cancel flag on the status byte. But you can do a lot more than that. If you want to know how much longer the timer has to go before it expires, you can just read the timer value. If you want to add some arbitrary amount of time to a pre–existing timer, that's already counted down some, no problem. Just add to the timer value. You can change the trigger routine by just changing the pointer on the original struct to point to some other routine.
If the timer is an interval, do you want to slow it down or speed it up? No problem, just change the reset value. After it expires, it'll reset to the new value, to either a longer or shorter countdown than the last one.
Don't worry about the cost of timers until you're creating 100K's of them. broofa — StackOverflow — December 3, 2016
GEOS can't have more than 255, because initProcesses takes an 8–bit number for how many are in the table.
C64 OS's Timer module has a table of fixed length. At the moment, that length is 5. So it can handle 5 timers queued simultaneously. This number could be increased, at a cost of just 2 bytes of system memory per additionally supported timer, up to a max of 255. I picked 5, because, in my experience, 5 is enough. My original idea of just 1 is definitely not enough. The menus need to use one to blink. Some other system code will probably use timers for other simple animations and transitions. A service app, such as an actual timer, could use one, still leaving your app free to use a couple more.
So, only 5 C64 OS timers can be running concurrently, but far more than 5 can be used. For example, the Menu system has the menu blink timer struct resident at all times. But it only queues it into the Timer module when a keyboard shortcut is used. It gets queued, counts down for 12 Jiffies, fires, and is immediately dequeued. If all 5 slots were occupied when a keyboard shortcut was used, the menu would attempt to queue the menu blink timer, but it would fail to queue. The menu would highlight, but would fail to unhighlight until the next redraw that was triggered for some other reason. That's what would happen.
I think I've finally nailed Timers for C64 OS. Here's the summary:
- Queue timers individually at any time
- Up to 5 timers concurrently queued
- ~3.2 day maximum countdown
- 60th of a second precision
- Interval or one–shot mode
- Pause without dequeuing
- 100% inspectable/updatable
I'm already dogfooding by using them in system code. So I'll know pretty soon whether I need to make any adjustments.
As always, thoughts, comments, corrections and questions are welcome, in the comments below.
- This is another example of a false efficiency, as discussed in Memory Manager Development and again in Rethinking the Memory Map. [↩]
- Each byte is 256. The total size is therefore 2563 or ~16.7 million. But this is measured in 60ths of a second. So divided by 60 is 279,620 seconds, divided by 60 is 4,660 minutes, divided by 60 is 77 hours, divided by 24 is 3.2 days. [↩]
- Note that it JMPs to timeque rather than JSRing. That's because this is the end of the routine, after the JSR the only thing left to do is RTS. Any time a JSR is immediately followed by an RTS, the pair can just be replaced by a JMP to accomplish the same thing using less code and less stack. [↩]
- Garth Wilson over at Wilson Mines Co., has many great articles on the 6502. He's got an article about the differences between the NMOS 6502, found in the VIC-20, and the CMOS 65c02 created by Western Design Center, but which was never used in a Commodore computer. The 65c02, among many improvements, has a "wait" state, triggered by executing the WAI instruction. [↩]
- You should always disable interrupts while modifying a queued timer. Otherwise, very bad things could happen. [↩]