Sunday, 26 October 2008

Feeling for a Pulse

So, in the .NET 2.0 release, Microsoft deprecated the Thread.Suspend() and Thread.Resume() methods, which had been a convenient way of pausing and restarting a thread from outside. All for good reasons, I hasten to add - Suspend() was a highly dodgy bit of code that took no notice of what the target thread was doing when it was halted, which meant that if the thread was in the middle of a non-atomic operation you could easily end up with half-complete updates and with locks still held all over the place. Not nice.

Anyway, my first-pass CPUCore class made use of these methods as a quick-and-dirty way of giving me control over thread execution, but I knew I'd have to do the decent thing at some point and replace them with the approved mechanisms provided by the Monitor class. So this weekend I sat down and read everything I could find that talked about how to use the lock statement together with the Monitor.Wait and Monitor.Pulse methods to reproduce Suspend() and Resume() in thread-safe code. And although it was a steep learning curve, the end result is a surprisingly neat bit of code that does what it's supposed to, and which doesn't spit compiler warnings at me.

This is pretty much the main emulation core loop. I've ripped out a load of incidental stuff, leaving just the skeleton it all sits on - so you can see that we just spin through the loop forever (until something clears the _isRunning flag), either executing instructions or, if the core is paused, sitting on Monitor.Wait until we get a pulse to tell us to start again:

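    // Loop forever, until the flag is cleared to indicate thread disposal
    while (_isRunning)
    {
        switch (_runMode)
        {
            case CoreRunMode.RunPaused:
                lock (this)
                {
                    // Re-check under the lock so that a pulse sent between the
                    // switch and the wait is not missed, and loop in case we
                    // are woken while still paused
                    while (_isRunning && _runMode == CoreRunMode.RunPaused)
                    {
                        Monitor.Wait(this);
                    }
                }
                break;

            case CoreRunMode.RunContinuous:
            case CoreRunMode.RunStepped:
                ExecuteInstruction();
                break;
        }
    }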

And outside of the core, this bit of code executes when we want the core to resume after a pause:

    // Resume core execution
    public void CoreResume()
    {
        lock (_core)
        {
            _core.RunMode = CPUCore.CoreRunMode.RunContinuous;
            Monitor.Pulse(_core);
        }
    }
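
For anyone who wants to try the pattern in isolation, here's a minimal, self-contained sketch of a pausable worker built on the same lock / Monitor.Wait / Monitor.Pulse idea. It is not the real CPUCore: the class name PausableCore, the private _gate lock object, the Executed counter and the Pause() and Stop() methods are all assumptions made for illustration. It waits inside a while loop that re-checks the state, and Stop() pulses as well, so that a paused thread wakes up and notices it has been asked to exit.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the emulation core; not the real CPUCore.
public sealed class PausableCore
{
    public enum RunMode { Paused, Continuous }

    private readonly object _gate = new object(); // private lock object (assumption)
    private RunMode _mode = RunMode.Paused;       // guarded by _gate
    private bool _isRunning = true;               // guarded by _gate
    private long _executed;                       // stand-in for real work

    public long Executed { get { return Interlocked.Read(ref _executed); } }

    // Body of the core thread.
    public void Run()
    {
        while (true)
        {
            lock (_gate)
            {
                // Re-check the state after every wake-up: a pulse only says
                // "something changed", not what changed.
                while (_isRunning && _mode == RunMode.Paused)
                    Monitor.Wait(_gate);

                if (!_isRunning) return;
            }

            // ExecuteInstruction() would go here.
            Interlocked.Increment(ref _executed);
        }
    }

    public void Resume() { SetState(RunMode.Continuous, true); }
    public void Pause()  { SetState(RunMode.Paused, true); }
    public void Stop()   { SetState(RunMode.Paused, false); }

    private void SetState(RunMode mode, bool running)
    {
        lock (_gate)
        {
            _mode = mode;
            _isRunning = running;
            Monitor.PulseAll(_gate); // wake the core so it sees the new state
        }
    }
}
```

A host would start it with new Thread(core.Run) and then call Resume(), Pause() and Stop() from its own thread. Note that this sketch takes the lock once per instruction, which keeps the state checks simple but is not free; checking the mode outside the lock, as the loop above does, avoids that cost on the hot path.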

Again, all this work is essentially invisible outside of the emulation object, but is necessary both from an aesthetic point of view and for making maintenance easier in the future. As a final snippet for this post, compare how the STA (Indirect Indexed, Y) instruction has evolved as I've iteratively refined the code:

Initial:

    case 145: // STA Indirect Indexed,Y
        address = (ushort)((data.memory.Peek(ops[0] + 1) * 256) + data.memory.Peek(ops[0]));
        data.memory.Poke(address + data.Y.Contents, data.A.Contents);
        break;

Interim:

    case 145: // STA Indirect Indexed,Y
        data.memory.Poke(data.memory.Deek(data.memory.Peek(data.PC.Contents + 1)) + data.Y.Contents, data.A.Contents);
        break;

And now:

    case 145: // STA Indirect Indexed,Y
        _mem[_mem[(long)_mem[PC + 1]] + Y] = A;
        break;

The latest version is a bit cryptic where the earlier ones were more readily understandable, but I've deliberately traded legibility for speed - using the indexer on the memory array is considerably quicker than going through discrete Peek() and Poke() methods. Also, the memory structure itself is now a 'top-level' entity as far as the CPU Core object is concerned, instead of being nested inside the larger aggregated data structure that holds a lot of other CPU-state information.
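
To make that one-liner a little less cryptic, here's a rough sketch of how a memory class with two indexers could support it. This is a guess at the shape, not the actual class: the Memory and StaIndirectIndexedY names are hypothetical, and I'm assuming that the (long) cast picks an overload that reads a little-endian 16-bit word (the job Deek() used to do), while a plain int index reads or writes a single byte.

```csharp
using System;

// Hypothetical 64 KB memory with two indexers; names and overloads are assumptions.
public sealed class Memory
{
    private readonly byte[] _cells = new byte[0x10000];

    // Byte access, standing in for Peek()/Poke(). Addresses wrap at 64 KB.
    public byte this[int address]
    {
        get { return _cells[address & 0xFFFF]; }
        set { _cells[address & 0xFFFF] = value; }
    }

    // Word access, standing in for Deek(): a little-endian 16-bit read,
    // selected by indexing with a long.
    public ushort this[long address]
    {
        get
        {
            int lo = _cells[(int)(address & 0xFFFF)];
            int hi = _cells[(int)((address + 1) & 0xFFFF)];
            return (ushort)(lo | (hi << 8));
        }
    }
}

public static class StaIndirectIndexedY
{
    // Fetch the operand byte after the opcode, read the 16-bit pointer it
    // names, add Y, and store A at the resulting address.
    public static void Execute(Memory mem, int pc, byte y, byte a)
    {
        mem[mem[(long)mem[pc + 1]] + y] = a;
    }
}
```

For example, with the operand $40 at PC + 1, the pointer $2000 stored at $40/$41 and Y = 5, the store lands at $2005. One thing this sketch does not model: on a real 6502 the pointer fetch wraps within zero page, so an operand of $FF takes its high byte from $00 rather than $0100.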

2 comments:

t0ne said...

Cool. I like the indexer you've implemented, it seems like a really nice way of solving the problem. Shame about the readability, as you say, but then the performance gains sound worth it. Are you using multiple threads in your emulation core, or just one for the emulation and another for the UI/user input, or something else?

Jonners said...

Yeah, the readability issue is a problem, but the performance gain is a real kick - I'll be publishing the new metrics in a few days, and it's pretty clear that the trade-off is worth it.

The core runs as a single thread, wrapped in a control object you instantiate from a host process. At the moment that host is just my command-line monitor, but the same object will eventually be tucked inside a GUI that does the 'smart' assembly this whole project is designed to make possible.

Equally, you could have the object wrapped in a machine emulator, essentially any 6502-based device - I use the VIC-20 as the test harness. And of course there's nothing stopping you instantiating multiple cores if you wanted to...