Wednesday, 3 December 2008

Bitrot

"Bit rot, or bit decay, is a colloquial computing term used either to describe gradual decay of storage media or to facetiously describe the spontaneous degradation of a software program over time. The latter use of the term implies that software can literally wear out or rust like a physical tool."
I found an odd glitch when I got back into the code last weekend - everything was ticking along nicely, until I entered the 'l' command into the monitor to dump the memory allocation table to screen. The software paused unexpectedly, and then Visual Studio kicked in with a nasty 'Index out of range' exception and pointed me at the MemoryMap class. "What the hell...?", I muttered.

This code hasn't changed for a while now, and has been entirely stable since the last chunk of work I did on it. Moreover, I've hardly touched the project for three weeks, and it was working fine the last time I did anything with it. And anyway, the changes I'd just made for the Register classes went nowhere near the MemoryMap class. Some sort of weirdness was occurring.

Looking at the stack trace and other debug information, I could see that MemoryMap was being asked by the ShowMemoryAllocation routine to return a MemoryCell object for location 65536. This was obviously the cause of the 'Index out of range' exception, as the map is initialised as a 64K area with addresses in the range 0-65535, so anything outside of this (which should never occur!) is going to give the .NET runtime a headache. The big question was, therefore, how and why was location 65536 being requested?

Digging into ShowMemoryAllocation, I could see that this code, too, had not changed in some time. With no obvious place to focus, I restarted the monitor with a breakpoint set, and re-tried the command. As I single-stepped through the code, I hit this little bit of logic:
    for (int i = 1; i < memSize; i++)
    {
        // Get the memory cells
        MemoryMap.MemoryCell thisCell = _memory[(long)i];
        MemoryMap.MemoryCell prevCell = _memory[(long)i + 1];
        ...
    }
What this is doing is asking MemoryMap for the current memory cell (as we loop through the whole array) and also the previous one. This lets us see if the type or assigned purpose is different on the current cell, and thus spit out a summary message for the previous cell type. Now, can you see the error?

That's right - the reference to prevCell is being set as the MemoryCell at the current location PLUS one. So that would be the NEXT cell, not the PREVIOUS one - and when the current cell reference is location 65535, this asks for location 65536 and we blow the array upper bound. It's pretty obvious that prevCell should be asking for the cell at the current location MINUS one, and the loop this code is in starts at location one - that is, I designed this to handle the lower bound situation as well as the upper bound limit. So why the hell am I asking for the next location instead of the previous one?

I thought it must have just been a typo on my part, perhaps a moment of insanity as I was falling ill the last time I was in the codebase. But looking back through the Subversion logs, I saw that this line of code had not changed since I wrote it - and the initial version DID correctly look at the previous location. There was no update that changed the sign of the operation to a positive!

So, somehow, in the three weeks that the code lay dormant and I coughed and sneezed myself half to death, bitrot started to set in and that minus sign decayed to a plus. A single bit decays and flips the one next to it, and we have ASCII code 43 instead of 45; and ShowMemoryAllocation suddenly starts doing something dumb.
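
For the curious, the arithmetic behind that flip can be checked in a few lines of C#. This is only an illustrative sketch: the SignFlip class and its method names are made up for the example and are not part of the project. It XORs the two characters to show which bits separate a minus (45) from a plus (43).

```csharp
using System;

static class SignFlip
{
    // Hypothetical helper: returns a mask of the bits that differ between two characters.
    public static int DifferingBits(char a, char b)
    {
        return a ^ b;
    }

    // Hypothetical helper: counts the set bits in a non-negative value.
    public static int CountBits(int value)
    {
        int count = 0;
        while (value != 0)
        {
            count += value & 1;
            value >>= 1;
        }
        return count;
    }

    // Formats a value as an 8-digit binary string.
    static string ToBinary(int value)
    {
        return Convert.ToString(value, 2).PadLeft(8, '0');
    }

    static void Main()
    {
        int diff = DifferingBits('-', '+');
        Console.WriteLine("'-' = " + ToBinary('-'));   // 45
        Console.WriteLine("'+' = " + ToBinary('+'));   // 43
        Console.WriteLine("xor = " + ToBinary(diff));  // the bits that changed
        Console.WriteLine("bits flipped: " + CountBits(diff));
    }
}
```

The XOR comes out as 00000110: two neighbouring bits, which is all that separates the two signs.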

It's working now, of course - with the sign changed back to a minus, the whole routine bursts into life and does what it's supposed to. C'est la vie...
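
As a rough sketch of what the repaired loop is meant to do, here is a cut-down, standalone version. Everything in it is an assumption made for illustration: the AllocationSketch class, the CellType enum and the output format are invented here, and the real MemoryMap and MemoryCell classes are not reproduced. The point is only the i - 1 lookup, which stays inside the array because the loop starts at one.

```csharp
using System;
using System.Collections.Generic;

static class AllocationSketch
{
    // Hypothetical cell purposes; the real MemoryCell type is not shown in this post.
    public enum CellType { Free, Code, Data }

    // Walks the map from index 1, compares each cell with the one before it,
    // and emits one summary line per run of identical cell types.
    public static List<string> Summarise(CellType[] memory)
    {
        var lines = new List<string>();
        if (memory.Length == 0) return lines;

        int runStart = 0;
        for (int i = 1; i < memory.Length; i++)
        {
            CellType thisCell = memory[i];
            CellType prevCell = memory[i - 1]; // minus one: stays within 0..Length-1

            if (thisCell != prevCell)
            {
                // The type changed, so summarise the run that has just ended.
                lines.Add(Format(runStart, i - 1, prevCell));
                runStart = i;
            }
        }

        // The final run has no following cell to trigger it, so close it here.
        lines.Add(Format(runStart, memory.Length - 1, memory[memory.Length - 1]));
        return lines;
    }

    static string Format(int from, int to, CellType type)
    {
        return string.Format("{0:X4}-{1:X4} {2}", from, to, type);
    }

    static void Main()
    {
        var memory = new CellType[65536]; // every cell defaults to Free
        for (int i = 0x0200; i < 0x0300; i++) memory[i] = CellType.Code;

        foreach (string line in Summarise(memory)) Console.WriteLine(line);
    }
}
```

With i + 1 in place of i - 1, the last pass through a 64K map would ask for index 65536 and throw, which is exactly the exception described above.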

1 comment:

The Cowboy Online said...

Isn't it amazing how, just a few days away from your code, things like this can happen? I've looked back on code I've written months previously, and at the time I thought I'd commented it rather well, only to find it is damn near impenetrable.

Personally I think all source code should sit inside a single, huge, source file. Just like the good old days, like on the Vic. ;-)