<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5115611155063132884</id><updated>2011-07-05T15:18:52.532+01:00</updated><category term='Negative'/><category term='Pipeline'/><category term='Instructions'/><category term='Branch'/><category term='Monitor'/><category term='6502'/><category term='Stella'/><category term='Bit'/><category term='DASM'/><category term='IDE'/><category term='Overflow'/><category term='Refactoring'/><category term='Clock Speed'/><category term='Labels'/><category term='Flags'/><category term='GUI'/><category term='C#'/><category term='Assembler'/><category term='Byte'/><category term='Indexer'/><category term='Deek'/><category term='Bitrot'/><category term='ALU'/><category term='Int'/><category term='VIC-20'/><category term='Memory'/><category term='Registers'/><category term='Risk'/><category term='Exceptions'/><category term='Cycles'/><category term='ML'/><category term='Emulator'/><category term='Zero'/><category term='Metrics'/><category term='Carry'/><category term='Disassembly'/><title type='text'>Legacy System</title><subtitle type='html'>The diary of a 6502 emulator written in C#</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/'/><link rel='hub' 
href='http://pubsubhubbub.appspot.com/'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>27</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-9089778426853756039</id><published>2009-01-29T10:25:00.003Z</published><updated>2009-01-29T10:41:31.285Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='IDE'/><category scheme='http://www.blogger.com/atom/ns#' term='Disassembly'/><category scheme='http://www.blogger.com/atom/ns#' term='GUI'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><title type='text'>Dirty Pretty Thing</title><content type='html'>&lt;div style="text-align: justify;"&gt;Real life has a nasty habit of unravelling the best-laid of plans, and the last couple of weeks have seen me with very little time to spend on the project - but I &lt;span style="font-style: italic;"&gt;have&lt;/span&gt; done a little as the opportunity has arisen, so as promised here's a screenshot of what the IDE looks like right now. 
It's not as far along as I would have liked, but it's getting there - and there's another panel (the memory-dump view) which is nearing completion but not shown, so progress is slow but steady:&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;a href="http://picasaweb.google.com/lh/photo/zfytuRiT9XALfPPLgoX0xg?feat=embedwebsite"&gt;&lt;img src="http://lh4.ggpht.com/_k7S257jPIsM/SYGEHo8obDI/AAAAAAAAATY/V7nU-_SqJbg/s400/IDE%2029012009.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;As you can see, we have a disassembly panel (which does what it says on the tin) and an address select/convert tool which lets me convert addresses from Hex to Decimal (and back) and also makes use of the annotation labels (if present) to find and select addresses by name. There's a little more I want to do to this tool, but it works so far.&lt;br /&gt;&lt;br /&gt;In the meantime, when I've not been actively coding, I've been mulling-over the second version of the core, which emulates the 6502 at the T-state level. I've got a reasonable idea of how to make it work, so once I've got the IDE where I want it, I'll be in a position where I can instantiate both the current V1 core and the new V2 core side-by-side, and watch the behaviour of each simultaneously as they step through instructions. Which should be cool. 
:)&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-9089778426853756039?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/9089778426853756039/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=9089778426853756039' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9089778426853756039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9089778426853756039'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2009/01/dirty-pretty-thing.html' title='Dirty Pretty Thing'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_k7S257jPIsM/SYGEHo8obDI/AAAAAAAAATY/V7nU-_SqJbg/s72-c/IDE%2029012009.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-8268348751859564479</id><published>2009-01-16T15:05:00.002Z</published><updated>2009-01-16T15:15:39.719Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='IDE'/><category scheme='http://www.blogger.com/atom/ns#' term='Pipeline'/><category scheme='http://www.blogger.com/atom/ns#' term='Metrics'/><title type='text'>He Who Fights And Runs Away...</title><content type='html'>&lt;div style="text-align: 
justify;"&gt;OK, I concede temporary defeat. Pipelining is proving too large a drain on the progress of the project - it's working, and performance is better than it was after the first attempt, but it's still nowhere near as good as the non-pipelined version. So, for now, this goes to the back-burner and I'm returning to the IDE. I have a pretty good idea for how to do some branch prediction and 'loop unrolling' which will make the pipeline quicker, but that's another week or two's effort; I really don't want to get bogged-down in this and stall the project further, so a tactical retreat is in order.&lt;br /&gt;&lt;br /&gt;Accordingly, I've reverted the codebase to the pre-pipeline version and rolled-in the few bugfixes I'd made elsewhere in the code as I implemented and tested the pipeline constructs. So we're now back to where we were a fortnight ago, and the pipelined version is tucked away waiting for the day I return to it. In the meantime, on with the IDE!&lt;br /&gt;&lt;br /&gt;In a few days I'll post some screenshots so you can see how it's shaping up.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-8268348751859564479?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/8268348751859564479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=8268348751859564479' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/8268348751859564479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/8268348751859564479'/><link rel='alternate' type='text/html' 
href='http://legacysystem.blogspot.com/2009/01/he-who-fights-and-runs-away.html' title='He Who Fights And Runs Away...'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-2129459203987173689</id><published>2009-01-13T09:54:00.005Z</published><updated>2009-01-13T11:40:00.898Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Memory'/><category scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='Pipeline'/><category scheme='http://www.blogger.com/atom/ns#' term='Branch'/><category scheme='http://www.blogger.com/atom/ns#' term='Instructions'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><title type='text'>Unblocking The Pipes</title><content type='html'>&lt;div style="text-align: justify;"&gt;Getting pipelining working is now approaching the point at which it's becoming a major distraction from pushing the project forward, and I'm thinking I'm going to try one last thing before giving it up as a bad job and reverting to the non-pipeline version of the code. 
I've fixed a couple of other little bugs I've found whilst I've been fiddling with the instruction fetch logic, so lifting those fixes out and dropping them into the non-pipeline version will be a matter of seconds.&lt;br /&gt;&lt;br /&gt;I spent a good while making use of JetBrains' "dotTrace" to identify hotspots in the pipeline code - as expected, there was some redundant object creation going on which was easy to eliminate, and I replaced a byte-by-byte copy from one array to another with a call to the Array.Copy method; but apart from that there wasn't much to do in the way of performance tuning - there isn't much code to tune, after all, as it's pretty simple. And the big performance hit remained the over-read of memory triggered by the frequent pipeline flushes.&lt;br /&gt;&lt;br /&gt;So it was time to do something a little more radical, to try to minimise the number of times the pipeline got flushed. A flush occurs when an instruction causes the PC to change; the PC increments by the instruction length after each instruction anyway, but by 'change' I mean that it's set to another value instead of the natural increment. This occurs after a BRK, a taken branch (Bxx), JMP, JSR, RTS or RTI instruction, when the PC is reloaded with the address to continue execution from. We also do pipeline 'top-ups' when we've consumed more than 50% of the pipeline contents, but the frequency of that happening is dictated by pipeline length, and is a negligible cost compared to the flush.&lt;br /&gt;&lt;br /&gt;The trick to mitigating flushes is to try to arrange matters so that the new PC value is one that is already in the pipeline. For a lot of 6502 code this can be achieved without having a huge pipeline; branches, for example, can only reach -128/+127 bytes from the address of the instruction that follows the branch. And JSR instructions tend to be to relatively close-by code, often within the same page (256 bytes), which means the corresponding RTS could easily be within the reach of the pipeline too. 
JMP instructions tend to be to farther-off locations, and RTI by its very nature could be returning from a long way off. BRK occurs infrequently as an instruction, and when it does occur it'll likely be doing a very long jump indeed, and anyway will route through the IRQ/BRK vector at $FFFE, so there's no pipeline optimisation that'll help with that.&lt;br /&gt;&lt;br /&gt;So the first thing to do is make the pipeline big enough to cope with the common scenarios - branch instructions and JSR/RTS combinations. We don't want a huge pipeline, because although we're trying to reduce flushes, whenever we DO flush we'll be refilling that pipe - and the bigger it is, the longer it'll take to fill. Equally, a FIFO queue pipeline isn't a smart idea if we want to accommodate 'negative' changes to the PC and jump backwards; as we consume instructions from the FIFO queue, they are lost, so negative jumps would demand a flush.&lt;br /&gt;&lt;br /&gt;I've written a pipeline that is 512 bytes long, arranged as two complete pages of 256 bytes either side of the PC; when a flush occurs, we fill the pipeline with 256 bytes from behind the new PC value, and 256 bytes from ahead of it. This gives us a pretty high chance that any jump or branch forwards or backwards will be into the pipeline, and because it contains data from behind the PC we'll have a moving window as we do top-ups that will keep the chances high that the target of a negative jump is still in the pipeline. For this to work properly, we also have to stop popping instructions out of the pipeline as we execute them; we need to be able to jump back to an instruction we've already processed for negative jumps to have a chance of being satisfied by what's in there. So instead of a FIFO queue, the new pipeline is a simple array with a pointer maintained by the emulator code; when we flush the pipeline, the pointer is set to 256 and we execute from there. 
As we proceed through the pipeline, we leave the contents intact but modify the pointer; branches and jumps that can be satisfied from the pipeline are handled by simply changing the pointer to the appropriate pipeline slot.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;Although this sounds like a second Program Counter (and in a sense it is), the key thing to remember is that the pipelined data is right there in the CPUCore class, and doesn't require a context switch to get data from MemoryMap every time we fetch and decode an instruction. There are still inline MemoryMap references, such as when an indirect load, store or jump occurs; in those circumstances, the operand data in the pipeline is a pointer to a memory location which in turn contains the real target address, and we have to read that address out of memory before we can operate on it. But the majority of instructions and addressing modes can be satisfied directly from the pipeline. At least, that's the theory.&lt;br /&gt;&lt;br /&gt;I've made the changes to the pipeline structure and logic in CPUCore, and right now I'm just finishing the change to MemoryMap's BurstData method, so that the pipeline is filled with 256 bytes from either side of the current PC when we ask for data. The minor complication here is that the data should 'wrap' when the PC is less than 256 or greater than [Memory.Size - 256], so once I add that little bit of functionality we'll be good to go. Hopefully, the ability to jump and branch within the pipeline without necessarily triggering a flush will make all the difference.&lt;br /&gt;&lt;br /&gt;If that doesn't deliver the desired results, the next thing to do will be to make pipeline population an asynchronous task on another thread - but I've already decided that I'm not going to pursue that at this time, and it can wait until I write the V2 CPUCore after Easter because that will be a multi-threaded entity anyway. 
Abandoning the pipeline scheme (if I have to) won't be a total waste of effort, as I've learned a lot both about pipeline mechanics and 6502 code profiling - and that is good foundation knowledge for the V2 core.&lt;br /&gt;&lt;br /&gt;Anyway, we'll see.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-2129459203987173689?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/2129459203987173689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=2129459203987173689' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/2129459203987173689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/2129459203987173689'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2009/01/unblocking-pipes.html' title='Unblocking The Pipes'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-1402921772329732872</id><published>2009-01-08T08:36:00.005Z</published><updated>2009-01-09T12:11:55.456Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='Pipeline'/><category scheme='http://www.blogger.com/atom/ns#' term='Branch'/><category scheme='http://www.blogger.com/atom/ns#' 
term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><category scheme='http://www.blogger.com/atom/ns#' term='Clock Speed'/><title type='text'>Blocked Pipes</title><content type='html'>&lt;div style="text-align: justify;"&gt;I couldn't resist it. Despite the knowledge that I really should be working on the IDE, I had to see what effect there would be on the core execution speed if I implemented a rudimentary pipelining mechanism into the instruction decode routine.&lt;br /&gt;&lt;br /&gt;So over the last couple of nights, I've added a small 'BurstData' routine to the MemoryMap class, which populates a pre-defined shared array with the next 'n' bytes of data from the primary memory array and makes it available to the CPUCore class. I then added a FIFO queue to CPUCore (using the System.Collections Queue class) and wrote the logic to call BurstData when the queue was less than half-full, enqueuing data from that shared array. Tweaking the fetch code to pull its data from the queue instead of memory took a matter of seconds, and then I went through all the 6502 instruction implementations adjusting them to use queued data instead of memory where appropriate.&lt;br /&gt;&lt;br /&gt;Once that was done, I set the queue capacity to six bytes and the refill threshold to three bytes, and fired it up. And the results are stunning.&lt;br /&gt;&lt;br /&gt;Stunningly bad, that is - where I was getting 40MHz on the older box, I now get 10MHz.&lt;br /&gt;&lt;br /&gt;I experimented with some other configurations of the queue; after 6/3 proved disastrous, I played with 9/6, 12/3, 12/6, and even some silly values like 30/15. Performance varied, but in general either stayed in the region of 10MHz or dropped - the worst was just 4.1MHz. Clearly, this is Not Working.&lt;br /&gt;&lt;br /&gt;So why is this? 
Well, the basic problem is that whenever the Program Counter changes due to a Branch, Jump, Break or Return instruction, the queue has to be flushed and re-populated from the new PC location. And a characteristic of 6502 code is that routines and loops tend to be short, so the queue gets flushed a lot. Since we read 'n' bytes ahead to populate the queue, we're now doing a lot MORE memory access than ever before, and in the majority of cases it's a complete waste of effort. Look at this example from the VIC-20 ROM, which executes during the boot process and is checking to see if a short sequence of bytes is present in memory which would indicate that a plug-in auto-start cartridge has been connected:&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;&lt;span style="font-family:courier new;"&gt;; CHKA0CBM: FD3F  [2]  A2 05     LDX #$05       (- Check For A-ROM)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;           FD41  [4]  BD 4C FD  LDA $FD4C,X&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;           FD44  [4]  DD 03 A0  CMP $A003,X&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;           FD47  [2]  D0 03     BNE $FD4C&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;           FD49  [2]  CA        DEX&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;           FD4A  [2]  D0 F5     BNE $FD41&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;;           FD4C  [6]  60        RTS&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div style="text-align: justify;"&gt;This is a perfect example of the nature of the problem; let's assume we have a queue set to 6 bytes long, with a refill threshold set to 3. When we jump to this routine at $FD3F, the queue is flushed and we read 6 bytes of memory from the PC location into it. Instruction execution commences, reading the first byte (LDX) and then the second as the operand. 
We then read the third byte (LDA), and two more as operands. On the next fetch, the queue level has dropped to one byte, so a refill occurs and puts five further bytes into the queue. We then execute the next instruction (CMP), pulling its two bytes of operand data off the queue; the queue level is now three bytes, which is not lower than the refill threshold, so we execute the next instruction (BNE) and pull the operand byte off the queue too.&lt;br /&gt;&lt;br /&gt;Now at this point (in our example) the branch is taken - the CMP instruction didn't find a match and we're skipping out of the loop. So the PC changes to $FD4C, and the queue is flushed and populated with six bytes. We execute the first instruction, which is RTS, and that also changes the PC to set it back to the address this routine was called from; so the queue is flushed again, and we read another six bytes. In total, to execute this routine in the pre-pipeline model, we would read 12 bytes of memory (11 in the routine, plus the first instruction after the calling JSR when we RTS); with the pipeline enabled, we read 6+5+6+6 (flush and fill the queue, refill the queue, flush and fill, flush and fill) or 23 bytes. Discounting how many bytes we make use of after the RTS, we've definitely read 6 bytes more than we needed during execution of the routine.&lt;br /&gt;&lt;br /&gt;That doesn't sound so bad, but the problem is endemic - every time we hit a Branch or Jump, even if the target address is only a few bytes away, we flush and refill the queue. And longer queues just exacerbate the problem; if we were to simply double the queue length and the refill threshold (thinking that having more data to hand with fewer refills might improve the situation) we'd read 12+8+12+12 bytes, or 44 altogether - which doesn't help at all!&lt;br /&gt;&lt;br /&gt;So I need to have a bit of a re-think on this. I have some ideas:&lt;br /&gt;&lt;br /&gt;1. 
Ensure the queue manipulation logic is as fast as it can be - there might be some optimisation I can do to the mechanics of the code.&lt;br /&gt;&lt;br /&gt;2. Tweak the PC-changing instructions to look and see if the target address is ahead by just a few bytes, and don't flush the queue if so.&lt;br /&gt;&lt;br /&gt;3. Abandon the FIFO queue and use a hand-rolled routine to manage a pipeline in which data isn't popped-off the top when consumed, but instead we have a pointer that we can move up and down - good for short loops where the branch is a negative one.&lt;br /&gt;&lt;br /&gt;4. Put queue population on a separate thread, so that it handles flushes and refills concurrently to the core instruction fetch/decode logic.&lt;br /&gt;&lt;br /&gt;5. Replicate 20-odd-years-worth of industrial CPU research and design a mechanism that does pipeline caching and branch prediction with a high degree of accuracy.&lt;br /&gt;&lt;br /&gt;I'm not giving-up on this yet, because I think I can crack it. 
The first attempt has been hugely disappointing, but also very informative, and I'm pretty confident that I can improve upon my first design in enough ways that it evolves from a massive drag on performance into something that improves it.&lt;br /&gt;&lt;br /&gt;I'll keep you posted.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-1402921772329732872?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/1402921772329732872/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=1402921772329732872' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1402921772329732872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1402921772329732872'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2009/01/blocked-pipes.html' title='Blocked Pipes'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-3829822639362544947</id><published>2009-01-05T08:54:00.005Z</published><updated>2009-01-05T12:45:46.316Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='IDE'/><category scheme='http://www.blogger.com/atom/ns#' term='Memory'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='Pipeline'/><category scheme='http://www.blogger.com/atom/ns#' term='Instructions'/><category scheme='http://www.blogger.com/atom/ns#' term='GUI'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><title type='text'>New Years' Resolution</title><content type='html'>&lt;div style="text-align: justify;"&gt;Well, here we are in 2009 - in London it's cold, dark and unpleasant outside, and of course Christmas is now just a fading memory as we return to the daily grind. Travel expenses have risen, and things seem pretty gloomy all round; the come-down after the yuletide high is a bitch. But it's not all bad, as the need to get my brain back into work-mode has also stimulated me into returning to the project after two or three weeks away from it during the festive period. My initial target was to get a stable core emulation by Christmas, which I achieved; my New Year Resolution is to have the whole core complete, together with the IDE, by Easter.&lt;br /&gt;&lt;br /&gt;Of course, being something of a geek (surely not!) I did spend a few idle minutes over the holidays thinking about Sharp6502. Naturally my mind wandered from the task in hand, and instead of thinking about GUI coding for the IDE, my thoughts turned back to the CPU Core and entertained some ideas on how to speed it up.&lt;br /&gt;&lt;br /&gt;It's running at about 75MHz at the moment, which is plenty fast enough for my requirements; I don't need to make it any faster at this juncture, and in fact the ideas I've had for it are staying on the 'Future Enhancements' list for a good while because I want to focus on the IDE until it does at least as much as the command-line interface. But even though it's not a priority, a geek's mind loves to tinker and tweak. 
:)&lt;br /&gt;&lt;br /&gt;The biggest bottleneck in the way this version of the core runs is memory access - every instruction needs between one and three accesses to get the OpCode and any requisite operands, plus another access or two if the addressing mode is indirect, and there may then be further reads and/or writes if the instruction is one which does any memory access itself. The core only accesses memory when the instruction demands it, so we read one byte initially (for the OpCode) and then do further accesses only if the instruction decode requires one or two operands, any indirection access, and then whatever memory access is actually performed by the instruction.&lt;br /&gt;&lt;br /&gt;All well and good - but memory access is expensive, relatively, because it requires the core instruction decode logic to make requests of the MemoryMap class and is thus a branch out of the sequence of IL instructions that are being executed by the CLR as it emulates each 6502 mnemonic. So my mental meanderings began to wonder if there was any way to reduce the number of times that happened - or at least to reduce the number of times we had to break the flow of IL instructions to fetch data, even if the number of accesses remained the same.&lt;br /&gt;&lt;br /&gt;Of course, Intel (and AMD, and probably others) were thinking about this many years ago, which is why we have pipelined CPUs in our PCs today; to put it in simple terms, when a pipelined CPU reads an instruction byte it also reads-ahead a few bytes (perhaps &lt;span style="font-style: italic;"&gt;quite&lt;/span&gt; a few, in current processor families) and fills a pipeline with that data. As the instruction executes, its operands will, if it's lucky, be available in the pipeline instead of having to do another memory access. Equally, successive instructions may also be in the pipeline, so you could end up executing several instructions very quickly from the pipeline instead of doing memory accesses for each. 
Filling the pipeline is a fast operation, because it's a burst of sequential data from memory, and can happen concurrently with the first instruction being decoded; even if it takes a bit longer than a straight-forward memory access for the next byte or two, you still recoup that time as subsequent instructions and data come from the pipeline instead of memory. Coupled with on-chip cache and efficient pipelining and branch-prediction algorithms, this has a huge impact on execution speed. The only time the model falls apart is when the pipeline is populated with the wrong data (perhaps because the branch-predictor got it wrong) and needs to be re-populated; the trick is to try to minimise the number of times this happens, and a great deal of time and effort has been invested in doing just that.&lt;br /&gt;&lt;br /&gt;Funnily enough, even the humble 6502 had a primitive version of pipelining; the next byte would often be read whilst instruction decode was still occurring on the previous byte, so there was a shorter wait-state when that next byte was needed. In CPUCore version 2 (which I've alluded to in previous posts, and which will emulate the core at a lower level) this simple read-ahead will be faithfully replicated as part of the fetch logic. I started to wonder if a pipeline mechanism of sorts might exhibit some interesting performance gains even in the current version, and worked-out a way to do it.&lt;br /&gt;&lt;br /&gt;What we'll have is a six-byte pipeline implemented as a FIFO queue; when the first OpCode fetch happens, we'll actually return six consecutive bytes from memory instead of just one. The instruction decode will then use the first pipeline entry, and any successive operand(s) will also be fed from there. As each byte is consumed, the pipeline queue will pop it off the top; when there are only three bytes left in the queue, we'll initiate another memory access to fill it to capacity again. 
Any instruction that causes a non-incremental change to happen to the PC (e.g. a taken Bxx, JSR, JMP, BRK, RTS or RTI) will of course incur a full pipeline refresh; but all other instructions will get a performance boost as they read their operands from the pipeline instead of memory. Also, subsequent instructions will more often than not be available in the pipeline, so we'll score a bonus on the instruction fetch as well.&lt;br /&gt;&lt;br /&gt;In a best-case situation, the pipeline will be filled with six one-byte instructions; we'd do a memory access to get the pipeline filled, and then the next two instructions are cost-free. With the queue now down to 50% full we initiate another memory access to get three bytes; so we're essentially doing one MemoryMap access for every three instructions executed. In the real world, of course, we'll see instructions varying from one to three bytes and that theoretical 3:1 ratio will drop; but we'll still execute at better than the current 1:1 level. And all this is without doing anything at all sophisticated about branch prediction...&lt;br /&gt;&lt;br /&gt;Why six bytes for the pipeline length? Purely as a starting-point, and knowing that an instruction can be one, two or three bytes long, I took the worst-case length and doubled it so that the pipeline could potentially hold two complete three-byte instructions. Once the logic is proved to be viable, I'll experiment with other multiples of three to see whether there's a 'sweet spot' at which the pipeline length performs better. I'll also play with the pipeline-fill trigger, so we can modify the circumstances under which a full or partial pipeline refresh will occur - the first cut will top-up the pipeline when it drops to 50% full, and completely repopulate it when PC does something other than increment by one. But it already feels like there might be ways of tuning or improving that.&lt;br /&gt;&lt;br /&gt;It'll be fun playing with it to see what happens. 
;)&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-3829822639362544947?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/3829822639362544947/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=3829822639362544947' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/3829822639362544947'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/3829822639362544947'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2009/01/new-years-resolution.html' title='New Years&apos; Resolution'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-2152551603866258979</id><published>2008-12-18T11:50:00.003Z</published><updated>2008-12-18T12:33:34.636Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='IDE'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='GUI'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><title type='text'>What's in a Name?</title><content type='html'>&lt;div style="text-align: justify;"&gt;The Sharp6502 IDE continues to take shape - I've been chugging 
along writing WinForms code for the last couple of days, getting things like standard menu-handling and form control logic in place. It's all been pretty straightforward, aside from one little glitch that had me stumped for a while.&lt;br /&gt;&lt;br /&gt;The scenario is this: we want a nice, clean, efficient bit of code that can be called by any menu item (on the main MDI form) that needs to open a child window. By passing such a method an identifier, the menu Click event can tell it which of a variety of child windows it is asking to be opened. Thus, by making the identifier a string in the menu item Tag field, the exact same window-opening code can be called by any menu item; the actual window that opens will be whichever one the identifier indicated. There's probably a better way of doing this, but it's a technique that's reasonably easy to understand and maintain, and has served me well for the last decade or so.&lt;br /&gt;&lt;br /&gt;Anyway, in some cases we might not want to open a window if one already exists - a lot of the child windows can be open multiple times, but just occasionally we want a single instance of a particular one. So the window-open code has a loop to scan through the extant child windows, and it makes a note if an instance of the window it's being asked to open is already present. Then when we come to actually create and show the window, for those that must be single-instance we check the flag to see if there already is one, and just activate it if there is (or create it if there isn't).&lt;br /&gt;&lt;br /&gt;To figure-out if a window is already present, we simply compare the names of all the forms in the MdiChildren collection with the identifier the menu item gave us - those identifiers are just the form name the menu item wants to open, so we can very easily tell if there's already one in existence. 
And it's this little test that had me scratching my head for a good half-an-hour last night, because although it worked for most of the windows, there was one it just would not play ball with - I never managed to get a match on name, so even though I only wanted a single instance, I always got another window opened. Highly confusing, and not a little irritating.&lt;br /&gt;&lt;br /&gt;I carefully checked that the form name I had stashed as an identifier in the menu item Tag was right - it was. I checked that the identifier made it through to the instance-checking code intact and unmangled in some peculiar way - it did. I made sure that the comparison was checking Form.Name against the identifier - it was. I looked at what was in Form.Name in the cases where it worked - it matched. I looked at the contents of Form.Name in the case where it didn't work - and it &lt;span style="font-weight: bold;"&gt;didn't&lt;/span&gt; match! WTF?&lt;br /&gt;&lt;br /&gt;Somehow, the form name (which should have been reported as 'frmXXX') was actually coming back as just 'XXX'. So my identifier, containing 'frmXXX', didn't match, and the application then went on and created a new window. How the hell was my form's name getting broken?&lt;br /&gt;&lt;br /&gt;Well, I checked every damn thing I could see, and played around with the code for a quarter of an hour or so, before getting to that 'grasping at straws' stage when, having eliminated the impossible, whatever remains (however improbable) must be the cause. The only difference between the 'problem' form and the ones that worked was that it had a different border style. The others are all the usual 'Sizable' type, but this one is 'FixedToolWindow', because it's a little helper-utility that exists as a single instance. 
And unbelievably, when I changed it to have a 'Sizable' border, the Form.Name came back as 'frmXXX'.&lt;br /&gt;&lt;br /&gt;So for some arcane reason (and there must &lt;span style="font-style: italic;"&gt;be&lt;/span&gt; a reason, surely?) the WinForms engine chops the 'frm' prefix off the form name if it's got a 'FixedToolWindow' border style. I haven't tried this with other prefixes, because I need to press on - but if anyone out there knows what the cause of this behavior is, I'd be very interested to know. And yes, I know the use of Hungarian Notation for objects is now frowned-upon and I shouldn't be using 'frm' anyway - but until the Visual Studio Solution Explorer has a way of grouping objects so that my forms and other stuff are separated-out from each other and I can readily see which things are forms, and which are standard classes, I'm going to keep on defying convention.&lt;br /&gt;&lt;br /&gt;Anyway, another few days should have the IDE running with a Disassembler view, so I'll post a screenshot or two at that point. Stay tuned!&lt;br /&gt;&lt;br /&gt;EDIT: Just after I posted this, I re-ran the code and it looks like the naming glitch might actually have been caused by something else; when I switched the problem form back to 'FixedToolWindow', it still reported its name as 'frmXXX' - so something got broken deep in the form engine system somewhere, and changing the border style fixed it. It's still weird, but evidently not a fundamental problem. 
Hey ho.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-2152551603866258979?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/2152551603866258979/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=2152551603866258979' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/2152551603866258979'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/2152551603866258979'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/12/whats-in-name.html' title='What&apos;s in a Name?'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-7491464982880351028</id><published>2008-12-12T14:33:00.006Z</published><updated>2008-12-12T15:11:07.402Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='IDE'/><category scheme='http://www.blogger.com/atom/ns#' term='Registers'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='Instructions'/><category scheme='http://www.blogger.com/atom/ns#' term='GUI'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category 
scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><category scheme='http://www.blogger.com/atom/ns#' term='Monitor'/><title type='text'>Well, That Was Easy</title><content type='html'>&lt;div style="text-align: justify;"&gt;Cracked-on with the project last night, tackling those tasks in my list from the previous post. First was to re-visit every line of code in the instruction decode/execution logic, to make sure it all looked good and did the right things in the best way. There were a few places where I could tweak stuff - mostly things like removing bit-masking code on 16-bit registers (either because the contents were being moved to an 8-bit register, or because I could replace the 16-bit register with an 8-bit one altogether) and occasionally re-sequencing things to eliminate temporary variables. Overall, it was in pretty good shape, and is now about as good as I can make it in its current incarnation (i.e. as a basic switch..case block). Version 2 of the core, still some way off, will do this whole thing a very different way, but for now it suffices.&lt;br /&gt;&lt;br /&gt;Second on the list was to convert the core classes into a DLL that the monitor would talk to, instead of having the actual classes included as part of the monitor project. In the past, before C#, .NET, and Visual Studio, this would have been quite a sizable task. It was certainly something I was not looking forward to doing; but once I knuckled-down and got stuck into it, it turned out to be a walk in the park. Add a new Project to the monitor Solution, choose Class Library (the project type that builds a DLL), move all the core-related classes over to it, set the dependency in the monitor project to use the Class Library, and build it. 
Seriously, a task I thought was going to take two or three hours ended-up taking about 30 minutes, including a couple of tests beforehand to see how it was going to work out.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;In doing that, I got the third item on my list for free. Making the monitor use the DLL instead of included class files was all wrapped-up in making the DLL itself - by the time I had the DLL built, and almost before I realised it, the monitor project was using it. Again, something I thought was going to be an hour's work was actually done without even really trying. So I have to shout out a big Thank You to Microsoft for making this an almost trivial task!&lt;br /&gt;&lt;br /&gt;So we now turn to item four, which is to create a GUI replica of the commandline monitor. So far, I have an MDI window and some menus defined, because I got that far and then decided to call it a night and dive into World of Warcraft for a while. But it's a start, and considering I wasn't expecting to start it for a few days yet, I'm a happy bunny.&lt;br /&gt;&lt;br /&gt;Now you might be thinking that it's a good time for me to check all the existing emulator code into Google Code, but I'm not going to just yet, and here's why: although it all works well as far as it goes, it's still a country mile away from being ready for release (even just alpha preview release). The instruction support is at less than 50% of the official opcode set, because as you know I've just been implementing instructions as the VIC-20 ROM throws them at me; that means that not every instruction has been encountered, and for those that have we've only seen a subset of their addressing modes. Equally, none of the undocumented opcodes have been seen yet, and few of the systemic 6502 glitches are wired-in either (like the JMP indirect page-boundary bug). 
So I'm not content to have World+Dog looking at it yet, because it's just not good enough in my eyes.&lt;br /&gt;&lt;br /&gt;Instead, content yourself with a VERY early alpha screenshot of the IDE - so early, in fact, that it's almost mind-numbing in its dullness. But at least you can see progress. :)&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;a href="http://picasaweb.google.com/lh/photo/iwrxXTA2MX9Py02EjQEuRA"&gt;&lt;img src="http://lh3.ggpht.com/_k7S257jPIsM/SUJ-mSsXl3I/AAAAAAAAASo/Vwfmqwf2sQE/s400/IDE%2011122008.png" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-7491464982880351028?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/7491464982880351028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=7491464982880351028' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7491464982880351028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7491464982880351028'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/12/well-that-was-easy.html' title='Well, That Was Easy'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_k7S257jPIsM/SUJ-mSsXl3I/AAAAAAAAASo/Vwfmqwf2sQE/s72-c/IDE%2011122008.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-2541519918040984716</id><published>2008-12-11T08:14:00.003Z</published><updated>2008-12-11T08:21:49.207Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='Instructions'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><category scheme='http://www.blogger.com/atom/ns#' term='Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='Clock Speed'/><title type='text'>Achievement Unlocked</title><content type='html'>&lt;div style="text-align: justify;"&gt;A milestone was achieved as I worked on the code last night - there came a point when I entered 'g' (as in 'go', the command to start the CPU) into the monitor, and nothing happened.&lt;br /&gt;&lt;br /&gt;Now this might sound like a disaster, but it actually means that the emulator is at the stage where it allows the VIC-20 ROM to get as far as the IRQ-dependent stuff. In other words, the ROM is running in a loop waiting for an interrupt to stuff something into the keyboard buffer, and therefore the core did not spit an 'Unable to decode the following opcode byte' message at me, which is what it does when the ROM gives it an unimplemented instruction. So, nothing happened. :)&lt;br /&gt;&lt;br /&gt;If I were writing an actual VIC-20 emulator, this is the point at which I'd start thinking about coding some logic to represent the 6522 VIA chips which generate timed interrupts and suchlike, in order to progress the overall machine. However, I've never intended to write such an application - loyal readers will know that I was only ever using the VIC-20 ROM as a test to get the emulated core off the ground. 
So with that done, we're fast approaching the point at which I'll be jumping-off into deeper waters, essentially configuring my own ROM to test instructions that have no code written for them yet.&lt;br /&gt;&lt;br /&gt;But before I do that, I have a few other tasks to do:&lt;br /&gt;&lt;br /&gt;1. Revisit the instruction decode logic to make sure everything is neat, tidy and efficient.&lt;br /&gt;2. Repackage the code as a DLL so I can disconnect it from the commandline monitor.&lt;br /&gt;3. Rework the commandline monitor to use the DLL version of the core.&lt;br /&gt;4. Write a basic (but extensible) GUI replica of the commandline monitor.&lt;br /&gt;&lt;br /&gt;Once we get to item 4, I'll have an IDE that talks to the core DLL and gives me the basis of what this project is all about - a fancy development environment in which I can configure 6502-based systems as I desire (using either pre-canned 'maps' like the VIC-20 one I have now, or creating all-new maps for sample 'machines' that don't exist in reality) and write code for them, and that has all the bells and whistles I want such as in-place instruction timing as you write the code. And the main reason for repackaging the code as a DLL instead of just embedding the classes in the host application is so that other environments (e.g. emulators) can plug it in if they wish...&lt;br /&gt;&lt;br /&gt;For a chuckle, I ran a 'release' build version outside the Visual Studio environment to see what clockspeed I'm getting these days. You might recall I had some early success in optimising the logic and saw speeds around the 40MHz mark. Then things took a bit of a nosedive whilst I was doing some major reworking, and those speeds tumbled to around 50% of their best - no better than 18-20MHz on average. I've done a lot of work since then, and refactored a fair amount of the Register class logic, to name but one significant change.&lt;br /&gt;&lt;br /&gt;Today, on a slightly creaky Dual-Core Pentium box, I see 42MHz. 
On the altogether sexier Core2Duo rig I do all my development and gaming on, the code yields 72MHz. That'll do. ;)&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-2541519918040984716?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/2541519918040984716/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=2541519918040984716' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/2541519918040984716'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/2541519918040984716'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/12/achievement-unlocked.html' title='Achievement Unlocked'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-9035433437828578293</id><published>2008-12-10T10:01:00.005Z</published><updated>2008-12-10T12:00:10.866Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='ALU'/><category scheme='http://www.blogger.com/atom/ns#' term='Carry'/><category scheme='http://www.blogger.com/atom/ns#' term='Negative'/><category scheme='http://www.blogger.com/atom/ns#' term='Instructions'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Zero'/><category scheme='http://www.blogger.com/atom/ns#' term='Flags'/><title type='text'>Nothing Compares 2 U</title><content type='html'>&lt;div style="text-align: justify;"&gt;Progress, of a sort. Flushed with success after cracking the intricacies of the ADC and SBC instructions, I decided to conduct a small experiment - using my Sharp6502 monitor alongside the VICE monitor to single-step through the VIC-20 startup sequence, and see how well they matched-up. Obviously, I'd be looking for a perfect score here, as any differences would indicate something wrong with my code - and although VICE isn't a &lt;span style="font-style: italic;"&gt;perfect&lt;/span&gt; emulation (there are some very specific VIC-related issues it doesn't handle) it's generally accepted to be pretty reliable when it comes to 6502 instruction execution, which means that if my CPU core does something different, it's almost certainly my code that's wrong and not VICE. With all my 6502-based machines in storage at the moment, I can't do a 'real' hardware comparison, so VICE is a good second-best.&lt;br /&gt;&lt;br /&gt;So, on Sunday evening I got both VICE and Sharp6502 sitting alongside each other on-screen, and started them both at the cold-start vector ($FD22) in the VIC-20 ROM. And we got precisely eight instructions into the code before I noticed something - the 'CMP Absolute,X' opcode at $FD44 set the Negative flag (also known as the 'sign' flag) differently in my code. VICE said N was set, and Sharp6502 said it was clear. The other two flags affected by CMP (Carry and Zero) were in agreement, but N was most definitely not. 
So I stopped the test, and had a look at the CMP implementation, which was one of the first I'd done ages ago - if you remember, I was implementing instructions as the ROM presented them, so this was a bit of code that dated back almost to the first version of the CPU core.&lt;br /&gt;&lt;br /&gt;Well, I looked long and hard at that code, and couldn't for the life of me see how it was going wrong. In fact, I even started to doubt VICE at one point, because the preceding instruction is an LDA whose operand is #$CD, which of course sets the N flag, and I began to wonder if VICE was somehow forgetting to reset N when it did the CMP afterwards. I pulled-up all the documents I could find that talk about the CMP instruction, and it looked like I was doing exactly the right thing. An example of what various sources say about CMP is:&lt;br /&gt;&lt;pre&gt;&lt;span style="font-size:85%;"&gt; CMP - Compare [Z,C,N = A-M]&lt;br /&gt; This instruction compares the contents of the accumulator with another value&lt;br /&gt; and sets the zero and carry flags as appropriate. Processor Status after use:&lt;br /&gt;&lt;br /&gt; C     Carry Flag            Set if A &gt;= M&lt;br /&gt; Z     Zero Flag             Set if A = M&lt;br /&gt; I     Interrupt Disable     Not affected&lt;br /&gt; D     Decimal Mode Flag     Not affected&lt;br /&gt; B     Break Command         Not affected&lt;br /&gt; V     Overflow Flag         Not affected&lt;br /&gt; N     Negative Flag         Set if bit 7 of the result is set&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;Seems pretty straightforward, right? What could possibly go wrong with that? Well, after 36 hours of painful research, experimentation, and banging of the head on the desk, I can tell you exactly what can go wrong with it.&lt;br /&gt;&lt;br /&gt;Y'see, it's like this. 
CMP is actually a pretty powerful instruction, because it does both magnitude and equality tests in one go; that is, on the basis of a single CMP you can tell whether the Accumulator contains a value the same as, bigger, or smaller than the value you're comparing it with. That's pretty clever, and is also where the devious little 'gotcha' lives that I'd not been cognisant of before this. The equality test sets (or clears) the Z flag, so that if the difference between the Accumulator and the test value is zero (i.e. they are equal) the flag is on. The 'bigger than' test sets the Carry flag, by just comparing the two values and setting the flag if the Accumulator is larger (or equal to, but that distinction is clarified by the Z flag).&lt;br /&gt;&lt;br /&gt;But the N flag, which I'd been treating as the 'smaller than' test result indicator, is a tricksy little fiend that depends upon the natural ability of an 8-bit register to 'wrap around' when the value drops below zero. The key to this is recognising that little annotation in the instruction description for what it is, and what it means: [Z,C,N = A-M] tells us that the flags are set as a consequence of subtracting the memory (test) value from the Accumulator. And THAT means that although the Accumulator is left unchanged by CMP, it is actually doing an unsigned subtraction in the ALU to determine the result, and &lt;span style="font-style: italic;"&gt;that&lt;/span&gt; means that subtracting a value of 2 from a 1 in the Accumulator, for example, gives us a result of 255 ($FF) and not -1.&lt;br /&gt;&lt;br /&gt;My result logic used an ordinary INT variable to do the calculation, which (using the 1-2 example just now) gave me -1 as the result, and of course Bit 7 wasn't set. Using a temporary 8-bit register as the result field, however, means that the wrap-around occurs and I get 255 as the answer, and Bit 7 IS set. 
Here's the code for 'CMP Absolute,X' - all the other addressing modes, as well as CPX and CPY, follow the same pattern:&lt;pre&gt; case 221: // CMP Absolute,X&lt;br /&gt;   _tempResult = _mem[_mem[-++_PC.Contents] + _XR.Contents];&lt;br /&gt;   _TR8.Contents = _AC.Contents - _tempResult;&lt;br /&gt;   _SR[_BIT0_SR_CARRY] = _AC.Contents &gt;= _tempResult ? 1 : 0;&lt;br /&gt;   _SR[_BIT1_SR_ZERO] = _TR8.Contents == 0 ? 1 : 0;&lt;br /&gt;   _SR[_BIT7_SR_NEGATIVE] = _TR8[_BIT7_SR_NEGATIVE];&lt;br /&gt;   _PC.Contents++;&lt;br /&gt;   break;&lt;/pre&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;We put the test value into _tempResult because it's used twice, and I don't want to hit MemoryMap more than once. You can see that we set Carry by doing a simple magnitude test between the Accumulator and the test value. Having subtracted the test value from the Accumulator and stored the result in a temporary 8-bit register (_TR8) we then check to see if the answer was zero, and set Z accordingly. Finally, we simply set N to whatever Bit 7 of the temporary register contains, as it will have wrapped-around if the result dropped below zero.&lt;br /&gt;&lt;br /&gt;To close, here's a snippet of text cribbed from &lt;a href="http://6502.org/tutorials/compare_beyond.html"&gt;www.6502.org&lt;/a&gt;'s tutorial on the CMP functionality of the CPU - it turned out to be the definitive and most helpful document I could find on this subject, and of course it's well-worth a mooch around the rest of the site for more information on all things 6502. Enjoy!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The N flag contains the most significant bit of the subtraction result. This is only occasionally useful.  
However, it is NOT the signed comparison result, as is sometimes claimed, as the following examples illustrate:  &lt;/p&gt;&lt;p&gt; After:  &lt;/p&gt;&lt;pre&gt;    LDA #$01 ;  1 (signed),   1 (unsigned)&lt;br /&gt;    CMP #$FF ; -1 (signed), 255 (unsigned)&lt;br /&gt;&lt;/pre&gt;  A = $01, C = 0, N = 0 (the subtraction result is $01 - $FF = $02), and Z = 0. The comparison results are: &lt;ul&gt;&lt;li&gt;Equality comparison: false, since $01 &lt;&gt; $FF&lt;/li&gt;&lt;li&gt;Signed comparison: 1 &gt;= -1&lt;/li&gt;&lt;li&gt;Unsigned comparison: 1 &lt; 255&lt;/ul&gt; After:  &lt;pre&gt;    LDA #$7F ;  127 (signed), 127 (unsigned)&lt;br /&gt;    CMP #$80 ; -128 (signed), 128 (unsigned)&lt;br /&gt;&lt;/pre&gt;  A = $7F, C = 0, N = 1 (the subtraction result is $7F - $80 = $FF), and Z = 0. The comparison results are:  &lt;ul&gt;&lt;li&gt;Equality comparison: false, since $7F &lt;&gt; $80&lt;/li&gt;&lt;li&gt;Signed comparison: 127 &gt;= -128&lt;/li&gt;&lt;li&gt;Unsigned comparison: 127 &lt; 128&lt;/ul&gt;  Notice that in both cases the signed comparison result is the same (the first number is greater than or equal to the second), but the N flag is different.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-9035433437828578293?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/9035433437828578293/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=9035433437828578293' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9035433437828578293'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9035433437828578293'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/12/nothing-compares-2-u.html' title='Nothing Compares 2 U'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-7628327676770740944</id><published>2008-12-05T21:05:00.005Z</published><updated>2008-12-05T21:48:35.037Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Overflow'/><category scheme='http://www.blogger.com/atom/ns#' term='ALU'/><category scheme='http://www.blogger.com/atom/ns#' term='Carry'/><category scheme='http://www.blogger.com/atom/ns#' term='Instructions'/><title type='text'>ALU John, Gotta New Motor?</title><content type='html'>&lt;div style="text-align: justify;"&gt;Over the last couple of days I have been concentrating on getting those pesky ADC and SBC instructions coded and working. Quite probably the most complex and subtle of the 6502 instructions, these two are the gateway into the really serious bit of silicon at the heart of the chip - the ALU, or Arithmetic Logic Unit.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;It's this bit of the processor that is responsible for being able to do the very basic mathematical operations involved in addition and subtraction, and for keeping track of situations when the results of those operations exceed the 8-bit capacity of the Accumulator (which is the register involved). 
In cases where a 'carry' is needed to indicate a continuation of the calculation, or an overflow occurs because a signed result falls outside the range the Accumulator can represent, the ALU has the task of setting and/or reading flags in the PSR that will influence a calculation or report on its outcome. In short, ADC and SBC are complicated little so-and-sos, and it's been a real headache getting them to a satisfactory point.&lt;br /&gt;&lt;br /&gt;In fact, it's the first time since that silly VIC-20 Kernal ROM loop problem I had way back that I've needed to ask for help. There's no shame in that - these instructions have tested many a 6502 programmer and emulator writer, just because of their subtleties. Apart from any other considerations, these instructions are the only ones that work differently if the processor is in Decimal mode, so we're effectively implementing not two but four instructions here, as there are two code paths through each. Add to that the fact that there's quite a lot of information out there on the Net about things like 'What the V flag does', and that quite a number of those sources contradict one another or are downright wrong, and you can begin to understand how frustrating it can be to get the details right.&lt;br /&gt;&lt;br /&gt;Anyway, after a bit of head-scratching and some guidance from the good people at &lt;a href="http://www.yakyak.org/"&gt;YakYak&lt;/a&gt; and &lt;a href="http://sleepingelephant.com/ipw-web/bulletin/bb/index.php"&gt;Denial&lt;/a&gt;, I finally got to a place where I think the logic for these instructions is right - they add and subtract, handle the Carry flag, and (I think) set the Overflow flag when appropriate - in both Binary and Decimal mode. 
Now, ironically, I will probably have to revisit them in the future as there are a couple of bugs in the real CPU that I'm not emulating yet - some flags are set incorrectly in Decimal mode, for example - but for now they seem to be doing the right things, and I'm moving onwards. Wanna see what the ADC code looks like in its finished form? Here you go:&lt;pre&gt;case 105: // ADC Immediate&lt;br /&gt;// Fetch the operand and form the raw sum (which may carry into bit 8)&lt;br /&gt;_memTemp = _mem[++_PC.Contents];&lt;br /&gt;_TR.Contents = _AC.Contents + _memTemp + _SR[_BIT0_SR_CARRY];&lt;br /&gt;if (_SR[_BIT3_SR_DECIMAL] == 1)&lt;br /&gt;{&lt;br /&gt;  // Decimal mode: add 6 on a carry out of the low nibble...&lt;br /&gt;  if (((_AC.Contents ^ _memTemp ^ _TR.Contents) &amp;amp; 0x10) == 0x10)&lt;br /&gt;  {&lt;br /&gt;    _TR.Contents += 0x06;&lt;br /&gt;  }&lt;br /&gt;  // ...and add $60 if the high nibble has gone past 9&lt;br /&gt;  if ((_TR.Contents &amp;amp; 0xf0) &gt; 0x90)&lt;br /&gt;  {&lt;br /&gt;    _TR.Contents += 0x60;&lt;br /&gt;  }&lt;br /&gt;}&lt;br /&gt;// Overflow: both operands share a sign bit that the result does not&lt;br /&gt;_SR[_BIT6_SR_OVERFLOW] = ((_AC.Contents ^ _TR.Contents) &amp;amp; (_memTemp ^ _TR.Contents) &amp;amp; 0x80) == 0x80 ? 1 : 0;&lt;br /&gt;// Carry: the sum spilled into bit 8&lt;br /&gt;_SR[_BIT0_SR_CARRY] = (_TR.Contents &amp;amp; 0x100) == 0x100 ? 1 : 0;&lt;br /&gt;// Zero: test only the low byte, as that is what lands in the Accumulator&lt;br /&gt;_SR[_BIT1_SR_ZERO] = (_TR.Contents &amp;amp; 0xff) == 0 ? 1 : 0;&lt;br /&gt;_SR[_BIT7_SR_NEGATIVE] = _TR[_BIT7_SR_NEGATIVE];&lt;br /&gt;_AC.Contents = _TR.Contents &amp;amp; 0xff;&lt;br /&gt;break;&lt;/pre&gt;This afternoon I burned through another batch of instructions as the monitor spat 'Unable to decode' messages at me when it stumbled across VIC-20 ROM instructions the core hadn't seen before. So we can now do some ASLs and ROLs, as well as PHP and PLP. 
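&lt;br /&gt;&lt;br /&gt;Coming back to the ADC listing for a moment: the Overflow test can be tried in isolation. What follows is only a rough sketch, assuming binary mode and using plain ints in place of the register objects; the class and method names are invented for this illustration and are not part of the emulator:&lt;pre&gt;using System;&lt;br /&gt;&lt;br /&gt;class OverflowSketch&lt;br /&gt;{&lt;br /&gt;  // Hypothetical helper: binary-mode add on plain ints, returns the raw 9-bit sum&lt;br /&gt;  static int Adc(int a, int m, int carryIn, out int overflow)&lt;br /&gt;  {&lt;br /&gt;    int sum = a + m + carryIn;&lt;br /&gt;    // V is set when both operands share a sign bit that the result does not&lt;br /&gt;    overflow = ((a ^ sum) &amp;amp;amp; (m ^ sum) &amp;amp;amp; 0x80) == 0x80 ? 1 : 0;&lt;br /&gt;    return sum;&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  static void Main()&lt;br /&gt;  {&lt;br /&gt;    int v;&lt;br /&gt;    int sum = Adc(0x50, 0x50, 0, out v);&lt;br /&gt;    Console.WriteLine("{0:X2} V={1}", sum &amp;amp;amp; 0xff, v); // prints A0 V=1&lt;br /&gt;    sum = Adc(0x50, 0x10, 0, out v);&lt;br /&gt;    Console.WriteLine("{0:X2} V={1}", sum &amp;amp;amp; 0xff, v); // prints 60 V=0&lt;br /&gt;  }&lt;br /&gt;}&lt;/pre&gt;Adding $50 to $50 gives $A0, which reads as negative even though both operands were positive, so V is set; $50 plus $10 stays in range, so V is clear.&lt;br /&gt;&lt;br /&gt;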
To finish, here's a recap on that instruction roll-call I posted a while back:&lt;br /&gt;&lt;pre&gt;&lt;span style="font-family:courier new;"&gt;ASL Zero Page            executed: 4&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PHP Implied              executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ORA Immediate            executed: 73&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ASL Accumulator          executed: 2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ORA Absolute             executed: 51&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;BPL Relative             executed: 1162&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ASL Zero Page,X          executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CLC Implied              executed: 3232&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;JSR Absolute             executed: 7513&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;BIT Zero Page            executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ROL Zero Page            executed: 16&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PLP Implied              executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;AND Immediate            executed: 168&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;BMI Relative             executed: 7&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SEC Implied              executed: 5&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;EOR Zero Page            executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LSR Zero Page            executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PHA Implied              executed: 
97&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;EOR Immediate            executed: 2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;JMP Absolute             executed: 59&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LSR Zero Page,X          executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CLI Implied              executed: 25&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;RTS Implied              executed: 7509&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ADC Zero Page            executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;PLA Implied              executed: 97&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ADC Immediate            executed: 138&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ROR Accumulator          executed: 4096&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;JMP Indirect             executed: 25&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SEI Implied              executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STY Zero Page            executed: 22&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STA Zero Page            executed: 343&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STX Zero Page            executed: 20&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;DEY Implied              executed: 1045&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;TXA Implied              executed: 7214&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STY Absolute             executed: 4&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STA Absolute             executed: 23&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier 
new;"&gt;STX Absolute             executed: 10&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;BCC Relative             executed: 3173&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STA Indirect Indexed,Y   executed: 20533&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STY Zero Page,X          executed: 48&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STA Zero Page,X          executed: 291&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;TYA Implied              executed: 49&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STA Absolute,Y           executed: 33&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;TXS Implied              executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;STA Absolute,X           executed: 540&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDY Immediate            executed: 65&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDX Immediate            executed: 41&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDY Zero Page            executed: 49&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDA Zero Page            executed: 7411&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDX Zero Page            executed: 57&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;TAY Implied              executed: 50&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDA Immediate            executed: 13396&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;TAX Implied              executed: 7199&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDY Absolute             executed: 2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDA Absolute             executed: 
7&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDX Absolute             executed: 24&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;BCS Relative             executed: 4178&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDA Indirect Indexed,Y   executed: 7254&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDY Zero Page,X          executed: 6&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDA Zero Page,X          executed: 53&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDA Absolute,Y           executed: 32&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LDA Absolute,X           executed: 108&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CPY Immediate            executed: 2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CPY Zero Page            executed: 3&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CMP Zero Page            executed: 156&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;INY Implied              executed: 57&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CMP Immediate            executed: 148&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;DEX Implied              executed: 214&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;BNE Relative             executed: 18974&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CMP Indirect Indexed,Y   executed: 11264&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CLD Implied              executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CMP Absolute,X           executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;CPX Immediate            executed: 53&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier 
new;"&gt;SBC Zero Page            executed: 3&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;INC Zero Page            executed: 7220&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;INX Implied              executed: 313&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;SBC Immediate            executed: 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;NOP Implied              executed: 27&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;BEQ Relative             executed: 7403&lt;/span&gt;&lt;/pre&gt;You can also see how many of each instruction the core has executed as it tore through the ROM and initialised the (fictitious) VIC-20. Not bad, eh?&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-7628327676770740944?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/7628327676770740944/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=7628327676770740944' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7628327676770740944'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7628327676770740944'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/12/alu-john-gotta-new-motor.html' title='ALU John, Gotta New Motor?'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' 
src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-7382265634306924219</id><published>2008-12-03T08:30:00.005Z</published><updated>2008-12-03T09:10:31.343Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Exceptions'/><category scheme='http://www.blogger.com/atom/ns#' term='Bitrot'/><category scheme='http://www.blogger.com/atom/ns#' term='Monitor'/><title type='text'>Bitrot</title><content type='html'>&lt;blockquote&gt;"&lt;b&gt;Bit rot&lt;/b&gt;, or &lt;b&gt;bit decay&lt;/b&gt;, is a colloquial computing term used either to describe gradual decay of storage media or to facetiously describe the spontaneous degradation of a software program over time. The latter use of the term implies that software can literally wear out or rust like a physical tool."&lt;/blockquote&gt;&lt;div style="text-align: justify;"&gt;I found an odd glitch when I got back into the code last weekend - everything was ticking-along nicely, until I entered the 'l' command into the monitor to dump the memory allocation table to screen. The software paused unexpectedly, and then Visual Studio kicked-in with a nasty 'Index out of range' exception and pointed me at the MemoryMap class. "What the hell...?", I muttered.&lt;br /&gt;&lt;br /&gt;This code hasn't changed for a while now, and has been entirely stable since the last chunk of work I did on it. Moreover, I've hardly touched the project for three weeks, and it was working fine the last time I did anything with it. And anyway, the changes I'd just made for the Register classes went nowhere near the MemoryMap class. Some sort of weirdness was occurring.&lt;br /&gt;&lt;br /&gt;Looking at the stack trace and other debug information, I could see that MemoryMap was being asked by the ShowMemoryAllocation routine to return a MemoryCell object for location 65536. 
This was obviously the cause of the 'Index out of range' exception, as the map is initialised as a 64K area with addresses in the range 0-65535, so anything outside of this (which should never occur!) is going to give the .NET runtime a headache. The big question was, therefore, how and why was location 65536 being requested?&lt;br /&gt;&lt;br /&gt;Digging into ShowMemoryAllocation, I could see that this code, too, had not changed in some time. With no obvious place to focus, I restarted the monitor with a breakpoint set, and re-tried the command. As I single-stepped through the code, I hit this little bit of logic:&lt;br /&gt;&lt;pre&gt;for (int i = 1; i &lt; memSize; i++)&lt;br /&gt;{&lt;br /&gt;  // Get the memory cells&lt;br /&gt;  MemoryMap.MemoryCell thisCell = _memory[(long)i];&lt;br /&gt;  MemoryMap.MemoryCell prevCell = _memory[(long)i + 1];&lt;br /&gt;  ...&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;What this is doing is asking MemoryMap for the current memory cell (as we loop through the whole array) and also the previous one. This lets us see if the type or assigned purpose is different on the current cell, and thus spit out a summary message for the previous cell type. Now, can you see the error?&lt;br /&gt;&lt;br /&gt;That's right - the reference to prevCell is being set as the MemoryCell at the current location &lt;span style="font-weight: bold;"&gt;PLUS&lt;/span&gt; one. So that would be the NEXT cell, not the PREVIOUS one - and when the current cell reference is location 65535, this asks for location 65536 and we blow the array upper bound. It's pretty obvious that prevCell should be asking for the cell at the current location &lt;span style="font-weight: bold;"&gt;MINUS&lt;/span&gt; one, and the loop this code is in starts at location one - that is, I designed this to handle the lower bound situation as well as the upper bound limit. 
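&lt;br /&gt;&lt;br /&gt;As a rough sketch of the intended shape of that loop, here is a stand-alone version that uses a plain int array in place of the real MemoryMap and simply counts the places where a cell's type differs from the one before it. The class, method and array names are invented for this illustration:&lt;pre&gt;using System;&lt;br /&gt;&lt;br /&gt;class MemoryScanSketch&lt;br /&gt;{&lt;br /&gt;  // cellTypes is a hypothetical stand-in for the real MemoryMap&lt;br /&gt;  static int CountTypeChanges(int[] cellTypes)&lt;br /&gt;  {&lt;br /&gt;    int changes = 0;&lt;br /&gt;    // Start at 1 so that i - 1 is always a valid index&lt;br /&gt;    for (int i = 1; i &amp;amp;lt; cellTypes.Length; i++)&lt;br /&gt;    {&lt;br /&gt;      int thisCell = cellTypes[i];&lt;br /&gt;      int prevCell = cellTypes[i - 1]; // previous cell: current location MINUS one&lt;br /&gt;      if (thisCell != prevCell)&lt;br /&gt;      {&lt;br /&gt;        changes++;&lt;br /&gt;      }&lt;br /&gt;    }&lt;br /&gt;    return changes;&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  static void Main()&lt;br /&gt;  {&lt;br /&gt;    int[] cellTypes = { 0, 0, 1, 1, 1, 2 };&lt;br /&gt;    Console.WriteLine(CountTypeChanges(cellTypes)); // prints 2&lt;br /&gt;  }&lt;br /&gt;}&lt;/pre&gt;With the loop starting at 1 and stopping before the array length, both i and i - 1 stay inside the array at either end of the range.&lt;br /&gt;&lt;br /&gt;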
So why the hell am I asking for the next location instead of the previous one?&lt;br /&gt;&lt;br /&gt;I thought it must have just been a typo on my part, perhaps a moment of insanity as I was falling ill the last time I was in the codebase. But looking back through the Subversion logs, I saw that this line of code had not changed since I wrote it - and the initial version DID correctly look at the previous location. There was no update that changed the sign of the operation to a positive!&lt;br /&gt;&lt;br /&gt;So, somehow, in the three weeks that the code lay dormant and I coughed and sneezed myself half to death, bitrot started to set in and that minus sign decayed to a plus. A single bit decays and flips the one next to it, and we have ASCII code 43 instead of 45; and ShowMemoryAllocation suddenly starts doing something dumb.&lt;br /&gt;&lt;br /&gt;It's working now, of course - with the sign changed back to a minus, the whole routine bursts into life and does what it's supposed to. C'est la vie...&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-7382265634306924219?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/7382265634306924219/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=7382265634306924219' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7382265634306924219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7382265634306924219'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/12/bitrot.html' 
title='Bitrot'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-4392157842259477562</id><published>2008-12-01T09:22:00.003Z</published><updated>2008-12-01T10:02:07.998Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Refactoring'/><category scheme='http://www.blogger.com/atom/ns#' term='Registers'/><category scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='Indexer'/><title type='text'>Class Action</title><content type='html'>&lt;div style="text-align: justify;"&gt;Whew! &lt;span style="font-weight: bold;"&gt;HOW&lt;/span&gt; long since the last update?? Almost a month, that's how long. And after such an interval, you'll probably be assuming that all sorts of exciting things have happened to the emulator, right?&lt;br /&gt;&lt;br /&gt;Wrong. The real world has kind of got in the way of this project over the last three weeks - the first two of which I spent mostly in bed feeling pretty sorry for myself, suffering from a dose of genuine the-one-that-can-kill-you Influenza. Aside from a couple of visits to the doctor for industrial-strength medication, I didn't stir from my bed for most of that fortnight. And then last week, getting back into the routine of the day job left me largely energy-depleted and nowhere near sharp enough to do any work on the codebase.&lt;br /&gt;&lt;br /&gt;However, this weekend just past saw me feeling almost back to normal, so I opened-up Visual Studio on Saturday and had a look at where we were. 
I'm glad my commenting discipline is reasonably consistent, as it still took a good half an hour to re-sync my brain to the code and figure out where I'd got to, and what was next on the list of Stuff To Do.&lt;br /&gt;&lt;br /&gt;Having optimised the bit-access logic for the registers, I could see that the classes that implement them (the Accumulator, X and Y, PSR, SP and PC) were suffering from a mild dose of chaos and needed a tidy-up. So as a way of easing myself back into harness, I took the existing classes and refactored them so that they were better models of the registers they represent, and got the inheritance hierarchy nicely arranged.&lt;br /&gt;&lt;br /&gt;So now, instead of separate specific classes for the 'special' register types (i.e. those with extended functionality above and beyond simple 8-bit register objects, like the PSR) we have a logical structure that implements a 16-bit register (Register16Bit, for PC), a derived 8-bit register (Register8Bit, that ignores the hi-byte from its 16-bit base class, for A, X, Y and SP), and a further-derived 8-bit register (Register8BitStatus, derived from Register8Bit, for the PSR) that additionally always returns its contents with Bit 5 set to reflect the permanently-on state of that bit in the status register.&lt;br /&gt;&lt;br /&gt;Naturally all the content manipulation accessors, bit indexers, etc. are implemented in Register16Bit, so the derived registers get that functionality for free. Register8Bit is an empty shell that just passes content access up to Register16Bit, but masks off the upper byte. 
And Register8BitStatus is an even emptier shell, that defers to Register8Bit for everything except during queries on its contents, when it overrides the underlying value by ORing it with $20 to switch bit 5 on.&lt;br /&gt;&lt;br /&gt;With everything neat and tidy again, I'm now back to the instruction implementations - and I'm currently grappling with the old foe, those twins of complexity that are the ADC and SBC instructions. I think I've got them working in normal mode now, and the next step is to confirm the Overflow flag behavior (which is horribly, horribly complicated) and then implement the Decimal Mode logic (which I've avoided thus far - it will either be a very simple task, or a nightmare to do efficiently).&lt;br /&gt;&lt;br /&gt;Stay tuned!&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-4392157842259477562?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/4392157842259477562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=4392157842259477562' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/4392157842259477562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/4392157842259477562'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/12/class-action.html' title='Class Action'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' 
src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-1775942376063012498</id><published>2008-11-06T18:21:00.007Z</published><updated>2008-11-07T11:57:42.095Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Registers'/><category scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='Indexer'/><category scheme='http://www.blogger.com/atom/ns#' term='Bit'/><title type='text'>A Little Bit Better</title><content type='html'>&lt;div style="text-align: justify;"&gt;Continuing to add instructions to the core yesterday I noticed a pattern that had always been there in the code, but which had not previously jumped out at me. Almost every instruction makes changes to one or more bits (or 'flags') in the Program Status Register  - the PSR, or PSW as it is sometimes referred to, if described as a Word instead of a Register. I have a couple of unexciting methods in there to get or set the value of a bit in a byte, which look like this:&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;// Return boolean bit setting (0=false, 1=true)&lt;br /&gt;public bool GetBit(int testValue, int bitMask)&lt;br /&gt;{&lt;br /&gt;  return (testValue &amp;amp; bitMask) == bitMask;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;// Set or clear the specified bit&lt;br /&gt;private int SetBit(int data, int bitMask, bool bitState)&lt;br /&gt;{&lt;br /&gt;  return bitState ? 
data | bitMask : data &amp;amp; ~bitMask;&lt;br /&gt;}&lt;/pre&gt;&lt;div style="text-align: justify;"&gt;And these are typically called in scenarios like these:&lt;pre&gt;case 152: // TYA Implied&lt;br /&gt;  ...&lt;br /&gt;  SR = SetBit(SR, _BIT1_ALSO_SR_ZERO, A == 0);&lt;br /&gt;  SR = SetBit(SR, _BIT7_ALSO_SR_NEGATIVE, GetBit(A, _BIT7_ALSO_SR_NEGATIVE));&lt;br /&gt;&lt;br /&gt;case 233: // SBC Immediate&lt;br /&gt;  if(GetBit(SR, _BIT3_ALSO_SR_DECIMAL))&lt;br /&gt;  {&lt;br /&gt;     ... some code that handles Decimal mode subtractions&lt;br /&gt;  }&lt;/pre&gt;Now they get called from lots of different places as each individual instruction wants to either test or set the value of a specific bit in the PSR. Individually, they don't amount to much of an overhead, but there are occasions when several bit-level operations need to happen in quick succession, or where a tight loop of instructions is repeated based upon the state of a PSR flag, by virtue of a BxC or BxS branch test. Every test or setting of one of these bits has to go through one or other (or sometimes both) of these methods.&lt;br /&gt;&lt;br /&gt;They're also a bit 'clunky' to use, as they need a bitmask passed in as a parameter - not the end of the world, but it does mean you have to think in terms of powers of two when setting or reading bits. Instead of saying 'set bit 7', for example, you have to say 'set the bit masked by value 2^7, or 128'. Not especially intuitive - that's why I'm using constants for the bit references in the calls to the methods, as I've just got them all set up as 1, 2, 4, ... 64, 128.&lt;br /&gt;&lt;br /&gt;It seemed like there was a parallel between hitting bits in a byte and twiddling values in an array. You can certainly consider a byte to be nothing more than an 8-element array of bits, and so it seemed plausible that there might be a neater way of addressing an individual bit in a byte to get or set its value. 
Interestingly, the designers of the .NET libraries foresaw this requirement and developed the BitArray class - sadly hampered by one or two usability flaws which make it quite arduous to use (like being able to give the BitArray object a value at initialisation, but then not being able to easily change that value later).&lt;br /&gt;&lt;br /&gt;So I got to thinking about other ways I could address individual bits in a byte through an array-like interface. Obviously I could re-engineer the PSR class so that it really was an array internally, and then write methods to reference the elements of that array in isolation; I'd also need methods to get and set the value in aggregate form, of course, for when I wanted to address the content as a byte. But this didn't seem particularly elegant, and I just knew there'd be a performance penalty to pay too.&lt;br /&gt;&lt;br /&gt;But then my new friend the Indexer came to the rescue. The mechanics of using the Indexer mean it doesn't actually care (mostly) what type of object it's the interface to, so long as your code handles the translation of the index value into something that makes sense to the underlying to-be-indexed object. So how about writing some code that turns the index into a bit reference, and then applies essentially the same logic as my bespoke GetBit() and SetBit() methods do...?&lt;br /&gt;&lt;br /&gt;Here's what the Indexer code looks like:&lt;pre&gt;// Bit indexer&lt;br /&gt;public int this[int index]&lt;br /&gt;{&lt;br /&gt;  get&lt;br /&gt;  {&lt;br /&gt;    // Return zero or one depending on whether the indexed bit is clear or set in the register&lt;br /&gt;    return (_register &amp;amp; (1 &lt;&lt; index)) == 0 ? 0 : 1;&lt;br /&gt;  }&lt;br /&gt;  set&lt;br /&gt;  {&lt;br /&gt;    // Set the value of the indexed register bit according to the value provided (zero or one)&lt;br /&gt;    _register = value == 0 ? 
_register &amp;amp; ~(1 &lt;&lt; index) : _register | (1 &lt;&lt; index);&lt;br /&gt;  }&lt;br /&gt;}&lt;/pre&gt;And here's what the scenarios now look like to get and set bits in the PSR:&lt;pre&gt;case 152: // TYA Implied&lt;br /&gt;  ...&lt;br /&gt;  _SR[_BIT1_SR_ZERO] = _A.Contents == 0 ? 1 : 0;&lt;br /&gt;  _SR[_BIT7_SR_NEGATIVE] = _A[_BIT7_SR_NEGATIVE];&lt;br /&gt;&lt;br /&gt;case 233: // SBC Immediate&lt;br /&gt;  if(_SR[_BIT3_SR_DECIMAL] == 1)&lt;br /&gt;  {&lt;br /&gt;     ... some code that handles Decimal mode subtractions&lt;br /&gt;  }&lt;br /&gt;&lt;/pre&gt;As you can see, it's much more intuitive to use - instead of a call to a helper method with appropriate parameters, we're now talking directly to the register object. And we're also using actual bit values to refer to the bits in the byte rather than a bitmask value - I still use constants, but they're just in the range of 0, 1, 2, ... 6, 7 instead of powers of two.&lt;br /&gt;&lt;br /&gt;Because I implemented the Indexer on the GeneralRegister base class, I also get bit-indexing on all other registers too, not just the PSR. That means it's really easy to copy bit settings from one register to another, such as in the TYA example above where we set the PSR N-flag in bit 7 to whatever the Accumulator's sign in bit 7 is. Oh, and it's a little bit faster overall, too.&lt;br /&gt;&lt;br /&gt;Win-Win. 
:)&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-1775942376063012498?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/1775942376063012498/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=1775942376063012498' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1775942376063012498'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1775942376063012498'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/11/little-bit-better.html' title='A Little Bit Better'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-9024594496217635441</id><published>2008-11-04T12:17:00.003Z</published><updated>2008-11-04T12:30:33.044Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cycles'/><category scheme='http://www.blogger.com/atom/ns#' term='Metrics'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='Clock Speed'/><title type='text'>Spin Cycle</title><content type='html'>&lt;div style="text-align: justify;"&gt;Having analysed the deepest and darkest corners of the emulator code with a profiling tool, it's looking increasingly like the speed 
degradation I've seen recently is due more to the way I count instruction metrics than to the actual implementation of the code. The biggest hotspot is, unsurprisingly, the ExecuteInstruction() method, and within that the bits that take the most time are fairly evenly scattered around in the form of calls to Properties in the various Register classes. I'll have a look at optimising that at a later date, because what the analysis showed as a bigger issue was a degree of imprecision in the way core speed was being measured.&lt;br /&gt;&lt;br /&gt;How was metrics counting working?&lt;br /&gt;&lt;br /&gt;As each instruction executed, the basic cycle-count it specified was added to a running total of cycles the core had executed so far. This did not account for the instances where an instruction would need additional cycles, such as when a branch is taken, or at page boundaries for many instructions - those extra cycles would be added by the instruction implementation logic itself, as it decided at run-time whether further cycles were needed.&lt;br /&gt;&lt;br /&gt;Every 100ms a timer fired which took the current total cycle count and stashed it in the next slot of a 10-element array (looping around to the beginning when full). It also then zeroed the total, so that it would begin to accrue again over the next 100ms. At the end of execution, these 10 most-recent stashed cycle totals were added together to give the total number of cycles executed over the previous second of elapsed time. This total was then divided by one million to give a megahertz value.&lt;br /&gt;&lt;br /&gt;One problem with this was latency - during the interval that the timer was stashing the total cycles for the last 100ms period, but before it reset the total, the core would continue to execute instructions, but of course any further additions to the total would be lost. 
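&lt;br /&gt;&lt;br /&gt;To make that concrete, here's a rough sketch of the old scheme. The class and member names are made up for illustration and this isn't the emulator's actual code; the comment in OnTimerTick marks the window where cycles added by the core thread get thrown away:&lt;br /&gt;&lt;pre&gt;
// Hypothetical sketch of the old timer-based metrics; all names are illustrative.
class OldCycleMetrics
{
    private long _cycleTotal;                        // added to by the core thread
    private readonly long[] _slots = new long[10];   // the ten most recent 100ms samples
    private int _nextSlot;

    // Called by the core as each instruction executes.
    public void AddCycles(int cycles)
    {
        _cycleTotal += cycles;
    }

    // Called by a timer roughly every 100ms.
    public void OnTimerTick()
    {
        _slots[_nextSlot] = _cycleTotal;   // stash the running total...
        // ...anything the core adds at this point is wiped by the reset below
        _cycleTotal = 0;
        _nextSlot = (_nextSlot + 1) % _slots.Length;   // loop around when full
    }

    // Ten samples of about 100ms each approximate the last second of execution.
    public double MegahertzOverLastSecond()
    {
        long sum = 0;
        foreach (long sample in _slots)
        {
            sum += sample;
        }
        return sum / 1000000.0;
    }
}
&lt;/pre&gt;Reading and zeroing the total in one atomic step, for example with Interlocked.Exchange, would close that particular window.&lt;br /&gt;&lt;br /&gt;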
There is also the natural imprecision of timer events at these frequencies - the timer is not guaranteed to fire on an exact 100ms boundary, and indeed observations showed it firing at varying intervals depending on overall operating system load. So there would be a small but measurable skew on the count added to the array, because it would be a variable amount over 100ms since the last one.&lt;br /&gt;&lt;br /&gt;Equally, the process worked over only the last second of elapsed execution time; so if the core had just completed a large amount of work involving instructions with small cycle times, the average would go up. In the reverse scenario, where the core had executed a bunch of instructions with large cycle times, the average would go down. So the reported clock speed could be highly variable depending on what the core had just been doing.&lt;br /&gt;&lt;br /&gt;How does metrics counting work now?&lt;br /&gt;&lt;br /&gt;Instead of counting the number of cycles executed, the core now simply increments a counter for each instruction every time that instruction is executed. These counters continue to accrue for the duration of the core execution period, and at the end of execution a simple calculation multiplies the number of executions of each instruction by its baseline cycle requirement. These results are summed to give a total number of cycles executed overall, and then divided by the number of seconds for which the core ran, giving a cycles-per-second result. Dividing this by one million then provides the megahertz clock speed.&lt;br /&gt;&lt;br /&gt;There are a couple of points to note about this approach. Firstly, we're not taking any account of additional cycle requirements for those instructions which vary according to circumstance (like taken branches or page-boundary crossings). I'm not sure if I like that, but at the moment the main objective is to get a representative indicator of core speed, even if it's a little imprecise. 
Secondly, those instruction counters will eventually overflow in long-run situations, so I'm going to need some way of managing that.&lt;br /&gt;&lt;br /&gt;On the flip side, we've eliminated the slightly flaky timer-based measurement technique. Doing continuous counting also means we tend to smooth out those peculiarities where the core is busy doing lots of small- or large-cycle-count instruction executions, so when we come to do the speed calculation we get the average cycles-per-second over the entire run interval, and not a (potentially) biased view based on just the last second. This means our view of core speed is more representative of overall performance, and isn't skewed by whatever the core was just doing.&lt;br /&gt;&lt;br /&gt;What Next?&lt;br /&gt;&lt;br /&gt;I'm still not entirely satisfied with the way this works - under the covers, the core knows exactly how many cycles an instruction took to execute as it ran, so the fact that we don't see this knowledge exposed in the metrics (because we just multiply instruction counts by base cycles) irritates me. In any case, it feels like what I &lt;span style="font-style: italic;"&gt;should&lt;/span&gt; be doing is figuring out how long a real 6502 takes to execute one cycle, and using that time period as a coefficient to figure out what percentage over 100% the emulator is executing cycles at.&lt;br /&gt;&lt;br /&gt;The other issue is that the 6502 does a limited form of pipelining - as each instruction is decoded and executed, activities relating to the next instruction can sometimes be happening in parallel. To properly emulate that, I need to take instruction decoding and execution up to the next level of complexity, where we're emulating T-states rather than instruction units.&lt;br /&gt;&lt;br /&gt;However, for now the clockspeed results I'm seeing are stable, and that means I can finish implementing and testing all the instructions I don't yet have in the core. 
I'll also do the undocumented ones as well, and have a toggle on the core instancing code to switch support for those on and off. In a few days I should have a fully-implemented instruction set, including known bugs (like the 'JMP Indirect' Page Boundary glitch), and then I can publish the code at that point as a working entity, and move on to the next stage...&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-9024594496217635441?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/9024594496217635441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=9024594496217635441' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9024594496217635441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9024594496217635441'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/11/spin-cycle.html' title='Spin Cycle'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-3620479693350518517</id><published>2008-10-30T10:24:00.004Z</published><updated>2008-10-30T10:43:56.341Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='Cycles'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Indexer'/><category scheme='http://www.blogger.com/atom/ns#' term='Clock Speed'/><title type='text'>Slow News Day</title><content type='html'>&lt;div style="text-align: justify;"&gt;Hmm. Bit of a problem with the emulator, it would seem - after all the work I've put in to tidy up the interfaces and object structures, improve the MemoryMap class with indexers instead of bespoke methods, get the CPUCore class nicely thread-safe, and just generally give the whole codebase a good spring-clean...&lt;br /&gt;&lt;br /&gt;...it's nearly 25% slower.&lt;br /&gt;&lt;br /&gt;This is, to use the official technical term, a bit of a bugger. I was expecting nothing less than an &lt;span style="font-style: italic;"&gt;improvement&lt;/span&gt; in core clock speed, and my worst-case scenario was that it remained constant. To come out of this with something that isn't as good as it was before is a bit disappointing, to say the least. Yes, the code is now much cleaner and significantly more maintainable, but for some reason I'm seeing clock speeds of around 30MHz instead of the 40MHz or so I was getting before.&lt;br /&gt;&lt;br /&gt;Possible reasons for this are:&lt;br /&gt;&lt;br /&gt;1. My cycle-counting is broken somewhere, or the speed calculation is wonky. This would be the best option, as it would mean the core itself is OK and I'm just reporting the speed incorrectly.&lt;br /&gt;&lt;br /&gt;2. I've made enough changes that I've inadvertently fixed a bug I didn't know I had in the old version that meant it &lt;span style="font-style: italic;"&gt;looked&lt;/span&gt; like it was running faster than it really was. Not as nice, because I don't know what that bug might have been, and therefore have no easy way to prove it was causing a problem and now isn't.&lt;br /&gt;&lt;br /&gt;3. I've introduced a bottleneck somewhere by accident. 
Maybe the shiny indexer code in MemoryMap isn't as slick as I thought it was; or maybe the use of the &lt;span style="font-weight: bold;"&gt;(long)&lt;/span&gt; cast to get the indexer to do a double-byte access is actually a bad idea, even though I don't do it that often. Or perhaps it's something else entirely, although to be honest the CPUCore instruction decode/execute logic is not particularly complex, so I'm not sure where else a slowdown might be occurring.&lt;br /&gt;&lt;br /&gt;I guess it's time to find a decent .NET code profiler and let it interrogate the emulator as it runs, to see where the hotspots are...&lt;br /&gt;&lt;br /&gt;Hmm.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-3620479693350518517?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/3620479693350518517/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=3620479693350518517' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/3620479693350518517'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/3620479693350518517'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/slow-news-day.html' title='Slow News Day'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' 
src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-3884289456494295197</id><published>2008-10-26T19:05:00.008Z</published><updated>2008-10-27T16:47:27.790Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='Monitor'/><title type='text'>Feeling for a Pulse</title><content type='html'>&lt;div style="text-align: justify;"&gt;So, in the .NET 2.0 release, Microsoft deprecated the Suspend() and Resume() methods that existed to help simplify thread synchronisation. All for good reasons, I hasten to add - Suspend() was a highly-dodgy bit of code that took no notice of what the thread to be suspended was doing when it was halted, which meant that if it was in the middle of a non-atomic operation you could easily end up with half-complete updates, and locks held all over the place. Not nice.&lt;br /&gt;&lt;br /&gt;Anyway, my first-pass CPUCore class made use of these methods as a quick-and-dirty way of giving me control over thread execution, but I knew I'd have to do the decent thing at some point and replace them with the approved mechanisms provided by the Monitor class. So this weekend I sat down and read everything I could find that talked about how to use the lock statement and the Monitor.Wait and Monitor.Pulse methods to reproduce Suspend() and Resume() in thread-safe code. And although it was a steep learning-curve, the end result is a surprisingly neat bit of code that does what it's supposed to, and which doesn't spit compiler warnings at me.&lt;br /&gt;&lt;br /&gt;This is pretty-much the main emulation core loop. 
I've ripped out a load of incidental stuff, leaving just the skeleton it all sits on - so you can see that we just spin through the loop forever (until something clears the IsRunning flag) either executing instructions or, if the core is paused, sitting on Monitor.Wait until we get a pulse to tell us to start again:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;// Loop forever, until the flag is cleared to indicate thread disposal&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;while(_isRunning)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  switch(_runMode)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    case CoreRunMode.RunPaused:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      lock(this)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        // Re-check under the lock, so a pulse sent just before we wait isn't missed&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        if(_runMode == CoreRunMode.RunPaused)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;          Monitor.Wait(this);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      break;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    case CoreRunMode.RunContinuous:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    case CoreRunMode.RunStepped:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      ExecuteInstruction();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      break;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And outside of the core, this bit of code executes when we want the core to resume after a 
pause:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;// Resume core execution&lt;br /&gt;public void CoreResume()&lt;br /&gt;{&lt;br /&gt; lock(_core)&lt;br /&gt; {&lt;br /&gt;   _core.RunMode = CPUCore.CoreRunMode.RunContinuous;&lt;br /&gt;   Monitor.Pulse(_core);&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Again, all this work is essentially invisible outside of the emulation object, but is necessary both from an aesthetic point of view and for making maintenance easier in the future. As a final snippet for this post, compare how the STA (Indirect Indexed, Y) instruction has evolved as I've iteratively refined the code:&lt;br /&gt;&lt;br /&gt;Initial:&lt;pre&gt;&lt;span style="font-family: courier new;"&gt;case 145: // STA Indirect Indexed,Y&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  address = (ushort)((data.memory.Peek(ops[0] + 1) * 256) + data.memory.Peek(ops[0]));&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  data.memory.Poke(address + data.Y.Contents, data.A.Contents);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  break;&lt;/span&gt;&lt;/pre&gt;Interim:&lt;br /&gt;&lt;pre&gt;&lt;span style="font-family: courier new;"&gt;case 145: // STA Indirect Indexed,Y&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  data.memory.Poke(data.memory.Deek(data.memory.Peek(data.PC.Contents + 1)) + data.Y.Contents, data.A.Contents);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  break;&lt;/span&gt;&lt;/pre&gt;And now:&lt;br /&gt;&lt;pre&gt;&lt;span style="font-family: courier new;"&gt;case 145: // STA Indirect Indexed,Y&lt;/span&gt;&lt;br /&gt;  _mem[_mem[(long)_mem[PC + 1]] + Y] = A;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  break;&lt;/span&gt;&lt;/pre&gt; It's a bit cryptic, where before it was more readily understandable, but then we've traded legibility for speed - the use of the indexer on the memory array is considerably 
quicker than going through discrete Peek() and Poke() methods. Also, the memory structure itself is now a 'top-level' entity as far as the CPU Core object is concerned, instead of being nested inside the larger aggregated data structure that holds a lot of other CPU-state information.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-3884289456494295197?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/3884289456494295197/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=3884289456494295197' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/3884289456494295197'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/3884289456494295197'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/feeling-for-pulse.html' title='Feeling for a Pulse'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-5043930717656990233</id><published>2008-10-22T11:30:00.003+01:00</published><updated>2008-10-22T13:22:52.714+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Exceptions'/><category scheme='http://www.blogger.com/atom/ns#' term='Indexer'/><category scheme='http://www.blogger.com/atom/ns#' 
term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='Risk'/><category scheme='http://www.blogger.com/atom/ns#' term='Clock Speed'/><title type='text'>The Exception to the Rule</title><content type='html'>&lt;div style="text-align: justify;"&gt;I've been refactoring some of the deepest layers of the emulator over the last few days, prompted by an increasing dissatisfaction with the encapsulation of various data structures that the CPU logic needs in order to be able to do anything meaningful. As an example, let's consider the memory emulation object which presents a memory map to the CPU for it to do stuff with.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;MemoryMap is a pretty simple class - an array of MemoryCell objects (which describe the type of memory - RAM/ROM/etc - and some other metadata, together with the actual content of the cell) and a variety of methods to get and set these properties. There are also a couple of routines to do things like dump the array into a readable form for debugging, and provide a mechanism for loading data into chunks of cells during initialisation.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;The problem I had was that most of this was exposed publicly, because a lot of my work-in-progress testing needed me to be able to get into the object from way up the application in the topmost Main() test area. It all worked, but it was bad practice, and I needed to sort it out. So I tore MemoryMap to pieces and put it back together in a nice encapsulated way, with everything either Private or Internal apart from the accessor methods, and then marked the class as Sealed. And whilst I was doing that, I took the opportunity to rework some of the functionality so that things like formatted output and bulk load initialisation are now done outside the class via calls into it. I also implemented an Indexer on the object, which makes the most common form of access (i.e. 
to read and write cell contents) a whole lot nicer.&lt;br /&gt;&lt;br /&gt;Here's how it used to work: accesses to memory cell contents were done through bespoke methods (called, for geeky amusement, PEEK, POKE, DEEK and DOKE) which took as parameters the location and value (for POKE/DOKE) to set the contents, or just a location (for PEEK/DEEK) to return the contents. PEEK and POKE handled 8-bit byte values, and DEEK and DOKE accepted (or returned) 16-bit byte pairs and then called PEEK/POKE to do the two byte accesses as needed. To access a memory location, the CPU core logic would issue one of these:&lt;br /&gt;&lt;pre&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;_memory.Poke(location, value8);&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;_memory.Doke(location, value16);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;int value8 = _memory.Peek(location);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;int value16 = _memory.Deek(location);&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;All well and good, but the new Indexer makes this a thing of the past. An indexer basically makes something in the object it belongs to referrable via an index. So accesses to the memory cell contents now look like this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;_memory[location] = value8;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;_memory[(long)location] = value16;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;int value8 = _memory[location];&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;int value16 = _memory[(long)location];&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;I actually have two indexers, one which accepts an INT index, and one which accepts a LONG. 
Refer to the memory object with an INT and you do a single-byte access; use a LONG instead and you do a double-byte access. The bespoke methods are gone, and naturally this all works a good deal faster. How fast? Well, in order to get a meaningful metric, I had to run a hundred million iterations of a read/write combination of accesses - yes, 100,000,000. Single-byte accesses took 1.25 seconds, and double-byte accesses took 3.12 seconds. Here's what the code looks like:&lt;/div&gt;&lt;br /&gt;&lt;a href="http://picasaweb.google.com/lh/photo/NbHbjWpTyR1Ka-dpQxCZVQ"&gt;&lt;img src="http://lh4.ggpht.com/jonners9/SP8Z8ox2cQI/AAAAAAAAARY/vSrfJfQ2ZCU/s400/Indexers%2022102008.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Now whilst I was doing this refactoring work, I was testing performance as I went along to see whether the new code was slower, faster, or the same as the old. It became apparent pretty quickly that it was faster, which was nice, but I noticed something else as well - the validation I was doing on the index value incurred a notable overhead. The principal objective of the validation was to prevent accesses to the underlying MemoryCell array with out-of-bounds values - i.e. given a standard 64K memory map, we would not want location (index) values lower than zero or higher than 65,535. So we need to either filter them out and return some sort of error code, or just have them 'wrap around' so that location 65,536 is interpreted as zero, 65,537 as one, and so on.&lt;br /&gt;&lt;br /&gt;There are two ways of doing this - we either push all incoming location values through a filter method, to reject or wrap out-of-bounds values, or we allow the access to happen with whatever value we're given but trap any resultant IndexOutOfRangeException that the CLR spits out. 
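&lt;br /&gt;&lt;br /&gt;Here's a minimal sketch of the idea, showing the pair of indexers together with both validation options. It's illustrative only: MemoryMapSketch, Wrap and TryRead are hypothetical names, and it stores plain bytes where the real class holds MemoryCell objects:&lt;br /&gt;&lt;pre&gt;
using System;

// Hypothetical sketch; not the emulator's actual MemoryMap.
sealed class MemoryMapSketch
{
    private const int Size = 65536;                      // standard 64K memory map
    private readonly byte[] _cells = new byte[Size];

    // INT index: single-byte access, with the location pushed through a filter.
    public int this[int location]
    {
        get { return _cells[Wrap(location)]; }
        set { _cells[Wrap(location)] = (byte)value; }
    }

    // LONG index: double-byte access, low byte first as on the 6502.
    public int this[long location]
    {
        get { return this[(int)location] + (this[(int)location + 1] * 256); }
        set
        {
            this[(int)location] = value % 256;
            this[(int)location + 1] = value / 256;
        }
    }

    // Option 1: wrap out-of-bounds values, so 65,536 becomes 0, 65,537 becomes 1...
    private static int Wrap(int location)
    {
        return ((location % Size) + Size) % Size;
    }

    // Option 2: allow the raw access and trap the exception the CLR throws.
    public bool TryRead(int location, out int value)
    {
        try
        {
            value = _cells[location];
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            value = 0;
            return false;
        }
    }
}

// Usage:
//   MemoryMapSketch map = new MemoryMapSketch();
//   map[(long)0x1000] = 0x1234;   // stores 0x34 at 0x1000 and 0x12 at 0x1001
//   int low = map[0x1000];        // 0x34
&lt;/pre&gt;&lt;br /&gt;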
Both have merits and drawbacks - you can argue for either approach with an equally strong case - but they share a common issue in that they slow down access to the array. In some performance tests I did, these validation techniques dropped array access speed by up to 100% depending on the variation I was timing.&lt;br /&gt;&lt;br /&gt;Which brings us to the elementary question that so often troubles software engineers these days: how much code should I write to create an iron-clad, bullet-proof interface that won't allow nonsense values through to the delicate structures underneath, whilst at the same time maintaining a high level of performance? There are, of course, several schools of thought, and it often depends on the nature of the application (and the specific area of that application) in question:&lt;br /&gt;&lt;br /&gt;1. Wrap it in steel. Validation, exception-handling, fail-safe defaults, you name it. Take advantage of whatever you can make use of in your language of choice to make it impossible (or at least very difficult) for a user of your code to accidentally (or deliberately) break something and either pervert the execution of the code, or crash it altogether. Performance is a secondary concern - if the code is fragile, performance will be the least of your worries.&lt;br /&gt;&lt;br /&gt;2. Wrap the surface layers in steel, because that's where strangers will be interfacing with your code. Stop them from doing naughty things through that interface, and try to extend the security and safety features all the way down to the deepest levels of the code. Any time something breaks, manage the error in such a way that the software can either recover from it, or exit gracefully and tell someone what went wrong.&lt;br /&gt;&lt;br /&gt;3. As point two above, but in specific controlled conditions where you have 99% confidence of the state of the system, and where performance is critical, relax the regulations a bit. 
So if you have a method deep inside your code that is nowhere near the exposed interface, which takes two values as parameters from another method that has already validated them, and has to do something to those values as fast as possible, then it might be OK to skip further validation or exception-handling because the chances of an error are very slim.&lt;br /&gt;&lt;br /&gt;It's a tricky one. In my day job, option one applies unquestionably. Money rests on my software doing its job, and although performance is nice, it's better to be right than fast in most cases. And anyway, 'fast' is a relative term, and most business software can be classed as fast if it delivers the result in a second or two.&lt;br /&gt;&lt;br /&gt;But way down in my MemoryMap class, where a millisecond is an eternity, things are different. Here we have to be as fast as we can, and the design of the entire software structure should be able to rely on that speed. The quid pro quo is that the speed is dependent on known states, and that means no passing values that are out of bounds. It's a kind of contract - MemoryMap says it'll process memory accesses as fast as it possibly can, in exchange for nice index values. Play nasty, and MemoryMap will have the rest of the system crashing down before you can say 'BRK'.&lt;br /&gt;&lt;br /&gt;The upshot is that in order to get maximum speed out of the class, I've had to make an exception (haha, see what I did there?) to my rule, and forget exception handling and validation. Fortunately, I know that index values for memory accesses will only be coming via the CPU core, and therefore will be through 8-bit or 16-bit register operations, and are consequently guaranteed to be 'in range'. 
Equally, this class is not public and cannot be inherited, so there's no danger of anything other than the CPU core talking to it.&lt;br /&gt;&lt;br /&gt;It's a risk, but a calculated one.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-5043930717656990233?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/5043930717656990233/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=5043930717656990233' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/5043930717656990233'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/5043930717656990233'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/exception-to-rule.html' title='The Exception to the Rule'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/jonners9/SP8Z8ox2cQI/AAAAAAAAARY/vSrfJfQ2ZCU/s72-c/Indexers%2022102008.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-7304020014819574988</id><published>2008-10-21T09:48:00.003+01:00</published><updated>2008-10-21T10:06:22.166+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Deek'/><category 
scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><category scheme='http://www.blogger.com/atom/ns#' term='Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='Clock Speed'/><title type='text'>Object-ion, Your Honour!</title><content type='html'>&lt;div style="text-align: justify;"&gt;This'll be just a brief update, more to reassure everyone that work is still ongoing and the quiet periods between posts here are the places where I'm writing code. So, here's what I've been doing in the last week:&lt;br /&gt;&lt;br /&gt;1. Reworked the timing code to get more accurate clockspeed stats out of the core.&lt;br /&gt;&lt;br /&gt;2. Put some more functionality in the MCM wrapper to make debugging easier.&lt;br /&gt;&lt;br /&gt;3. Altered the MemoryMap class to use an Indexer instead of Peek/Poke/Deek/Doke methods.&lt;br /&gt;&lt;br /&gt;At that point, I noticed the interfaces to the MemoryMap and CPUCore objects were getting a bit messy, so I've been cleaning things up and getting everything back on a proper object-oriented footing, where stuff that should really be hidden behind an encapsulation interface is now tucked-away out of sight. This is mostly invisible implementation-structure stuff, but the payoff is that there's less complexity to the outer facade that the core exposes.&lt;br /&gt;&lt;br /&gt;Another couple of days doing this sort of 'scaffold rearrangement' should see the end of it, and then I can return to the exciting part, which is getting actual 6502 instructions implemented. 
At the moment I have just enough to allow the VIC-20 ROM to execute, but there's a lot more to do before the instruction set is complete - and that's without the 'undocumented' opcodes, which will also be in there when it's all done.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-7304020014819574988?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/7304020014819574988/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=7304020014819574988' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7304020014819574988'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/7304020014819574988'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/object-ion-your-honour.html' title='Object-ion, Your Honour!'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-9006245045198862710</id><published>2008-10-14T12:37:00.006+01:00</published><updated>2008-10-14T13:10:59.054+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='Disassembly'/><category scheme='http://www.blogger.com/atom/ns#' term='Labels'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Branch'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='Monitor'/><title type='text'>Uhura Is Busy; I Am Monitoring</title><content type='html'>&lt;div style="text-align: justify;"&gt;Bonus points for correctly identifying where/who that quote comes from. ;)&lt;br /&gt;&lt;br /&gt;To the point - in order to start properly debugging the emulator core, as opposed to just having random 'Console.WriteLine' entries in the body of the code, I've created a wrapper around the core object to present an Old Skool command-line interface through which I can drive and direct the execution; in old parlance, a Machine Code Monitor.&lt;br /&gt;&lt;br /&gt;It instantiates a MemoryMap object (pre-loaded with the VIC-20 ROMs) and a CPU object, and then waits for me to tell it what to do. Essentially, it's little more than a command parser that makes calls to diagnostic functionality I've already implemented: (m)emory dump, (d)isassemble, (r)egister state, etc. There's also some additional functionality to start and stop CPU execution, go into single-step execution mode, set breakpoints, and a few other bits and pieces. This is enough of a toolset for me to trace exactly what the core is doing as it executes 6502 instructions, and make sure it's behaving.&lt;br /&gt;&lt;br /&gt;Screenshot:&lt;br /&gt;&lt;a href="http://picasaweb.google.com/lh/photo/hUhRS1HxN8rUnM9u4tgGsQ"&gt;&lt;img src="http://lh6.ggpht.com/jonners9/SPSJJTh0vxI/AAAAAAAAAQk/kZI11nbklOQ/s400/Monitoring%2014102008.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I now have fairly fine-grained control over how the core executes, and a reasonable window into it so that I can see what it's thinking as it burns through instructions. 
Which is nice.&lt;br /&gt;&lt;br /&gt;The sharp-eyed amongst you may also spot a tweak I made to the disassembly - it now displays the actual jump address when decoding Branch instructions, which is much nicer to read. There's another little readability tweak I'm planning to add to the disassembly display in the next couple of days - integration of labels. This will take a simple text file of entry point and/or jump labels (such as a ROM disassembly index) and bind that to the disassembly so you'll get label names in the appropriate places as well as addresses.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-9006245045198862710?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/9006245045198862710/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=9006245045198862710' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9006245045198862710'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/9006245045198862710'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/uhura-is-busy-i-am-monitoring.html' title='Uhura Is Busy; I Am Monitoring'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh6.ggpht.com/jonners9/SPSJJTh0vxI/AAAAAAAAAQk/kZI11nbklOQ/s72-c/Monitoring%2014102008.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-6241129217284573049</id><published>2008-10-10T21:18:00.004+01:00</published><updated>2008-10-10T22:43:34.849+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='Cycles'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='Clock Speed'/><title type='text'>Marking Time</title><content type='html'>&lt;div style="text-align: justify;"&gt;After I wrote that earlier post, I spent the rest of the day in an Arthur Dent sort of mode - things kept wandering around in my head looking for other things to connect with. Mainly to do with the timing metrics I was talking about, and the rather nice improvement of the emulated clock speed from 2.4MHz to 7MHz. As is so often the case, having made such a gain, I wasn't satisfied. I kept thinking that although 7MHz was pretty good, it still on reflection seemed a little sluggish for what is a fairly simple linear C# program running on a dual-core multi-gigahertz PC. Maybe my timing mechanism was a bit inaccurate...?&lt;br /&gt;&lt;br /&gt;I thought about what it was I was measuring; a simple counter, incrementing once each pass through the main loop, named coreCycles. Seems pretty straightforward - every 100ms I stash the counter and reset it, and then at the end of the run I average-out the last 10 stashed values, divide it by a cool million, and thus we have cycles per second. 
Doing that, I'm now coming out at 6.977MHz, or just shy of 7MHz.&lt;br /&gt;&lt;br /&gt;But hold on - I also mentioned previously that the future intent is to account for the actual number of cycles that each instruction really needs in which to execute. Which means that right now, what I'm effectively counting is just one cycle per instruction - the fetch cycle in which we pull the opcode from memory, for example. So rather than counting cycles per second, what I'm actually counting is &lt;span style="font-style: italic;"&gt;instructions&lt;/span&gt; per second! Looking at it from that angle, 7 million instructions per second isn't bad.&lt;br /&gt;&lt;br /&gt;And when I plugged-in the actual clock cycles per instruction, things looked even more interesting. Now I'm accounting for the right number of cycles per instruction (minus the oddities like branches needing one more) the counter is incrementing properly. And after a few executions of the emulator to get an average, our actual, proper, accurate tally of cycles per second gives us an emulated clock speed of...&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:180%;"&gt;&lt;span style="font-weight: bold;"&gt;40MHz&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&lt;/div&gt;...I'm happy with that. It'll do. 
;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-6241129217284573049?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/6241129217284573049/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=6241129217284573049' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/6241129217284573049'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/6241129217284573049'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/marking-time.html' title='Marking Time'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-1806651341446863930</id><published>2008-10-10T10:57:00.007+01:00</published><updated>2008-10-10T11:28:07.690+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='Deek'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='Int'/><title type='text'>Reaping The Reward</title><content type='html'>&lt;div style="text-align: justify;"&gt;Well, this morning I finished giving the emulator enough instruction implementations that it can burn 
through sufficient VIC-20 ROM until the IRQ stuff takes over. In other words, it's getting to the point in the Kernal that it sits in a spinloop waiting for an IRQ to fire - although, of course, I need to go back to the test harness now to make that happen.&lt;br /&gt;&lt;br /&gt;What it lets me do, however, is make proper use of the metrics counting code that's been steadfastly reporting 0MHz whilst I've been in debug mode. The counter ticks up every time an instruction is decoded (this will shortly be extended to count the actual number of cycles the instruction takes) and every 100ms a timer fires, stashes the count in a 10-entry LIFO array, and resets it. At the end of the run, the ten entries are averaged and divided by a million to give me the number of cycles per second the core is executing.&lt;br /&gt;&lt;br /&gt;At the end of last week, I was getting about 2.4MHz out of the core as it was at the time. This was pretty good, but I knew it was too slow for what I want to do - the current instruction decode logic is just a clunky series of CASEs, but this isn't the final version. Something else will happen here, but it'll need an estimated 4MHz emulation speed in the current guise to give me enough overhead to run the core at a true 1MHz. Anyway, I ran the core at full speed today, with all the debugging code disabled, and having done all those Byte/Int/cast/DEEK adjustment tasks, it now runs at:&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;!! 7MHz !!&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;I believe the appropriate term is 'woot', or somesuch. Anyway, this proves the pain of reworking the code this week was well worth it. 
And for those with an interest, here's a tiny weeny snippet of code showing what a typical instruction looked like both before and after that rework...&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Before:&lt;pre&gt;&lt;span style="font-family:courier new;"&gt;case 145: // STA Indirect Indexed,Y&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  address = (ushort)((data.memory.Peek(ops[0] + 1) * 256)&lt;br /&gt;   + data.memory.Peek(ops[0]));&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  data.memory.Poke(address + data.Y.Contents,&lt;br /&gt;   data.A.Contents);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  break;&lt;/span&gt;&lt;/pre&gt;After:&lt;br /&gt;&lt;pre&gt;&lt;span style="font-family:courier new;"&gt;case 145: // STA Indirect Indexed,Y&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  data.memory.Poke(data.memory.Deek(&lt;br /&gt;   data.memory.Peek(data.PC.Contents + 1))&lt;br /&gt;   + data.Y.Contents, data.A.Contents);&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  break;&lt;/span&gt;&lt;/pre&gt;&lt;div style="text-align: justify;"&gt;I will be publishing the entire sourcecode when it's done, but at the moment it's full of the usual scaffolding any ongoing project has - bits of debugging logic, miscellaneous one-shot variables, odd naming conventions, etc. 
I'll also let FXCop have at it at some point as well, just to catch anything I miss.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-1806651341446863930?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/1806651341446863930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=1806651341446863930' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1806651341446863930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1806651341446863930'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/reaping-reward.html' title='Reaping The Reward'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-8765418354579138764</id><published>2008-10-09T23:05:00.005+01:00</published><updated>2008-10-09T23:58:24.726+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><category scheme='http://www.blogger.com/atom/ns#' term='Int'/><title type='text'>Back on Track</title><content type='html'>&lt;div style="text-align: 
justify;"&gt;Not a vast amount of noteworthy progress to report today, but then I suppose some days are just 'chug-along' periods where you grind through some necessary work without either making any startling discoveries or hitting any major flaws. Today was one of those days, in which I finished changing all the Byte references to Int, and then went through all the core code to eliminate casts and make sure the compiler was happy with all my type-related activity.&lt;br /&gt;&lt;br /&gt;By the end of the coding session, I'd got back to the point where the VIC-20 ROM was being executed all the way through until it hit the ADC instruction I was working on before. I'd also added a new method to the MemoryMap class, to give me back a two-byte value as a 16-bit address in one call. This really helps with instructions that use addresses (or indirect addresses) as operands, because now the core issues just one call to MemoryMap to get the value instead of issuing two and then combining them. Back in the early '80s I was enough of a nerd that I'd avidly read the lists of BASIC keywords each microcomputer offered to see what they could do, and was always impressed with any version that offered double-byte PEEK and POKE commands (often named DEEK and DOKE). So, as a little homage, MemoryMap's new double-byte address method is named DEEK.&lt;br /&gt;&lt;br /&gt;One little oddity - I said previously that I was implementing instructions as the ROM presented them, so that each time I executed the emulator it would get a little bit further before finding another unimplemented instruction for me to work on. Logically, therefore, as I went through all the existing instructions I'd already covered, making changes for the Byte/Int switchover and DEEK, I should end up with all the instructions using the new code. So why do I have one instruction left over, still commented-out? 
It exists because the ROM must have presented it as needing implementing in the core, but it hasn't been re-presented since I commented all the instructions out and reworked them.&lt;br /&gt;&lt;br /&gt;That sort of implies that the core is following a slightly different path through the ROM now, which means either I had a bug before, or I have one now. Given that the decode logic is basically new code, using Ints and double-byte routines where previously it was all Bytes and combinations, the bug could easily be in the new version, or could have been in the old code which is now history. Tomorrow I'm going to do a scan over the ROM and see if this anomalous opcode exists in the code path or not - if it does, I'm missing it now and that's bad. If it doesn't, it means the old code incorrectly decoded an operand byte as an opcode and now we don't do that, which is good.&lt;br /&gt;&lt;br /&gt;I'll also come back to ADC tomorrow, which was giving me a real headache with the old code mechanisms; in particular, the logic to set the Overflow (V) Flag is a complicated little problem, and I wasn't happy with my solution to it. So we'll start again, and see how it works out. I'll leave you with a list of the instructions/modes that the emulator supports as of now, just so we can measure progress later when I start slacking. 
Stay tuned!&lt;br /&gt;&lt;ul style="font-family: courier new;"&gt;&lt;li&gt;ADC Immediate&lt;/li&gt;&lt;li&gt;AND Immediate&lt;/li&gt;&lt;li&gt;ASL Accumulator&lt;/li&gt;&lt;li&gt;BCC Relative&lt;/li&gt;&lt;li&gt;BCS Relative&lt;/li&gt;&lt;li&gt;BEQ Relative&lt;/li&gt;&lt;li&gt;BNE Relative&lt;/li&gt;&lt;li&gt;BPL Relative&lt;/li&gt;&lt;li&gt;CLC Implied&lt;/li&gt;&lt;li&gt;CLD Implied&lt;/li&gt;&lt;li&gt;CMP Absolute,X&lt;/li&gt;&lt;li&gt;CMP Immediate&lt;/li&gt;&lt;li&gt;CMP Indirect Indexed,Y&lt;/li&gt;&lt;li&gt;CPX Immediate&lt;/li&gt;&lt;li&gt;CPY Immediate&lt;/li&gt;&lt;li&gt;DEX Implied&lt;/li&gt;&lt;li&gt;DEY Implied&lt;/li&gt;&lt;li&gt;INC Absolute,X&lt;/li&gt;&lt;li&gt;INC Zero Page&lt;/li&gt;&lt;li&gt;INX Implied&lt;/li&gt;&lt;li&gt;JMP Absolute&lt;/li&gt;&lt;li&gt;JMP Indirect&lt;/li&gt;&lt;li&gt;JSR Absolute&lt;/li&gt;&lt;li&gt;LDA Absolute&lt;/li&gt;&lt;li&gt;LDA Absolute,X&lt;/li&gt;&lt;li&gt;LDA Absolute,Y&lt;/li&gt;&lt;li&gt;LDA Immediate&lt;/li&gt;&lt;li&gt;LDA Indirect Indexed,Y&lt;/li&gt;&lt;li&gt;LDA Zero Page&lt;/li&gt;&lt;li&gt;LDX Immediate&lt;/li&gt;&lt;li&gt;LDX Zero Page&lt;/li&gt;&lt;li&gt;LDY Immediate&lt;/li&gt;&lt;li&gt;LDY Zero Page&lt;/li&gt;&lt;li&gt;ORA Absolute&lt;/li&gt;&lt;li&gt;ORA Immediate&lt;/li&gt;&lt;li&gt;ROR Accumulator&lt;/li&gt;&lt;li&gt;RTS Implied&lt;/li&gt;&lt;li&gt;SEI Implied&lt;/li&gt;&lt;li&gt;STA Absolute&lt;/li&gt;&lt;li&gt;STA Absolute,X&lt;/li&gt;&lt;li&gt;STA Absolute,Y&lt;/li&gt;&lt;li&gt;STA Indirect Indexed,Y&lt;/li&gt;&lt;li&gt;STA Zero Page&lt;/li&gt;&lt;li&gt;STA Zero Page,X&lt;/li&gt;&lt;li&gt;STX Absolute&lt;/li&gt;&lt;li&gt;STX Zero Page&lt;/li&gt;&lt;li&gt;STY Absolute&lt;/li&gt;&lt;li&gt;STY Zero Page&lt;/li&gt;&lt;li&gt;STY Zero Page,X&lt;/li&gt;&lt;li&gt;TAX Implied&lt;/li&gt;&lt;li&gt;TAY Implied&lt;/li&gt;&lt;li&gt;TXA Implied&lt;/li&gt;&lt;li&gt;TXS Implied&lt;/li&gt;&lt;li&gt;TYA Implied&lt;/li&gt;&lt;/ul&gt;
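&lt;br /&gt;For anyone curious what a DEEK-style call boils down to, here is a minimal sketch rather than the real thing. The MemorySketch class and its bare Int array are assumptions made purely for illustration (the actual MemoryMap does rather more than this), and there is no range checking or address wrapping, to keep it short.&lt;br /&gt;&lt;pre&gt;&lt;span style="font-family:courier new;"&gt;// Hypothetical stand-in for MemoryMap: a flat 64K array of Ints.
class MemorySketch
{
    private readonly int[] cells = new int[65536];

    // Single-byte read and write, with no range or type checks.
    public int Peek(int address) { return cells[address]; }
    public void Poke(int address, int value) { cells[address] = value; }

    // Double-byte read: the 6502 stores the low byte first,
    // so the byte at address + 1 is the high half.
    public int Deek(int address)
    {
        return Peek(address) + (Peek(address + 1) * 256);
    }
}&lt;/span&gt;&lt;/pre&gt;With $22 poked into $FFFC and $FD into $FFFD, Deek(0xFFFC) in this sketch comes back as $FD22 in a single call, instead of two Peeks and a multiply in the instruction decode itself.&lt;br /&gt;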
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-8765418354579138764?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/8765418354579138764/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=8765418354579138764' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/8765418354579138764'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/8765418354579138764'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/back-on-track.html' title='Back on Track'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-5942550399329220015</id><published>2008-10-08T10:56:00.003+01:00</published><updated>2008-10-08T11:30:40.114+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Registers'/><category scheme='http://www.blogger.com/atom/ns#' term='Byte'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='Int'/><title type='text'>Byte-Sized Chunks</title><content type='html'>&lt;div style="text-align: justify;"&gt;I mentioned previously that the original code I wrote a couple of weeks ago to 
emulate the 6502 registers, and the memory emulation, were all Byte-based. It made sense at the time - the 6502 is an 8-bit CPU, and everything it does is in 8-bit units of work (except for the Program Counter register, which is 16 bits wide but is really just a pair of 8-bit registers joined together). Making the emulation use Byte entities to represent all these things seemed logical, particularly as that meant I could let the .NET CLR take care of overflow/underflow scenarios - i.e. if the value in a register (for example) went above 255, it would automatically 'wrap around' because a byte can't hold a value greater than that.&lt;br /&gt;&lt;br /&gt;However, as the emulator started to take shape and actually do stuff, I noticed an increasing number of situations where I was doing things that required more than 8 bits, and then having to cast back to the underlying Byte of the register (or memory location). A good example of this is where I've got to right now in terms of active progress - the ADC instruction. ADC (Add with Carry) is actually quite a complicated little instruction to emulate, and I was doing various things with Int-sized objects before cramming the result back into the Accumulator register's Byte. That's just one example though - there were several other places where I was doing explicit or implicit casts between Bytes, Ushorts and Ints.&lt;br /&gt;&lt;br /&gt;Is this a bad thing? Inherently, no - the language is designed to enable this kind of transition from certain types to others, otherwise the cast functionality wouldn't be there. But there's a performance penalty to pay for every cast that occurs, and although it's a tiny cost that normally makes no difference, it all adds up. And when you're emulating a CPU that has to be able to execute a million cycles per second, it can add up to a major performance hit. 
To get an idea of how fast this code has to run, think about the System.Timers.Timer object - it has a minimum interval resolution of 1ms (one millisecond) which means if you set a timer running at that rate, it'll trigger the 'tick' event a thousand times a second. Which is pretty fast; but then consider that to emulate the 6502 at its standard 1MHz clock speed, you have to be simulating a thousand CPU cycles &lt;span style="font-style: italic;"&gt;on every tick of that 1ms timer&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The standard C# Int is 32 bits wide, which is massively overkill for an 8-bit CPU emulation. But equally, the Intel and AMD silicon we're running the language on these days has a default work-unit width of 32 bits as well (unless you've got a 64-bit processor, of course) and Windows is geared to 32 bits too (unless you're running a 64-bit version on your 64-bit processor, naturally). In other words, 32 bits is the 'comfortable' work-unit that the C# CLR, .NET, Windows, and the hardware underneath it all like to use. A series of performance tests I did with variables of a variety of types from Byte (8 bits) up to Int (32 bits) showed not a vast difference from slowest to fastest, but nevertheless a measurable one - Ints come out fractionally faster (in milliseconds) when huge numbers of tiny operations (like increments and decrements) are happening in a tight loop.&lt;br /&gt;&lt;br /&gt;So right now I've suspended work on instruction implementation to go back and change all my Bytes to Ints. This gives me a marginal performance improvement in itself, but also means I never have to cast down to Byte during instruction execution - another performance gain. 
The downside is that I have to do the value overflow and underflow handling myself, but this is actually quite a quick operation (we just AND the value with 255 when changing it) and is still faster overall than using Bytes.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-5942550399329220015?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/5942550399329220015/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=5942550399329220015' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/5942550399329220015'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/5942550399329220015'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/byte-sized-chunks.html' title='Byte-Sized Chunks'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-1621133459670228042</id><published>2008-10-07T09:46:00.000+01:00</published><updated>2008-10-07T11:33:30.401+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='Memory'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Assembler'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><title type='text'>Six of One, Half-a-Dozen of the Other</title><content type='html'>&lt;div style="text-align: justify;"&gt;Over the weekend, I got the emulator running nicely - I set it so that it would 'power-on' and read the hard-wired RESET address at $FFFC, which on a VIC-20 vectors to $FD22 - the startup entry point for the Kernal OS ROM. To begin with, no instructions were implemented, so I configured the code to execute until it hit an instruction I hadn't catered-for yet, and then break with a short disassembly of the area so I could see which instruction needed something written for it. This was a really good feedback/incentive mechanism, because the more instructions I got working, the further through the ROM the emulator ran before stopping again.&lt;br /&gt;&lt;br /&gt;This worked wonderfully, right up until the ROM went into an infinite loop:&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:courier new;"&gt;&lt;pre&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-size:100%;"&gt;$FDEB  20 C3 E5  JSR $E5C3    ; VIC register init. subroutine&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;$FDEE  4C EB FD  JMP $FDEB    ; Jump back to previous JSR&lt;/span&gt;&lt;/pre&gt;&lt;div  style="text-align: justify;font-family:georgia;"&gt;&lt;span style="font-family:georgia;"&gt;The call to the routine at $E5C3 happens at the end of the bit of code that checks to see how much memory the VIC-20 has installed, and whether it's all working properly. At the point of this infinite loop, the subroutine is just a short sequence to initialise some registers on the VIC chip (the Video Interface Chip for which the VIC-20 is named) and then it returns, after which that JMP put us right back at the subroutine call again. 
This was obviously Not Good, and indicated that I either had a dodgy ROM image, or a bug in my decode logic that was giving the JMP the wrong address - after all, the real VIC-20 doesn't go into this infinite loop when starting, otherwise it'd be a fairly disappointing experience for anyone trying to use it!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;The first thing I did was fire up VICE (a very good emulator of a variety of Commodore machines) in VIC-20 mode and have a PEEK at the ROM locations my code was getting stuck at. To my surprise, they matched byte-for-byte - so my ROM image was not broken in any way, and the JMP address was being correctly decoded. A real VIC-20 &lt;/span&gt;&lt;span style="font-weight: bold;font-family:georgia;" &gt;-does- &lt;/span&gt;&lt;span style="font-family:georgia;"&gt;have this infinite loop in its ROM. So how the hell does it get out of it and finish booting up?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;I wondered if some hardware event took place that would reset the Program Counter to something else - maybe the ROM authors deliberately wanted the boot process to pause here until some other part of the computer was properly awake and fired an interrupt. But at this point in the ROM (which I'd been tracing very carefully as I debugged my emulator) I knew we hadn't enabled interrupts yet, and no timers had been set running either. Maybe the NMI (Non-Maskable Interrupt) was being fired by something - that's a hardware interrupt that overrides whatever else the CPU is doing, but which my emulator hasn't got implemented yet...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Well, I pondered and wondered for a bit, and re-read a lot of documentation about the VIC-20 boot process (including an annotated ROM disassembly) and didn't find anything helpful. 
So I posted the question to Denial and YakYak in the hope that someone on one of those forums would have a deeper understanding of what was going on. And 'Mike' on Denial did indeed have a cunning insight into the problem:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Mike: "&lt;/span&gt;&lt;span class="postbody"  style="font-family:georgia;"&gt;The VIC ends up in this endless loop if it detects that RAM 'ends' before $2000. Normally this would indicate a defect in the built-in RAM. &lt;/span&gt;&lt;span class="postbody"&gt;&lt;span style="font-family:georgia;"&gt;But in your case, there must be a bug in your RAM emulation."&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;What? A bug? In my memory emulation code?? Impossible!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Well, I went back and checked, of course. There's not actually that much code in the class that handles emulated memory, and it was all working fine. So I dug deeper into the ROM code that does the memory check, and confirmed that the branch into the endless loop is triggered if a certain zero-page location ($C2) contains a value of less than #$20 by the time the test ends, which indicates that somewhere below $2000 the memory failed a test and the counter wasn't incremented. So I re-ran the emulator with memory diagnostics enabled, so I could see which memory locations were being accessed and what their contents were, and sure enough we were falling into the infinite loop because $C2 contained #$1F at the end of the test.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Now we're &lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;" &gt;emulating &lt;/span&gt;&lt;span style="font-family:georgia;"&gt;memory here. There's no real RAM that could fail a test - it's just a C# array of objects, representing the memory location and its type. 
The type is there so I can tell whether the memory location has been defined as RAM, ROM or Memory-Mapped I/O, or hasn't been defined as anything because there isn't any actual memory present there. In each case, as accesses to the location occur, the type is checked to make sure the access is permitted - so we allow reads and writes to RAM and I/O, reads but NOT writes to ROM, and for Undefined a read which always returns zero (and no writes, of course).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Scanning back through the emulator output, I could see that the memory test ROM code was doing a series of non-destructive writes and reads through the RAM to make sure it was working, but right at the end of the test, as it was checking the screen memory space, the reads were returning zero instead of the test value, and so the test failed and $C2 didn't get incremented. Why would the memory emulation work for all other areas of RAM that were being tested, but not that last little section? Just six bytes were failing. WTF?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Well, after a minute or two, the light dawned. I knew, from experience, that the VIC-20 screen RAM is 506 bytes long, from $1E00 to $1FF9. 
Whatever documentation you care to look at confirms this, and my memory emulation was precisely assigning that space correctly - as we can see on the sixth allocation row from the diagnostics:&lt;/span&gt;&lt;br /&gt;&lt;pre  style="font-family:courier new;"&gt;&lt;span style="font-size:85%;"&gt;; Memory Block Allocation&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 0000 - 00FF   RAM     (00256 / 0x0100 bytes)  BASIC Working Storage&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 0100 - 01FF   RAM     (00256 / 0x0100 bytes)  6502 Processor Stack&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 0200 - 03FF   RAM     (00512 / 0x0200 bytes)  BASIC Working Storage&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 0400 - 0FFF   ---     (03072 / 0x0C00 bytes)  -- Undefined --&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 1000 - 1DFF   RAM     (03584 / 0x0E00 bytes)  User Memory&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 1E00 - 1FF9   RAM     (00506 / 0x01FA bytes)  Screen Memory&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 1FFA - 7FFF   ---     (24582 / 0x6006 bytes)  -- Undefined --&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 8000 - 8FFF   ROM     (04096 / 0x1000 bytes)  Character Set Generator&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 9000 - 900F   I/O     (00016 / 0x0010 bytes)  6561 VIC Registers&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 9010 - 
910F   ---     (00256 / 0x0100 bytes)  -- Undefined --&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 9110 - 912F   I/O     (00032 / 0x0020 bytes)  6522 VIA Registers&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 9130 - 95FF   ---     (01232 / 0x04D0 bytes)  -- Undefined --&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 9600 - 97FF   RAM     (00512 / 0x0200 bytes)  Colour Memory&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; 9800 - BFFF   ---     (10240 / 0x2800 bytes)  -- Undefined --&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; C000 - DFFF   ROM     (08192 / 0x2000 bytes)  BASIC Interpreter&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;; E000 - FFFF   ROM     (08192 / 0x2000 bytes)  KERNAL Operating System&lt;/span&gt;&lt;/pre&gt;&lt;span style="font-family:georgia;"&gt;No problem there. But wait - that means there are a further six Undefined bytes between $1FFA and $2000. Which will always return zero when they're read...&lt;br /&gt;&lt;br /&gt;Yep - the ROM memory test was failing because of those six 'lost' bytes between the end of screen RAM and the expected end of memory. In the real VIC-20, those bytes are never referred to - they exist as RAM, but have essentially slipped down the back of the sofa because the screen doesn't use them. However, the ROM memory test knows they're there, and tests them, and thus got very upset when my precision-engineered memory emulation reported them as Undefined. 
A two-second change to the initialiser, and the screen now gets 512 bytes of RAM to take its endpoint to $1FFF, which means $C2 gets incremented as expected, and the ROM doesn't then branch to an infinite loop.&lt;br /&gt;&lt;br /&gt;Phew!&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-1621133459670228042?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/1621133459670228042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=1621133459670228042' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1621133459670228042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1621133459670228042'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/six-of-one-half-dozen-of-other.html' title='Six of One, Half-a-Dozen of the Other'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-6400335144631569418</id><published>2008-10-06T23:07:00.002+01:00</published><updated>2008-10-07T13:57:59.589+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='Memory'/><category 
scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><title type='text'>Remember Me</title><content type='html'>&lt;div style="text-align: justify;"&gt;The thing about emulating a CPU is, it needs something to work with - after all, what is a CPU but a fast little box for reading data and then spitting it out somewhere having changed it in some way? Every byte of data passing through a CPU is either a nugget of raw data, coming in, an instruction to do something to it, or a nugget of processed data on its way out. Disregarding those instructions that merely change the state of the CPU in some way, of course - though even they still have to come from &lt;span style="font-style: italic;"&gt;somewhere&lt;/span&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;So to get the emulation to do anything, we have to connect it to a structure of some sort, containing some instructions and some data to work with. That means, in the case of a 6502, a memory object to represent the 64K that it can talk to - the simplest approach (in a high level language like C#) being a 65536-element byte array, although I also want some extra functionality that will let me provide a small amount of memory protection, and also make it easier to see what memory is allocated. To do this, I decided to make each element in the array a structure that combines both the memory 'cell' itself, plus a couple of metadata items that tell me something about what that cell is doing - so, for example, if the CPU attempts to execute an STA to a memory cell that's marked as ROM, we can just ignore it. 
65000-odd of those strung together, and there's our memory space - though I've since switched from Byte storage units to Integers.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;With a fairly simple array of my MemoryCell structure, plus a few routines to manage the initialisation of bits of it as either RAM or ROM (where either of these can be loaded from binary files) or as Undefined - essentially areas of 'dead' or unmapped memory, plus a special definition of Mapped I/O (of which more later), we have a mechanism to emulate any configuration of a 64K memory space. Initially we'll be setting it all up to look like an unexpanded (5K) VIC-20, but there's nothing at all to stop us changing that to reflect the map of any other 6502-based machine, anywhere, ever. Want to embed our CPU emulation in a C64? Or an Atari VCS? Apple II? BBC Micro? Atari 800? KIM-1? PET? You get the idea. Obviously the specific hardware needs emulating in each case as well, but the memory model will fit them all.&lt;br /&gt;&lt;br /&gt;Note though, that this rather neat flexibility can also be a right pain if you make even the tiniest mistake when configuring it. 
I'll recount that little distraction next time...&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-6400335144631569418?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/6400335144631569418/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=6400335144631569418' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/6400335144631569418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/6400335144631569418'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/10/remember-me.html' title='Remember Me'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-4639143986663106626</id><published>2008-09-11T10:00:00.001+01:00</published><updated>2008-10-07T13:56:51.698+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Registers'/><category scheme='http://www.blogger.com/atom/ns#' term='ML'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><title type='text'>And So It Begins</title><content type='html'>&lt;div style="text-align: justify;"&gt;Where do you start when 
writing an emulator for a CPU? I guess the obvious place is 'The School of Hard Knocks', because in order to emulate the thing, you've got to have used it in a real practical way for long enough that you know it pretty much inside-out, including all its foibles and quirks. I qualify for that in spades with the 6502, as it's a chip I've been intimately familiar with since about 1981, and I've also read a whole lot of stuff about the inner workings of the silicon as well.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;But on a practical level, it makes sense to start with the elementary stuff, and it seemed to me that the registers of the chip would make a good place to begin. There are five 8-bit registers (the Accumulator, X-Register, Y-Register, Status Register and Stack Pointer) and one 16-bit register (the Program Counter - really just two 8-bit registers jammed together). The Accumulator (.A) is a general-purpose register that  also sees a lot of action as a component in numerical operations; .X and .Y are also general-purpose, but have special roles to play in a number of the addressing modes the chip supports for indexing; .SR is a special register that holds the PSW (Program Status Word, sometimes also known as the PSR or Program Status Register) where various processor-state bits live; and .SP is another special register that points to the top of the Processor Stack. 
The Program Counter (.PC) holds the 16-bit address of the next instruction byte that the CPU will fetch - and because it is made up of two 8-bit units (the PC Lo-byte and PC Hi-byte), the 6502 can address 256*256 bytes of memory, or 65536 bytes, or 64K.&lt;br /&gt;&lt;br /&gt;So we need a few classes to represent these objects: a base class of 8-bit registers for .A, .X, .Y and .SP; a derived class from that with additional functionality to handle Processor State bits individually for .SR; and another base class incorporating two 8-bit register objects as a 16-bit pair for .PC. Now, my original implementation of these objects exposed their contents as Byte values, because after all a byte is 8 bits, and thus neatly replicates the hardware. For .PC, I used a Ushort type, which is a 16-bit entity and again handily represents the actual hardware. But as the emulator progressed to the point where it was actually starting to process instructions, I found I was utilising quite a lot of explicit casts (and a fair number of implicit casts too) to go between Int and Byte or Ushort (an explicit cast is where the programmer needs to help the compiler to accurately convert one type to another) and this slows things down. More on this later, but for now I'll just say that after a half-day spent testing various performance scenarios, I've decided to make all the registers Int and do the 'capping' manually (i.e. instead of letting the Byte and Ushort types automatically handle rollover from FF to 00 - or FFFF to 0000 - I'm doing it in code).&lt;br /&gt;&lt;br /&gt;We join the project at a critical juncture - the basic CPU framework is complete, and undergoing a little rework to convert those registers to Int. 
I have a memory emulation framework in place and working as well, and this lets me hook the CPU to a facsimile of a real machine in which the processor would really exist - I've decided to use the Commodore VIC-20 as the testbed, so the memory configuration is mapped the way it was in that microcomputer, and the appropriate ROMs are loaded into the right regions. More on this later - but don't let the choice of computer worry you as I'm only using the VIC-20 because of my familiarity with it, so I can see and prove that the 6502 emulation is working correctly - when we're done, that CPU code will be just as much at home in any other simulation.&lt;br /&gt;&lt;br /&gt;Here are a couple of screenshots from the console output of the emulator as it stands today - click them to see larger versions. In the first, we're seeing the initialisation stage of the test harness and the CPU itself - the memory emulation has been populated to simulate an unexpanded VIC-20 (shown in the Memory  Block Allocation section) and we have a little 'dump' of raw memory from the start of the Kernal followed by a simple disassembly of the first few commands of the operating system ROM. I use the term 'Simple Disassembly' on purpose, because this is really just a decoded memory dump - there's no labelling, no automatic matching to memory-map descriptor tables, no branch address calculation... all that will come later.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://picasaweb.google.com/lh/photo/jEOd4vnGEvOlU64knEavJg?authkey=VU60xwYAJNM"&gt;&lt;img src="http://lh6.ggpht.com/jonners9/SOo1DE5jxWI/AAAAAAAAAP8/QcUtJYwqP8s/s800/Startup%2006102008.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In the second, we see a relatively uninteresting stream of executed instructions. The CPU is reading and decoding the Kernal, and then executing the instruction just as the real hardware chip would do. 
This is for debugging purposes - you wouldn't normally see this; it shows me the current value of .PC and the instruction (and addressing mode) being executed, alongside which we see the rest of the register values both before and after the instruction was performed.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://picasaweb.google.com/lh/photo/8mKKGuY2Ir2ZnPxSpJLKDQ?authkey=VU60xwYAJNM"&gt;&lt;img src="http://lh3.ggpht.com/jonners9/SOo1DYirHhI/AAAAAAAAAQE/p6_7gODwmJE/s800/Executing%2006102008.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As can be readily observed, I'm in the middle of working on the ADC instruction, as the emulator is telling me it has been asked to execute an ADC but there's no implementation for it yet! But actually, I've been distracted by a knotty little problem to do with my test harness - ironically, the CPU emulation is working fine so far (even though incomplete) but I appear to have run into a peculiarity of the VIC-20 ROM initialisation code. More to follow.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-4639143986663106626?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/4639143986663106626/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=4639143986663106626' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/4639143986663106626'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/4639143986663106626'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/09/and-so-it-begins.html' title='And So It 
Begins'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/jonners9/SOo1DE5jxWI/AAAAAAAAAP8/QcUtJYwqP8s/s72-c/Startup%2006102008.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5115611155063132884.post-1171727242118620359</id><published>2008-09-08T21:35:00.000+01:00</published><updated>2008-10-06T16:54:54.818+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Emulator'/><category scheme='http://www.blogger.com/atom/ns#' term='Stella'/><category scheme='http://www.blogger.com/atom/ns#' term='ML'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='Assembler'/><category scheme='http://www.blogger.com/atom/ns#' term='6502'/><category scheme='http://www.blogger.com/atom/ns#' term='VIC-20'/><category scheme='http://www.blogger.com/atom/ns#' term='DASM'/><title type='text'>Wanna Take A Ride?</title><content type='html'>&lt;div style="text-align: justify;"&gt;Back in the early '80s I was a VIC-20 owner, and got pretty comfortable writing 6502 machine code (also known as Machine Language, or ML). The tools of the day were very primitive by today's standards, and I ended up writing my own assembler/disassembler tools to make my life easier. Those tools were better-suited to my requirements than anything else I could find for the VIC at the time (he said modestly) but even so they fell far short of the features offered by similar products available today, such as DASM. 
Eventually I moved on to other, more powerful machines, and left my 6502 days behind.&lt;br /&gt;&lt;br /&gt;Fast-forward to 2001, and I started to re-kindle my lapsed interest in the 6502 when I downloaded and played with Stella, the Atari VCS emulator. After a few months, I realised that there was a thriving community of coders out there actually writing VCS games and other stuff, using DASM or whatever their tool of choice was, and running their code in Stella. That was too tempting to pass up, and in fairly short order I was immersed in the delights of synchronised time-critical video generator code, and having things bouncing around in an emulated VCS on my PC.&lt;br /&gt;&lt;br /&gt;Now this particular exercise made something quite apparent - although DASM (which I quickly became a fan of) was massively powerful and extremely pleasant to assemble 6502 code with, it was and is still left to the coder to count instruction cycles when writing time-critical code, and that is error-prone. Just a single clock-cycle out, and your entire VCS game screen turns to rolling garbage because the timing of the instructions doesn't quite fit a scan-line exactly, which means you lose vertical synchronisation, and all you see is junk.&lt;br /&gt;&lt;br /&gt;I realised that what I needed was an 'assembling editor' that would automatically tally instruction cycles for me as I wrote the code, and thus make subtle errors in timing quite apparent. Since the 6502 is very well documented, all the instruction timings are readily available, as well as the variations induced by circumstantial conditions - like the fact that a branch takes longer to execute when taken than when not, or that there's sometimes a timing cost associated with crossing a memory page boundary. 
So having a routine do all the counting, and also accommodating those variations whilst doing it, seemed pretty simple to my mind.&lt;br /&gt;&lt;br /&gt;I made a couple of prototypes, but was never quite satisfied - it always seemed to me that there was something vaguely inelegant about writing a basic lookup mechanism and then trying to find rules that would reliably incorporate the myriad variations in timing that can occur as the 6502 executes the instructions. Eventually it dawned on me that the best way to make all this happen would be if the 6502 itself was executing the instructions as you typed them, and reporting back on the actual time it took - naturally including the variations because it would 'really' be running the code in-situ. And that meant that before I could get the editor working, I needed a good 6502 emulation.&lt;br /&gt;&lt;br /&gt;And &lt;span style="font-style: italic;"&gt;that&lt;/span&gt; meant that I needed to write an emulator that would do all the things that any other 6502 emulator does (i.e. execute instructions and reflect register states, etc.) but also include things like an open port into which I could feed a single instruction and an environment state and get back a timing value. Sounds simple if you say it fast. ;)&lt;br /&gt;&lt;br /&gt;Anyway, this series of blog posts will document from the very beginning my creation of such a 6502 emulator. I'll be hosting the C# source code in Google Code once it's reasonably stable (my code, not Google Code ;)) so if you want, you can tag along and build the thing yourself at the end - and naturally if you see a way to do something better, I hope you'll use the comments facility here to tell me and everyone else about it. 
No doubt it'll be an entertaining ride as I make progress, break something, reverse, and move forward again; but hopefully I'll end up with something that works, and that I can use in the way I want.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5115611155063132884-1171727242118620359?l=legacysystem.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://legacysystem.blogspot.com/feeds/1171727242118620359/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5115611155063132884&amp;postID=1171727242118620359' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1171727242118620359'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5115611155063132884/posts/default/1171727242118620359'/><link rel='alternate' type='text/html' href='http://legacysystem.blogspot.com/2008/09/wanna-take-ride.html' title='Wanna Take A Ride?'/><author><name>Jonners</name><uri>http://www.blogger.com/profile/10783097207231622242</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='29' src='http://3.bp.blogspot.com/_k7S257jPIsM/S4BCVi6S3sI/AAAAAAAAAqc/4_pLkqr1Wxc/s1600-R/CommodorePET4032.jpg'/></author><thr:total>0</thr:total></entry></feed>
