Emulating the Sega Genesis - Part III
Also available on dev.to
Written December 2021/January 2022 by transistor_fet
A few months ago, I wrote a 68000 emulator in Rust named Moa. My original goal was to emulate a simple computer I had previously built. After only a few weeks, I had that software up and running in the emulator, and my attention turned to what other platforms with 68000s I could try emulating. My thoughts quickly turned to the Sega Genesis and without thinking about it too much, I dove right in. What started as an unserious half-thought of "wouldn't that be cool" turned into a few months of fighting documentation, game programming hacks, and my sanity with some side quests along the way, all in the name of finding and squashing bugs in the 68k emulator I had already written.
This is Part III in the series. If you haven't already read Part I and Part II, you might want to do so. Part I covers setting up the emulator, getting some game ROMs to run, and implementing the DMA and memory features of the VDP. Part II covers adding a graphical frontend to Moa, and then implementing a first attempt at generating video output. Part III will be about debugging the various problems in the VDP and CPU implementations to get a working emulator capable of playing games. For more details on the 68000 and the basic design of Moa, check out Making a 68000 Emulator in Rust.
- Previously
- Fixing The Colours
- Drawing A Blank
- What About Those Interrupts
- And Now For Something (A Little) Different
- Back to the Genesis
- VRAM Discrepancies
- You Can't Write There, Sir
- Fixing Sprites
- Not All The Data
- Scrolling The Scrolls
- Fixing Line Scrolling
- Rewriting
- Conclusion
Previously
After about two weeks of work on adding Sega Genesis support to my emulator, I had implemented memory operations for the video display processor (VDP), and written a draw loop to generate the video frames according to the SEGA documentation. The result of all that work was this:
This is Sonic 1 attempting to show the SEGA logo at startup. It's better than Sonic 2 which was just a black screen, a few log messages, and then... nothing... The few other games I tried were no better.
When I had started this project, I thought it probably wouldn't be too hard to get something as simple as the SEGA logo working, but I was wrong. After spending a day or two fiddling with quick fixes that didn't fix much of anything, I committed my work in progress to git, so that I could track and undo any changes I made, and started in to some serious debugging. The following is my journey of debugging, on and off over the next six weeks, until I managed to get Sonic 2 running well enough to play.
Fixing The Colours
The most obvious thing that was wrong was the colours, so I looked into this first. Since I couldn't be sure that all the data was getting into the VDP correctly, I needed to simplify the output a bit, so I wrote an alternate draw_frame() function to display just the patterns instead of the scroll tables. It would draw each pattern in memory across the screen from left to right, top to bottom so that I could inspect them better. They might not look like a coherent picture, being only 8x8 pixels each and arranged in an unintended order, but it should at least show something.
The result was this:
There is definitely some kind of pattern data being displayed because the patterns are not a solid colour, but the colours are clearly wrong. I'm expecting some blue colours since it should be printing the SEGA logo.
For about a day I was doubting and testing the transfer of data into CRAM, since I had found a minor bug earlier in that code. After staring at the values in CRAM for a while, I noticed that the colour values were actually correct. There were values of 0xEEE and 0xE00 and a few others, so it had to be a problem with reading the CRAM to get the u32 colour value. The code to convert CRAM values into colours was:
let rgb = read_beu16(&self.cram[((palette * 16) + colour) as usize..]);
(((rgb & 0xF00) as u32) >> 4) | (((rgb & 0x0F0) as u32) << 8) | (((rgb & 0x00F) as u32) << 20)
There had definitely been some problems with those complex shift operations, but the trickier problem turned out to be that the index into the CRAM was wrong. Since the CRAM is an array of u8 (chosen in order to reuse the same transfer and DMA code with VRAM), each colour entry takes up two bytes, so I needed to multiply the index by 2 before reading the word at that location. Now the colours actually make sense:
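To make the indexing fix concrete, here is a minimal sketch of a corrected lookup. The function name get_palette_colour, the local read_beu16 helper, and the 0x00RRGGBB output layout are assumptions made for illustration and are not taken from Moa; the shifts are reused unchanged from the snippet above, and the sketch assumes the usual Genesis colour word layout of 0000 BBB0 GGG0 RRR0.

```rust
/// Read a big-endian 16-bit word from the start of a byte slice.
/// (Hypothetical stand-in for the helper used in the snippet above.)
fn read_beu16(data: &[u8]) -> u16 {
    u16::from_be_bytes([data[0], data[1]])
}

/// Look up a colour in CRAM and expand it to an assumed 0x00RRGGBB value.
/// CRAM is modelled as a byte array: 4 palettes x 16 colours x 2 bytes each.
fn get_palette_colour(cram: &[u8], palette: u8, colour: u8) -> u32 {
    // Each entry is one 16-bit word, so the byte offset is the entry index
    // multiplied by 2. Leaving out the `* 2` reads the wrong word.
    let offset = (palette as usize * 16 + colour as usize) * 2;
    let rgb = read_beu16(&cram[offset..]);

    // Assumed word layout: 0000 BBB0 GGG0 RRR0.
    (((rgb & 0xF00) as u32) >> 4)          // blue  -> bits 4..7
        | (((rgb & 0x0F0) as u32) << 8)    // green -> bits 12..15
        | (((rgb & 0x00F) as u32) << 20)   // red   -> bits 20..23
}
```

With the value 0xEEE stored in palette 1, colour 2 (byte offset 36), this returns 0xE0E0E0, which is the near-white expected for the SEGA logo background.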
Switching back to displaying the scrolls I'm now getting a white screen, but not much else. sigh
In Sonic 1, parts of the SEGA logo were displayed if I only drew Scroll A or Scroll B, but displaying both together didn't work. I needed to add the mask colour, which is always colour 0 in each palette. I modified the .blit() method to not draw anything if the colour 0 is used (later changed to 0xFFFFFFFF to avoid a conflict with the colour black, represented by 0), and now I was getting something.
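The masking idea can be sketched as follows. This is only an illustration under assumptions: the Frame type, its fields, and the MASK_COLOUR constant are hypothetical names, and Moa's real .blit() takes different arguments.

```rust
/// Sentinel meaning "transparent". Using 0xFFFFFFFF rather than 0 means a
/// genuinely black pixel (0) can still be drawn.
const MASK_COLOUR: u32 = 0xFFFF_FFFF;

/// Hypothetical frame buffer of packed u32 pixels.
struct Frame {
    width: usize,
    height: usize,
    pixels: Vec<u32>,
}

impl Frame {
    fn new(width: usize, height: usize) -> Self {
        Frame { width, height, pixels: vec![0; width * height] }
    }

    /// Copy an 8x8 cell into the frame at (x, y), skipping masked pixels so
    /// that whatever was drawn earlier (e.g. Scroll B) shows through.
    fn blit(&mut self, x: usize, y: usize, cell: &[u32; 64]) {
        for row in 0..8 {
            for col in 0..8 {
                let (px, py) = (x + col, y + row);
                let value = cell[row * 8 + col];
                if value != MASK_COLOUR && px < self.width && py < self.height {
                    self.pixels[py * self.width + px] = value;
                }
            }
        }
    }
}
```

Drawing Scroll B first and Scroll A second with this kind of blit lets the lower layer remain visible wherever the upper layer uses the mask colour.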
Now it's actually showing the SEGA logo! The scrolls are finally working, even if they still don't look right and the animation is painfully slow.
Drawing A Blank
While Sonic 1 seemed to at least try to display something, Sonic 2 and a few other games wouldn't display anything at all. With the various debug messages turned on, the logs showed it was initializing various devices and then would get caught in a loop where it would read the status word of the VDP over and over again. Clearly it was looking for a specific bit value in the status word before it would move on, but I didn't know which one.
The status word is returned when reading (instead of writing) from the VDP's control port. It contains a number of status flag bits and is one of the few ways the CPU can get feedback from the VDP, with interrupts being the other. In my existing code, the FIFO and NTSC bits were set statically, and the DMA bit was being set and reset during DMA operations, so it probably wasn't related to those. Given that this problem happens right away, it's probably not looking at the sprite flags either. I reckon it's something to do with the HBLANK/VBLANK bits, or possibly the V Interrupt Happened bit.
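For reference, the flags mentioned above can be written down as bit masks. The bit positions below follow the commonly circulated VDP documentation as I understand it, so treat them as an assumption to check against your own reference; apart from STATUS_IN_VBLANK, which appears later in this article, the constant and function names are hypothetical.

```rust
// Assumed bit positions within the 16-bit VDP status word.
const STATUS_PAL_MODE: u16 = 0x0001; // clear on NTSC consoles
const STATUS_DMA_BUSY: u16 = 0x0002;
const STATUS_IN_HBLANK: u16 = 0x0004;
const STATUS_IN_VBLANK: u16 = 0x0008;
const STATUS_V_INTERRUPT: u16 = 0x0080; // "V Interrupt Happened"
const STATUS_FIFO_EMPTY: u16 = 0x0200;

/// List the names of the flags set in a status word, which is handy in logs.
fn describe_status(status: u16) -> Vec<&'static str> {
    let flags = [
        (STATUS_FIFO_EMPTY, "fifo-empty"),
        (STATUS_V_INTERRUPT, "v-interrupt"),
        (STATUS_IN_VBLANK, "vblank"),
        (STATUS_IN_HBLANK, "hblank"),
        (STATUS_DMA_BUSY, "dma-busy"),
        (STATUS_PAL_MODE, "pal"),
    ];
    flags
        .iter()
        .filter(|(bit, _)| status & *bit != 0)
        .map(|(_, name)| *name)
        .collect()
}

/// Model a game's busy-wait: given the sequence of status words it would
/// read, return how many reads happen before `flag` is first seen set.
fn polls_until_set(reads: &[u16], flag: u16) -> Option<usize> {
    reads.iter().position(|status| status & flag != 0)
}
```

If the emulator never sets the flag a game is polling for, polls_until_set returns None, which is the "stuck reading the status word forever" situation described above.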
The HBLANK and VBLANK bits are set when the video output signal is in its blanking phases. On a CRT, it takes time after a line has been drawn for the electron beam to move back to the start of the next line, and be ready to output the next line of data. It also takes time (a lot more time) after the entire screen has been drawn for the beam to move back to the top of the screen again to start the next refresh. Since the video signal's data is directly output to the CRT as soon as it's received (the joys of analogue signals), the video signal itself needs to incorporate these blanking delays where no data is sent. These blanking periods just so happen to be convenient times for the CPU to update or change data in the VDP, when those changes won't affect the output. This is especially important during the vertical blanking period, when the positions of everything on the screen can be updated at once before the next frame is drawn to prevent artifacts and glitches in the image.
I was moving fast to get something working, so I quickly implemented the vertical blanking bit by setting it just before getting to the end of the frame, at 14_218_000 ns, and then resetting the bit at 16_630_000 ns when the frame is drawn and the vertical interrupt is triggered.
This worked for the time being, but it turned out to cause another error that slowed the animation down by half, which I didn't notice until after I had the scrolling working. It wasn't until I could actually play the games that I noticed the problem, and by that point I had forgotten about this bit. It took me a day or two of debugging before I finally tracked down the problem to the VBLANK bit.
After the vertical interrupt occurs, some games would busy wait until the vertical blanking bit was set before actually running the game loop. Sonic 2 is one such game, but Sonic 1 doesn't do this check. Since the bit is only set about 2ms before the next vertical interrupt, the game's frame updater would only start 2ms before the next interrupt, and would still be updating the frame at that point, so it would ignore the second vertical interrupt. As a result, it would take two frames of time (2 vertical interrupts) before one frame of the game would be drawn, and only one cycle of the game loop would execute. Sonic was moving at exactly half speed. Doubling the amount of simulated time fixed the issue (which didn't make any sense at first). I even went to the trouble of implementing more accurate instruction timing in the 68000 in order to see if it was caused by the fact that all the instructions had previously been running in 4 clock cycles. Shown below is the more recent code with the fixed blanking behaviour.
The following code is in the updated VDP's .step() function, including the VBLANK bit handling. The HBLANK code looks similar but with different timing values.
self.state.v_clock += diff;
if (self.state.status & STATUS_IN_VBLANK) != 0 && self.state.v_clock >= 1_205_992 && self.state.v_clock <= 15_424_008 {
    self.state.status &= !STATUS_IN_VBLANK;
}
if (self.state.status & STATUS_IN_VBLANK) == 0 && self.state.v_clock >= 15_424_008 {
    self.state.status |= STATUS_IN_VBLANK;
    ... // Vertical Interrupt and Frame Update Code
}
if self.state.v_clock > 16_630_000 {
    self.state.v_clock -= 16_630_000;
}
Finally! The SEGA logo in Sonic 2 is (almost) displaying correctly. There are a few glitches in the logo but that's because I hadn't implemented the reverse patterns yet. Adding support for that fixed the logo right up.
What About Those Interrupts
While Sonic 2 was now advancing enough to show the scrolls, it was very slow, the same as Sonic 1, from the start of the program through displaying the logo and then finally getting to the title screen. It was taking half a minute or more.
My first suspicion was to check the interrupts, since it's usually the vertical interrupt that triggers the progression of time in these games. It's a reliable signal to use for knowing how long to show the logo screen for, or when to read the controller input, calculate movement, and then update the screen. Turning on the debugging output for the interrupts showed that they weren't occurring anywhere near as fast as they should be. It would take seconds before an interrupt occurred, and they would occur randomly rather than at a regular pace.
I wasn't all that surprised, given that I knew there were issues with the implementation. I had run into problems with interrupts when working on Computie support, but I hadn't been sure how I wanted to fix them. Now I needed to fix them.
In the original implementation, there was a trait for Interruptable devices with a function that would be called by the interrupt controller when an interrupt occurred, which would trigger the interrupt handler. That works in theory, but an interrupt might not be handled right away if interrupts are disabled, and the callback might not be re-called when interrupts were re-enabled. There was also no mechanism for acknowledging an interrupt, and the 68k implementation's handling of the interrupt priority mask was buggy. The result was that interrupts would only occur when everything happened to line up, which wasn't very often.
For the 68000, an interrupt can occur with a priority between 1 and 7. A higher number is a higher priority, and interrupts below a certain priority number can be disabled using a priority mask value stored in the %sr register. When an interrupt occurs, the CPU will check that priority number against the priority mask. If the requested interrupt number is strictly higher than the mask, then the %sr and %pc registers will be pushed onto the stack, the priority mask will be changed to the current number (to prevent a duplicate handling of the same interrupt), and the handler will be run. If the interrupt priority equals or is lower than the mask, the CPU will keep running whatever it had been running before, at least until the priority mask changes, or a higher priority unmasked interrupt occurs.
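The mask comparison described above can be sketched in a few lines. The function names are hypothetical and this is not Moa's code. One detail goes beyond the text: on the 68000, level 7 is non-maskable, so the sketch lets it through even when the mask is 7.

```rust
/// Extract the 3-bit interrupt priority mask from bits 8-10 of %sr.
fn priority_mask(sr: u16) -> u8 {
    ((sr >> 8) & 0x07) as u8
}

/// Decide whether a pending interrupt at `level` (1..=7) should be serviced.
fn should_service(sr: u16, level: u8) -> bool {
    // The level must be strictly higher than the mask. Level 7 is the
    // exception: it is non-maskable on the 68000 (detail added here).
    level == 7 || level > priority_mask(sr)
}

/// Compute the %sr value used while the handler runs. The caller is expected
/// to have pushed the old %sr and %pc onto the stack first.
fn enter_interrupt(sr: u16, level: u8) -> u16 {
    // Clear the trace bit (15) and the old mask (bits 8-10), set the
    // supervisor bit (13), and raise the mask to the level being serviced.
    (sr & !0x8700) | 0x2000 | ((level as u16) << 8)
}
```

For example, with %sr at 0x2300 (mask 3) a level 4 request is serviced while a level 3 request is not. Once the level 6 vertical interrupt is taken, the mask becomes 6, so a second level 6 request waits until the handler restores %sr.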
For devices like the serial controller in Computie, the interrupt signal will be asserted and stay asserted until the cause of the interrupt is manually acknowledged by writing a certain value to the serial controller. For the Genesis, on the other hand, the interrupts behave more like one-shots where there is no manual acknowledgement, and the signal should be de-asserted as soon as it's acknowledged, essentially.
As for the CPU, if an interrupt is masked when the signal is asserted, and then unmasked while the signal is still asserted, it will run the handler (i.e. the interrupt signals are level triggered, not edge triggered). If the signal goes away before the interrupt is unmasked, the handler will never be run.
In hardware, interrupts will only be checked at a certain point in the CPU's cycle, usually between the execution of instructions, so it's actually pretty reasonable for the emulated CPU to manually check for interrupts at the end of an instruction cycle. All it has to do is check the interrupt controller object in System. The Interruptable trait wasn't needed anymore. Devices call the interrupt controller to set an interrupt, and the CPU calls the interrupt controller to check if any are active. It's not a terribly complicated problem, but it's easy to get wrong in subtle ways, such that it might work for some devices but not others.
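A polling design along these lines might look like the sketch below. All names here (InterruptController, set, highest_pending, acknowledge, check_pending_interrupts) are hypothetical and are not Moa's actual API; the sketch only illustrates the level-triggered, check-between-instructions behaviour described above.

```rust
/// Hypothetical shared controller: devices assert levels, the CPU polls it.
#[derive(Default)]
struct InterruptController {
    /// pending[n] is true while level n is asserted (index 0 is unused).
    pending: [bool; 8],
}

impl InterruptController {
    /// Called by a device to assert or de-assert its interrupt line.
    fn set(&mut self, level: u8, asserted: bool) {
        self.pending[level as usize] = asserted;
    }

    /// The highest level currently asserted, if any.
    fn highest_pending(&self) -> Option<u8> {
        (1..=7u8).rev().find(|&level| self.pending[level as usize])
    }

    /// One-shot acknowledgement, as the Genesis interrupts behave.
    fn acknowledge(&mut self, level: u8) {
        self.pending[level as usize] = false;
    }
}

/// Called by the CPU at the end of each instruction. Returns the level that
/// should be serviced now, or None if nothing is pending or it is masked.
fn check_pending_interrupts(ctrl: &mut InterruptController, mask: u8) -> Option<u8> {
    let level = ctrl.highest_pending()?;
    if level > mask {
        ctrl.acknowledge(level);
        Some(level)
    } else {
        // Still asserted: it will be seen again on a later check, once the
        // mask has been lowered (level triggered, not edge triggered).
        None
    }
}
```

A device like Computie's serial controller would skip the automatic acknowledge and instead call set(level, false) itself when the CPU writes the acknowledging value to it.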
Now it runs at what seems like the right speed! Let's ignore, for a minute, the other glaring issues...
And Now For Something (A Little) Different
At this point, I had the colours and interrupts sorted out, the scrolls were being displayed somewhat correctly, and the sprites were sort of working, but multi-cell sprites were still broken. Everything I had tried to fix the sprites didn't work, and I had no idea if it was because of the VDP implementation or a bug in the CPU. And to make matters worse, Sonic was falling through the floor during the gameplay.
And here I got stuck. I had been doing nothing but debugging for a week at this point, three weeks after starting on the Genesis and about five weeks since I had started the emulator. I had made good progress but this last week was a grind. There were multiple issues, both in the VDP and the CPU. I had already fixed the few things that really stood out, but I was running out of threads to pull on, and getting frustrated. I needed to try something else.
I had not yet proven out the 68000 implementation, so some of the problems I was encountering could be in there and not in the VDP code. There was no easy way to tell where the problem was without tracing a lot of assembly code to figure out what it was supposed to do, looking for a one bit change somewhere in the CPU registers or in memory. I needed a way to test the 68000 better, and why not try implementing another system?
The Macintosh 512k also used the 68000 and it's a fairly simple computer, in terms of I/O. It had a very basic video display circuit made from generic logic that looped through memory addresses and shifted the bits into the video output stream. The display only supported black and white so each pixel was a single bit that was either on or off. The ROMs that were embedded on the motherboard are available at archive.org, so I started making some devices and running the ROMs to see if I could find some bugs in the 68000 emulation alone.
At the same time, I looked into implementing the Z80 that the Genesis would also need. Some games seemed to get stuck waiting for the non-existent Z80 to respond, so I thought I might as well start a Z80 implementation too. It would be something different to work on when I was stuck on everything else. At least then I'd make some progress, which would encourage me to keep going.
In order to develop a Z80 implementation, I needed some Z80 code to run on it, and any I/O devices that the code needed. I could write my own Z80 code of course, but that wouldn't test the implementation well enough, beyond just basic functioning of the instructions. I needed code for an existing platform, with all its expectations of how the real system behaves embedded in its logic, and that meant implementing devices for an existing platform. I looked around for the simplest Z80 platform I could find, which turned out to be the TRS-80. I'm not the biggest fan of the TRS-80, but I did have a "Model I" in my computer collection at one point (that I sadly had to sell), so it wasn't entirely foreign to me. I could get away with just implementing the video display and the keyboard in order to run the BASIC interpreter that comes in its ROM.
Over the next month, I mostly worked on these sub-projects, as well as on another Computie hardware iteration. The TRS-80 implementation came together fairly smoothly apart from a bug in the Z80 implementation's shift operation that took me a day or two of tracing the Level I BASIC ROM's assembly code to fix. (Thanks to George Phillips for the well documented assembly code).
The Macintosh implementation didn't go as smoothly however. I did manage to find and fix a few bugs in the 68000, and I got far enough to display the Dead Mac screen, but I got stuck just before the end of the ROM's initialization where it opens the default device drivers. At some point, it attempts to write to a location in the ROM. In hardware that shouldn't have an effect, except that I have some code in Moa to raise an error when that happens, since it's likely a bug. Ignoring that error didn't make it get any farther. I couldn't for the life of me find out what was wrong, but at one point, using another emulator, I was able to confirm that if the ROMs ran on a system that didn't mirror the RAM and ROM addresses exactly as the hardware does, the ROM won't boot. facepalm Effort went into making sure the Macintosh was not cloned like the IBM PC, so I was fighting against those efforts as well. After a while I decided to give the Genesis a try again.
Back to the Genesis
After getting stuck on the Macintosh implementation, I picked up the Genesis again. I had spent almost an entire month away. In that time, I had worked on another hardware revision for Computie, and written the article "Making a 68000 Emulator In Rust". I had also improved the Moa debugger, implemented the Z80 entirely, filled in a number of missing 68k instructions, and finished implementing all the 68k instruction decoding (although a few instructions are still not implemented because they aren't used by any code I've tried to run). I also fixed some bugs in existing instructions, such as MOVEM which copies data to or from multiple registers at a time. Perhaps some things could be fixed?
On the surface though, the results were the same as last time. The scrolls were mostly working, but the sprites were broken, and Sonic was still falling through the floor to his death. I had added the Z80 coprocessor into the system, now that it was implemented (I might as well), but I had left the Z80 address space as one big 64 KB MemoryBlock. The Z80 alone didn't change anything in Sonic 2, or in Sonic 1, which was still getting stuck at the title screen as it had before.
I needed a way to isolate the drawing of sprites so I could better figure out what was wrong, and it was only at this point it occurred to me to search for demo and test ROMs that might help. That immediately turned up ComradeOj's demos, particularly Tiny Demo, which scrolls some text across the screen, and GenTest v3 which contains a number of screens with different graphics to test possible issues, including a display of some static sprites.
I also came across the BlastEm emulator in C, which has a builtin debugger. I was able to modify and compile a local version which dumps out the contents of VRAM at a specific point in a ROM's execution. With this, I could verify that the data in VRAM in Moa was correct and the DMA and transfer code was in fact working correctly. I ended up not digging into the BlastEm code much beyond this, but the validation it provided was extremely helpful.
VRAM Discrepancies
The above image is the result of running TinyDemo. Clearly the text is all garbled but I haven't a clue what could be causing it. At least it was a very small ROM, with straightforward assembly code.
The first thing I could do was to try to isolate where the problem was. Was it caused by getting data into VRAM, or was it somewhere else? I started by running the demo in BlastEm and dumping the VRAM at the point in the ROM just after the VDP is initialized, at address 0xDE. I went to the same point in Moa and again dumped the contents of VRAM to compare them.
From BlastEm, the start of VRAM where the patterns are stored looks like this:
0000: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0010: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0020: 0x0110 0x0110 0x0110 0x0110 0x0110 0x0110 0x0111 0x1110
0030: 0x0110 0x0110 0x0110 0x0110 0x0110 0x0110 0x0000 0x0000
0040: 0x0011 0x1111 0x0011 0x0000 0x0011 0x0000 0x0011 0x1110
0050: 0x0011 0x0000 0x0011 0x0000 0x0011 0x1111 0x0000 0x0000
0060: 0x0110 0x0110 0x0110 0x0110 0x0110 0x0110 0x0011 0x1100
0070: 0x0001 0x1000 0x0001 0x1000 0x0001 0x1000 0x0000 0x0000
0080: 0x0011 0x1111 0x0011 0x0000 0x0011 0x0000 0x0011 0x1110
0090: 0x0011 0x0000 0x0011 0x0000 0x0011 0x0000 0x0000 0x0000
00a0: 0x0111 0x1110 0x0001 0x1000 0x0001 0x1000 0x0001 0x1000
00b0: 0x0001 0x1000 0x0001 0x1000 0x0001 0x1000 0x0000 0x0000
00c0: 0x0011 0x0000 0x0011 0x0000 0x0100 0x0000 0x0000 0x0000
00d0: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
00e0: 0x0111 0x1110 0x0110 0x0011 0x0110 0x0011 0x0111 0x1110
00f0: 0x0110 0x1000 0x0110 0x0110 0x0110 0x0111 0x0000 0x0000
And from Moa, it looks like this:
0000: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0010: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0020: 0x0110 0x0110 0x0110 0x0110 0x0110 0x0110 0x0111 0x1110
0030: 0x0110 0x0110 0x0110 0x0110 0x0110 0x0110 0x0000 0x0000
0040: 0x0011 0x1111 0x0011 0x1111 0x0011 0x1111 0x0011 0x1111
0050: 0x0000 0x0011 0x0011 0x1111 0x0011 0x1111 0x0011 0x0011
0060: 0x1100 0x0110 0x0110 0x0110 0x0110 0x0110 0x0000 0x0000
0070: 0x1100 0x1100 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0080: 0x0011 0x1111 0x0011 0x1111 0x0011 0x1111 0x0011 0x1111
0090: 0x0000 0x0011 0x0011 0x1111 0x0011 0x1111 0x0000 0x0000
00a0: 0x1111 0x1110 0x0110 0x1100 0x0000 0x0000 0x0000 0x0000
00b0: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
00c0: 0x0011 0x1111 0x0011 0x1111 0x0011 0x0100 0x0000 0x0011
00d0: 0x1111 0x1111 0x1111 0x1111 0x1111 0x1111 0x1111 0x1111
00e0: 0x1111 0x1110 0x0110 0x0110 0x1111 0x0110 0x1111 0x0110
00f0: 0x0110 0x0110 0x1011 0x0000 0x0110 0x0111 0x1110 0x0000
It's almost the same, but if you look closely there are a few discrepancies. Of course I was expecting it to be caused by the transfer code, but I traced the assembly for TinyDemo to see where the data in VRAM was coming from. There's a loop that simply copied data from RAM address 0xFF0000 into VRAM address 0x0000. I dumped the contents of RAM at that location and sure enough, the difference occurred there too, so it was something further up the chain. Finally I was making some progress now that I could narrow down the problems better.
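Spotting the first discrepancy by eye is error-prone, so a small helper can do the comparison. This is a sketch assuming both dumps have already been parsed into 16-bit words; the first_mismatch name and signature are made up for this example.

```rust
/// Compare two dumps of 16-bit words and return the byte offset of the first
/// word that differs, or None if the overlapping parts are identical.
/// `base` is the byte address of the first word in both slices.
fn first_mismatch(base: usize, expected: &[u16], actual: &[u16]) -> Option<usize> {
    expected
        .iter()
        .zip(actual.iter())
        .position(|(a, b)| a != b)
        .map(|index| base + index * 2) // two bytes per word
}
```

Feeding it the 0x0040 rows from the two dumps above reports offset 0x46 as the first difference, where BlastEm has 0x0000 and Moa has 0x1111.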
Tracing back in the disassembled output quickly led to the decompress function, which loads and decompresses the raw binary data in the ROM into an in-memory representation that the VDP can use.
...
1f2: e24d lsrw #1,%d5 ; start of decompress loop
1f4: 40c6 movew %sr,%d6
1f6: 51cc 000c dbf %d4,0x204
1fa: 1f5d 0001 moveb %a5@+,%sp@(1)
1fe: 1e9d moveb %a5@+,%sp@
200: 3a17 movew %sp@,%d5
202: 780f moveq #15,%d4
204: 44c6 movew %d6,%ccr
206: 6404 bccs 0x20c
208: 12dd moveb %a5@+,%a1@+
20a: 60e6 bras 0x1f2 ; jump to start of outer loop
20c: 7600 moveq #0,%d3
20e: e24d lsrw #1,%d5
210: 40c6 movew %sr,%d6
212: 51cc 000c dbf %d4,0x220
216: 1f5d 0001 moveb %a5@+,%sp@(1)
21a: 1e9d moveb %a5@+,%sp@
21c: 3a17 movew %sp@,%d5
21e: 780f moveq #15,%d4
220: 44c6 movew %d6,%ccr
222: 652c bcss 0x250
224: e24d lsrw #1,%d5
226: 51cc 000c dbf %d4,0x234
22a: 1f5d 0001 moveb %a5@+,%sp@(1)
22e: 1e9d moveb %a5@+,%sp@
230: 3a17 movew %sp@,%d5
232: 780f moveq #15,%d4
234: e353 roxlw #1,%d3
236: e24d lsrw #1,%d5
238: 51cc 000c dbf %d4,0x246
23c: 1f5d 0001 moveb %a5@+,%sp@(1)
240: 1e9d moveb %a5@+,%sp@
242: 3a17 movew %sp@,%d5
244: 780f moveq #15,%d4
246: e353 roxlw #1,%d3
248: 5243 addqw #1,%d3
24a: 74ff moveq #-1,%d2
24c: 141d moveb %a5@+,%d2
24e: 6016 bras 0x266
250: 101d moveb %a5@+,%d0
252: 121d moveb %a5@+,%d1
254: 74ff moveq #-1,%d2
256: 1401 moveb %d1,%d2
258: eb4a lslw #5,%d2
25a: 1400 moveb %d0,%d2
25c: 0241 0007 andiw #7,%d1
260: 6710 beqs 0x272
262: 1601 moveb %d1,%d3
264: 5243 addqw #1,%d3
266: 1031 2000 moveb %a1@(0000000000000000,%d2:w),%d0
26a: 12c0 moveb %d0,%a1@+
26c: 51cb fff8 dbf %d3,0x266
270: 6080 bras 0x1f2 ; jump to the start of the outer loop
...
The above snippet only shows the main loop of the decompress function and not the beginning and ending parts of the function. Instructions 0x266 and 0x26a are where a byte of data is written to the location in RAM where the decompressed data goes, and which will then be loaded into VRAM verbatim.
I knew from the above dumps that the first byte that differs occurs at offset 0x46, and dumping the registers shows the address 0xFF0000 in register %a1, which is incremented each time the loop occurs. To get to the point of failure, I just need to set a breakpoint at 0x266 and continue until register %a1 contains 0xFF0046, and then dump all the register values to look for a difference between Moa's register values and BlastEm's.
Aha! The value of %d6 is different. Moa has 0x2710 while BlastEm has 0x2700. Looking at the disassembly, the only use of %d6 is to temporarily hold the contents of the flags register (%ccr, which is the lower byte of the status register %sr). The flag register values are also different! The Extend bit, which is the 5th bit in the status register, is the only difference between the two emulators. I was already suspicious of the flags, since they are rather complicated to simulate and can behave differently for different instructions. Of all the flags, Extend, which isn't used by many instructions, is probably the one I'm not emulating correctly, so I seem to be on the right track.
Stepping through the program in BlastEm shows that the Extend flag is set after the lsrw #1,%d5 instruction, which occurs a few times in the function. The Motorola documentation for the LSR instruction shows that both the Extend flag and Carry flag should be set to the bit value shifted out (the least significant bit). The Rust code for the LSd instruction, which sets the flags, is shown below.
self.set_logic_flags(pair.0, size);
if pair.1 {
    self.set_flag(Flags::Carry, true);
    self.set_flag(Flags::Extend, true);
}
I must have assumed that the .set_logic_flags() function would clear the Extend flag when I originally wrote this code, as it does for the other four flags. Most logic operations don't affect the Extend flag though, so the .set_logic_flags() function is only clearing the lower 4 bits (the Extend flag being the 5th bit). After the call, the Extend and Carry flags are set to true only if the bit shifted out, which is stored in pair.1, is true. If the Extend flag was set to true from a previous instruction, it wouldn't be cleared. That was enough of a discrepancy to cause the garbled text, and a whole lot more. Effing flags...
While the Extend flag is never directly tested in a comparison in this function, there are some ROXd instructions (where d is the direction (L)eft or (R)ight). Unlike the ROd instruction, which rotates bits within the same value, the ROXd instruction rotates through the Extend flag, so the value in Extend will be put into the number (either the left or right end), and the bit rotated out of the opposite end will be put into Extend. So an error in the Extend flag could definitely cause some problems with the decompress code.
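The difference between the two rotates is easier to see in code. The sketch below models one-bit, word-sized rotates only, and the function names are made up for illustration; it is not how Moa structures its instruction implementations.

```rust
/// ROXL.W #1: rotate left through Extend. The old Extend value enters at
/// bit 0 and the bit leaving bit 15 becomes the new Extend (and Carry).
fn roxl_word(value: u16, extend: bool) -> (u16, bool) {
    let new_extend = value & 0x8000 != 0;
    let result = (value << 1) | (extend as u16);
    (result, new_extend)
}

/// ROL.W #1: plain rotate. Bit 15 wraps around to bit 0 and is also copied
/// into Carry; Extend takes no part in the operation.
fn rol_word(value: u16) -> (u16, bool) {
    let result = value.rotate_left(1);
    (result, result & 1 != 0)
}
```

With a stale Extend flag, roxl_word(0x0001, true) yields 0x0003 where 0x0002 was expected, which is exactly the kind of one-bit error that would corrupt the count being built up in %d3 by the decompress loop.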
Adding a line of code to clear the Extend flag before the .set_logic_flags() function is called is enough to fix it. Now the text in the demo is showing legibly. It's still nothing like what it looks like in BlastEm, which has a moving background that stretches the text vertically, but I'm still calling it a win.
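As a standalone illustration of the corrected behaviour, here is a sketch of a one-bit word-sized LSR that rebuilds all five condition codes, including Extend. The constants and the lsr_word_by_one function are hypothetical and much simpler than Moa's set_flag()/set_logic_flags() machinery; in particular, the sketch does not cover a shift count of zero, where the real 68000 leaves Extend untouched.

```rust
// Condition code bits in the low byte of %sr.
const FLAG_CARRY: u8 = 0x01;
const FLAG_OVERFLOW: u8 = 0x02;
const FLAG_ZERO: u8 = 0x04;
const FLAG_NEGATIVE: u8 = 0x08;
const FLAG_EXTEND: u8 = 0x10;

/// LSR.W #1: returns the shifted value and the updated condition codes.
fn lsr_word_by_one(value: u16, ccr: u8) -> (u16, u8) {
    let shifted_out = value & 1 != 0;
    let result = value >> 1;

    // Clear all five flags first. Clearing Extend here is the fix: without
    // it, an Extend left set by an earlier instruction would survive.
    let mut flags =
        ccr & !(FLAG_CARRY | FLAG_OVERFLOW | FLAG_ZERO | FLAG_NEGATIVE | FLAG_EXTEND);
    if result == 0 {
        flags |= FLAG_ZERO;
    }
    if result & 0x8000 != 0 {
        flags |= FLAG_NEGATIVE;
    }
    if shifted_out {
        // Carry and Extend both receive the bit that was shifted out.
        flags |= FLAG_CARRY | FLAG_EXTEND;
    }
    (result, flags)
}
```

Starting with only Extend set (0x10) and shifting 0x0002 gives 0x0001 with every flag clear, matching the 0x2700 seen in BlastEm rather than the 0x2710 Moa had produced.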
And looking at Sonic 2, it's still very garbled but Sonic is no longer falling to his death! The Extend flag in the shift and rotate instructions was the cause of whichever comparison led to Sonic not being on firm ground. I didn't even have to dig into the source of that problem in the Sonic 2 ROM to fix it, which was a relief.
You Can't Write There, Sir
Switching gears, I tried GenTestV3, which would immediately fail when run because it attempted to write to what should have been a read only memory area (the ROM data itself). I had added a way to mark a MemoryBlock as read only, which would raise an error when the .write() function is called on that block, as a means of catching errors. It had helped catch a few things when working on the Macintosh support, so I had added it to the Genesis ROMs as well.
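A read-only guard of this kind can be sketched as below. Only the MemoryBlock name and the idea of a failing .write() come from the text; the fields, the MemoryError type, and the into_read_only() builder are assumptions for illustration and will differ from Moa's real types.

```rust
/// Hypothetical error type for failed memory accesses.
#[derive(Debug, PartialEq)]
enum MemoryError {
    ReadOnly { addr: usize },
    OutOfBounds { addr: usize },
}

/// Hypothetical block of emulated memory, usable for both RAM and ROM.
struct MemoryBlock {
    contents: Vec<u8>,
    read_only: bool,
}

impl MemoryBlock {
    fn new(contents: Vec<u8>) -> Self {
        MemoryBlock { contents, read_only: false }
    }

    /// Mark the block as ROM so that stray writes surface as errors.
    fn into_read_only(mut self) -> Self {
        self.read_only = true;
        self
    }

    fn write(&mut self, addr: usize, data: &[u8]) -> Result<(), MemoryError> {
        if self.read_only {
            // Report instead of silently ignoring: on real hardware the
            // write has no effect, but in an emulator it usually means a bug.
            return Err(MemoryError::ReadOnly { addr });
        }
        let end = addr + data.len();
        if end > self.contents.len() {
            return Err(MemoryError::OutOfBounds { addr });
        }
        self.contents[addr..end].copy_from_slice(data);
        Ok(())
    }
}
```

The trade-off is that a legitimate ROM may write to its own address range on purpose, so an emulator might want the check to be switchable per system.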
Since I was getting an error when the attempted write occurred, I knew exactly where the fault was, address 0x2976, and I also knew what the values of the registers at that point were:
...
292c: 7000 moveq #0,%d0
292e: 7200 moveq #0,%d1
2930: 7400 moveq #0,%d2
2932: 7600 moveq #0,%d3
2934: 7800 moveq #0,%d4
2936: 7a00 moveq #0,%d5
2938: 7c00 moveq #0,%d6
293a: 7e00 moveq #0,%d7
293c: 207c 0000 0000 moveal #0,%a0
2942: 227c 0000 0000 moveal #0,%a1
2948: 247c 0000 0000 moveal #0,%a2
294e: 287c 0000 0000 moveal #0,%a4
2954: 2a7c 0000 0000 moveal #0,%a5
295a: 2e7c 0000 0000 moveal #0,%sp
2960: 4ed6 jmp %fp@
2962: 303c 7fff movew #0x7fff,%d0
2966: 207c 00ff 0000 moveal #0xff0000,%a0
296c: 30fc 0000 movew #0,%a0@+
2970: 51c8 fffa dbf %d0,0x296c
2974: 4ed2 jmp %a2@
2976: 297c 4000 0000 movel #0x40000000,%a4@(4) ; invalid write here
297c: 0004
297e: 383c 7fff movew #0x7fff,%d4
2982: 38bc 0000 movew #0,%a4@
2986: 51cc fffa dbf %d4,0x2982
298a: 4ed2 jmp %a2@
...
And the register values:
Breakpoint reached: Attempt to write to read-only memory at 4 with data [64, 0]
@ 18201056 ns
0x0000297e: 383c 7fff
movew #00007fff, %d4
Status: Running
PC: 0x0000297e
SR: 0x2700
D0: 0x00000000 A0: 0x00000000
D1: 0x00000000 A1: 0x00000000
D2: 0x00000000 A2: 0x00002592
D3: 0x00000000 A3: 0x00000000
D4: 0x00000000 A4: 0x00000000
D5: 0x00000000 A5: 0x00000000
D6: 0x00000000 A6: 0x00002588
D7: 0x00000000
SSP: 0x00000000
USP: 0x00000000
Current Instruction: 0x0000297e MOVE(Immediate(32767), DirectDReg(4), Word)
0x00000000: 0x00ff 0xfffe 0x0000 0x0200 0x0000 0x30e2 0x0000 0x30ee
0x00000010: 0x0000 0x3076 0x0000 0x308e 0x0000 0x309a 0x0000 0x30a6
0x00000020: 0x0000 0x30b2 0x0000 0x30be 0x0000 0x30ca 0x0000 0x30d6
0x00000030: 0x0000 0x306a 0x444f 0x4e27 0x5420 0x4c4f 0x4f4b 0x2041
The register %a4 contains 0x00000000, plus an offset of 4, so it's trying to write to address 0x00000004, the reset vector. That can't possibly be right. In BlastEm, I tried setting the same address as a breakpoint and, would you look at that, the breakpoint isn't reached! That code isn't even running in BlastEm when GenTest is run. As you may notice from the snippet above, jmp instructions are being used to return to the calling function, and bra (branch) instructions are being used to call them, rather than using the stack. So the return address is not on the stack, but in register %a2, which contains 0x2592, the address of the instruction after the one that called this function. We're on to something here.
256c: 6700 dcae beqw 0x21c
2570: 60d8 bras 0x254a
2572: 7400 moveq #0,%d2
2574: 3e7c 0000 moveaw #0,%sp
2578: 2c7c 0000 0000 moveal #0,%fp
257e: 4df9 0000 2588 lea 0x2588,%fp
2584: 6000 03a6 braw 0x292c ; jump to a different function (shown above)
2588: 45f9 0000 2592 lea 0x2592,%a2
258e: 6000 03e6 braw 0x2976 ; jump to the troublesome function
2592: 297c 6000 0002 movel #1610612738,%a4@(4)
2598: 0004
259a: 4bf9 0000 8156 lea 0x8156,%a5
Address 0x258e contains a branch instruction to the exact address that the erroneous write occurs on, and before that, the return register %a2 is loaded with the return address. What about the instruction before that? It's a branch to 0x292c, which appears in the previous snippet and seems to be a function that sets all the register values to 0! Wait, why would it do that? The register values were almost all 0 when the error occurred, except for the two registers used to hold return addresses, so it did run that code, but why would it clear everything just before using a now-zeroed register as an address?
I set a breakpoint for 0x2572, which looked like the start of the current function, given that there's a branch instruction just before. The %a4 register, interestingly enough, contains 0xc00000, which would make sense as the intended value of %a4 where the erroneous write occurred, if all the registers hadn't been cleared just before. Most of the other registers are 0 except for %a2, which contains 0x2554, possibly the return address of the caller.
...
253e: 297c 6000 0002 movel #1610612738,%a4@(4)
2544: 0004
2546: 6000 0396 braw 0x28de
254a: 45f9 0000 2554 lea 0x2554,%a2
2550: 6000 04f8 braw 0x2a4a
2554: 1e03 moveb %d3,%d7
2556: 0007 00ef orib #-17,%d7
255a: 0c07 00ef cmpib #-17,%d7 ; the value of %d7 should
; be 0xff, but it's 0xef
255e: 6700 0012 beqw 0x2572 ; this is where the problem
; occurs (shouldn't jump but does)
2562: 1e03 moveb %d3,%d7
2564: 0007 00bf orib #-65,%d7
2568: 0c07 00bf cmpib #-65,%d7
256c: 6700 dcae beqw 0x21c
2570: 60d8 bras 0x254a
...
There's a jump at 0x255e to the start of our function that shouldn't run, which... isn't quite what I was expecting. I was expecting the previous code to somehow make sense, but alright, it's maybe taking a jump that shouldn't happen (even though it seems like it should never ever happen), so why is it jumping when it shouldn't?
I set a breakpoint for 0x2554 in both emulators to see if that code would run and this time, BlastEm runs that code. Stepping through the code in both emulators shows the status register values are different just after the comparison at 0x255a. Groan... not the flags again.
Looking closer at the code though, the values of %d7 are different between the emulators as well. The comparison in Moa is setting the flags correctly for the data used, but the data values are different, and so BlastEm doesn't make the branch where Moa does. Ok, so maybe it's not the flags this time. So why are the values of %d7 different? Well, it's set just a few instructions earlier with the lower byte value of %d3, which in Moa is 0. In BlastEm, it's 0xff. Aha! So where is %d3 set?
It's not set in the code just above the comparison, but there is a branch to 0x2a4a just before, which looks like a register-returning function call, and the code at that location does change %d3.
2a4a: 7600 moveq #0,%d3
2a4c: 7e00 moveq #0,%d7
2a4e: 13fc 0040 00a1 moveb #0x40,0xa10009
2a54: 0009
2a56: 13fc 0040 00a1 moveb #0x40,0xa10003
2a5c: 0003
2a5e: 4e71 nop
2a60: 4e71 nop
2a62: 1639 00a1 0003 moveb 0xa10003,%d3
2a68: 0203 003f andib #0x3f,%d3
2a6c: 13fc 0000 00a1 moveb #0,0xa10003
2a72: 0003
2a74: 4e71 nop
2a76: 4e71 nop
2a78: 1e39 00a1 0003 moveb 0xa10003,%d7
2a7e: 0207 0030 andib #0x30,%d7
2a82: e50f lslb #2,%d7
2a84: 8607 orb %d7,%d3
2a86: 4ed2 jmp %a2@
Tracing through the debuggers shows that this is the code where BlastEm gets 0xff into register %d3, and it's doing it by reading the controller input. 0xa10003 is the byte address of the data port for controller 1, and 0xa10009 is the control port for controller 1. I had taken a stab at implementing the weird TH counting that the controllers need to do, but I hadn't tested it. I had only hooked up the Start button to a key press, which was all I had needed up until this point, to get through the title screen to the gameplay.
Here, from the code, it seemed as if the correct behaviour, at least according to how BlastEm worked, was for the controllers to return 0xff when no buttons are pressed, rather than 0. Changing that one thing in Moa got it to the first screen of GenTest asking which test to run! Success! Well, I still needed to fix the controllers properly, since button presses still didn't work, but this was at least the cause of GenTest not running.
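To make the 0xff result concrete, here is a minimal sketch of the read routine at 0x2a4a run against a model of a 3-button pad. The `Pad3` type, its methods, and `read_buttons` are hypothetical names for illustration, not Moa's API. The bit layout assumes the standard 3-button protocol, where the TH line (bit 6 of the data port) selects which group of buttons is returned and a 1 bit means the button is not pressed.

```rust
/// Hypothetical model of a 3-button pad. Buttons are stored as "pressed" flags
/// and inverted on read, since the hardware reports 1 for "not pressed".
#[derive(Default)]
struct Pad3 {
    th: bool,
    up: bool,
    down: bool,
    left: bool,
    right: bool,
    a: bool,
    b: bool,
    c: bool,
    start: bool,
}

/// Returns bit `n` set when the button is NOT pressed (active-low).
fn bit(pressed: bool, n: u8) -> u8 {
    if pressed { 0 } else { 1u8 << n }
}

impl Pad3 {
    /// Write to the data port: only bit 6 (TH) matters in this sketch.
    fn write_data(&mut self, value: u8) {
        self.th = value & 0x40 != 0;
    }

    /// Read the data port for the currently selected TH state.
    fn read_data(&self) -> u8 {
        if self.th {
            // TH high: C, B, Right, Left, Down, Up
            0x40 | bit(self.c, 5) | bit(self.b, 4) | bit(self.right, 3)
                | bit(self.left, 2) | bit(self.down, 1) | bit(self.up, 0)
        } else {
            // TH low: Start, A, two zero bits, Down, Up
            bit(self.start, 5) | bit(self.a, 4) | bit(self.down, 1) | bit(self.up, 0)
        }
    }
}

/// Mirrors the routine at 0x2a4a: returns the byte that ends up in %d3.
fn read_buttons(pad: &mut Pad3) -> u8 {
    pad.write_data(0x40);
    let low = pad.read_data() & 0x3f;
    pad.write_data(0x00);
    let high = (pad.read_data() & 0x30) << 2;
    low | high
}
```

With no buttons held, every bit in the combined byte is 1, so the comparison against 0xef at 0x255a fails and the branch is not taken. Holding B clears bit 4, which is the only case where that branch should fire.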
There turned out to be quite a few minor bugs in the TH counting code. The count was incrementing twice as often as it should have, and the button states needed to be inverted (1 means the button is not pressed and 0 means it is). I also needed to reset the counter when the control port was written to, for the count to be in sync with what the ROM was expecting. Not all ROMs progressed through the entire count if they only needed to read a few buttons. Eventually I got it sorted out and buttons were working, but it took a while to get them right. The latest code for the controllers is here.
Fixing Sprites
I had been back at it for about 4 or 5 days now and I had already ticked off two major issues. I could now control the characters in game play, even though I couldn't see much of what was going on still. The elephant in the room was those sprites not working, so with my enthusiasm high, I pressed on to tackle the sprites.
Fixing the Extend flag bug fixed Sonic falling through the floor to his death, so that was a significant step forward, but multi-cell sprites were still being drawn incorrectly. Luckily the GenTest ROM has a page that displays a static multi-pattern sprite, both forward and reversed.
The forward sprite (Knuckles) works fine, but the reversed sprite (Sonic) is messed up. If you look closely, you can see the vertical columns of cells seem to line up correctly, but the horizontal arrangement of the columns is mixed up. This one turned out to be a bit subtle.
I had tried fiddling with reversing the cell drawing order in multi-cell sprites but to no avail. It turned out that in the reverse-sprite code I was both changing the order of the cells and reversing the positions they were drawn in, rather than switching only one. I was also adjusting both coordinates instead of just the horizontal arrangement. At the time I didn't have a way of just drawing one sprite in one location to inspect it closely enough to figure out what was wrong, but the GenTest ROM made it much clearer. I also had an off-by-one error with reversed sprites where I needed to subtract one from the size in order to get the right vertical row of patterns to use.
First, the existing code is shown below. Note: Multi-cell sprites are drawn top to bottom, left to right, unlike everything else in the Genesis, so the outer loop is for the horizontal direction, and the inner loop is the vertical direction. The variables that appear are defined as follows:
- pattern_name is the 16-bit pattern specifier
- (h_pos, v_pos) is the pixel position on screen where the sprite should be drawn
- (size_h, size_v) is the size in cells of the sprite
- (h_rev, v_rev) are bools of whether the sprite should be reversed in a given direction
- self.is_sprite_on_screen(x, y) returns whether those pixel positions are on-screen (sprites can be entirely off the screen, in which case they won't be drawn)
for ih in 0..size_h {
    for iv in 0..size_v {
        let h = if !h_rev { ih } else { size_h - ih };
        let v = if !v_rev { iv } else { size_v - iv };
        let (x, y) = (h_pos + h * 8, v_pos + v * 8);
        if self.is_sprite_on_screen(x, y) {
            let iter = self.get_pattern_iter(
                (pattern_name & 0xF800)
                    | ((pattern_name & 0x07FF) + (h * size_v) + v)
            );
            frame.blit(x as u32 - 128, y as u32 - 128, iter, 8, 8);
        }
    }
}
Changing the following lines is enough to fix it. It needs to take an extra 1 off the h and v values when the sprite is reversed, and also use the loop's values to calculate the position where the cell should be drawn instead of using the previously calculated cell positions, which have already been reversed.
let h = if !h_rev { ih } else { size_h - 1 - ih };
let v = if !v_rev { iv } else { size_v - 1 - iv };
let (x, y) = (h_pos + ih * 8, v_pos + iv * 8);
And now the sprites work! That was surprisingly simple given how broken they looked before. I had been close, but it only takes an off by one error to make the output mangled beyond recognition sometimes.
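The corrected mapping can be checked in isolation with a small sketch. The helper `sprite_cells` below is hypothetical and not part of Moa. For each cell it returns the pattern offset to fetch and the cell position to draw at, assuming the same top-to-bottom, left-to-right cell order described above.

```rust
/// For each cell of a sprite, returns (pattern_offset, cell_x, cell_y), where
/// pattern_offset is added to the sprite's base pattern number and
/// (cell_x, cell_y) is the cell's position relative to the sprite's top-left.
fn sprite_cells(size_h: u16, size_v: u16, h_rev: bool, v_rev: bool) -> Vec<(u16, u16, u16)> {
    let mut cells = Vec::new();
    for ih in 0..size_h {
        for iv in 0..size_v {
            // Reversal only changes which pattern is fetched...
            let h = if !h_rev { ih } else { size_h - 1 - ih };
            let v = if !v_rev { iv } else { size_v - 1 - iv };
            // ...while the draw position always follows the loop counters.
            cells.push((h * size_v + v, ih, iv));
        }
    }
    cells
}
```

For a 2x2 sprite reversed horizontally, the left column on screen now pulls patterns 2 and 3 and the right column pulls 0 and 1, with no offset ever reaching `size_h * size_v`. The pixels inside each cell still need to be mirrored too, which this sketch leaves out.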
The intro sprites in Earthworm Jim are working now too. I had tried to use that game for testing sprites before I had taken that break, but it wasn't as helpful as the GenTest sprite screen.
Not All The Data
How is Sonic 2 looking now that the sprites have been fixed?
Well... it honestly doesn't look any different. In fact this is the same image from after the Extend flag was fixed, but before the sprites were fixed. I literally could not tell the difference between the image before and after fixing the sprites, they were so identical, so I didn't even bother adding another screenshot. No wonder I couldn't fix the sprites before, when I was using Sonic 2 to test with. The garbled sprites in Sonic 2 were caused by something else entirely.
Are there any other test screens in the GenTest ROM that looked messed up? Sure enough, all the video output patterns are broken. I'll use the colour bleed test as an example.
Well that's definitely not what it should look like. Inspecting the VRAM shows that only about half the data that should be loaded is actually there, compared to BlastEm. I found the spot in the ROM where the data is loaded into the VDP using a DMA transfer. The source data in RAM actually is complete this time, even though the VRAM data is only partially present, so this time it is an issue with transferring data into the VDP. Playing around with the debugger in BlastEm I noticed something in the output for the VDP state:
**DMA Group**
13: 00 |
14: 46 | DMA Length: $4600 words
15: 00 |
16: 88 |
17: 7F | DMA Source Address: $FF1000, Type: 68K
It says the DMA length is 0x4600 words (not bytes). Crap... I had assumed that the DMA count was in bytes, not words. Could it really be that simple a problem? Yup...
I was subtracting 2 instead of 1 from the count every iteration of the DMA loop, causing it to end halfway through the intended transfer size. It really was that simple.
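As a rough illustration of the fix, here is a simplified sketch of a word-counted transfer loop. The function `dma_transfer` and its parameters are made up for this example and it is not Moa's actual DMA code. It also ignores details of the real VDP such as address wrapping.

```rust
/// Copies `length_words` 16-bit words from `src` into `vram`, starting at byte
/// address `dest` and advancing by `increment` bytes after each word.
fn dma_transfer(src: &[u16], vram: &mut [u8], mut dest: usize, increment: usize, length_words: u16) {
    let mut remaining = length_words;
    let mut index = 0;
    while remaining > 0 {
        // Store the word big-endian, as the 68000 would see it
        let word = src[index];
        vram[dest] = (word >> 8) as u8;
        vram[dest + 1] = (word & 0xff) as u8;
        index += 1;
        dest += increment;
        // One word moved, so the count drops by one (not two)
        remaining -= 1;
    }
}
```

The destination address still moves in bytes, which is what makes it tempting to treat the length as a byte count as well.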
And now Sonic 2 looks like this:
Much better! It almost looks right except for the foreground that's out of place. I haven't even attempted to implement the horizontal and vertical scrolling functionality of the VDP yet, so that must be what's going on. This is finally coming together.
Scrolling The Scrolls
It had been less than a week since I had returned to it, and I had fixed all the glaring issues that were mangling the graphics. It was finally time to implement something new from the Sega docs, that I had left for later. Later was now! It was time to implement the scrolling features.
As mentioned before, the scrolls are much bigger than can fit on the screen at once. In order to be able to quickly update what's shown on the screen without changing all the cell data, the scroll planes can be moved relative to the screen to change what part of the scroll plane will appear on the screen. The two scrolls can be moved independently of each other to create a parallax effect.
The vertical and horizontal scrolling work a bit differently from each other. For one, the vertical scroll direction has its own special memory, the VSRAM, whereas the horizontal scroll data is stored in a table in VRAM, with the starting address of the table set by a VDP register.
For the vertical scroll position, either a single offset can be used to move the whole plane, or every two cells can have a different vertical offset. Each offset is an unsigned number between 0 and 1023 (which is the maximum number of pixels of the largest possible scroll size of 128 cells). Since VSRAM is 80 bytes, that means there can be 40 16-bit words, 20 for each of the two scrolls interleaved with each other, which covers the maximum 40 cell width of the screen.
For the horizontal scroll position, either a single offset can move the whole plane, or every cell can have a different offset, or every line can have a different offset. For the cell offset setting, only a maximum of 30 offsets for each scroll are needed, but they are stored in a table with the same size as used by the per-line scrolling mode. The per-line scrolling mode needs 896 bytes for the NTSC version's 224 line output (960 bytes for the full 240 line resolution of PAL). Like the vertical offsets, each offset is a 16-bit word and ranges from 0-1023, and the offsets for Scroll A and Scroll B are interleaved in the horizontal scroll table.
pub fn get_hscroll(&self, hcell: usize, line: usize) -> (u32, u32) {
    let scroll_addr = match self.mode_3 & MODE3_BF_H_SCROLL_MODE {
        0 => self.hscroll_addr,
        2 => self.hscroll_addr + (hcell << 5),
        3 => self.hscroll_addr + (hcell << 5) + (line * 2 * 2),
        _ => panic!("Unsupported horizontal scroll mode"),
    };

    let scroll_a = read_beu16(&self.vram[scroll_addr..]) as u32 & 0x3FF;
    let scroll_b = read_beu16(&self.vram[scroll_addr + 2..]) as u32 & 0x3FF;
    (scroll_a, scroll_b)
}

pub fn get_vscroll(&self, vcell: usize) -> (u32, u32) {
    let scroll_addr = match self.mode_3 & MODE3_BF_V_SCROLL_MODE {
        0 => 0,
        _ => vcell >> 1,
    };

    let scroll_a = read_beu16(&self.vsram[scroll_addr..]) as u32 & 0x3FF;
    let scroll_b = read_beu16(&self.vsram[scroll_addr + 2..]) as u32 & 0x3FF;
    (scroll_a, scroll_b)
}
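As a cross-check of the table sizes quoted above, the byte offsets can be worked out with a couple of standalone helpers. These are hypothetical functions written for this post, not code from Moa. The horizontal offsets follow the same arithmetic as get_hscroll. The VSRAM offset assumes each two-cell column takes one 4-byte pair of Scroll A and Scroll B words, which is what the 80-byte VSRAM size implies.

```rust
/// The three horizontal scroll modes handled by get_hscroll (values 0, 2 and 3).
#[derive(Clone, Copy)]
enum HScrollMode {
    Full,
    PerCell,
    PerLine,
}

/// Byte offset into the horizontal scroll table of the Scroll A entry for
/// screen line `y`; the Scroll B entry follows 2 bytes later.
fn hscroll_offset(mode: HScrollMode, y: usize) -> usize {
    match mode {
        HScrollMode::Full => 0,
        // One used entry per 8-line cell row, spaced 32 bytes apart
        HScrollMode::PerCell => (y / 8) * 32,
        // One 4-byte entry (A word then B word) per line
        HScrollMode::PerLine => y * 4,
    }
}

/// Byte offset into VSRAM of the Scroll A entry for screen cell column
/// `cell_x`, assuming one 4-byte A/B pair per two-cell column.
fn vscroll_offset(per_two_cells: bool, cell_x: usize) -> usize {
    if per_two_cells { (cell_x / 2) * 4 } else { 0 }
}
```

The last line of a 224-line frame lands at offset 892, so the table ends at 896 bytes, and the last of the 40 cell columns ends exactly at the 80-byte size of VSRAM.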
There are some weird glitches in Scroll B but Scroll A seems to work fine. It only moves a whole cell at a time, so Scroll A appears jerky compared to the sprites. It's especially noticeable at the edge of the bridge. The bridge is made of sprites, which can be positioned to the exact pixel, but the ground that the bridge is supposed to be attached to will only move when a whole cell has changed.
Fixing Line Scrolling
It had been about a week and a half since I took up the Genesis again. With the help of the test ROMs and BlastEm, I had made pretty quick work of a whole bunch of little bugs, going from what was still a very garbled output to having the games playable. I wasn't done yet though.
After spending a week working on Computie when my new PCBs arrived, I returned to the Genesis to work on the per-line scrolling. I also dabbled a bit with audio support, adding a dummy device for the YM2612 audio FM synthesizer chip, which is mapped to the Z80 coprocessor's address space, and fixing the Z80 banked memory area, so that it could access the 68k ROM or RAM data. With that, I was able to get the Z80 coprocessor working well enough that Sonic 1 would get past the title screen and into the game.
I was bothered that per-line scrolling wasn't working, and that the scrolls moved in a jerky fashion. I needed to fix it but it would require more than a few simple changes. Since the per-cell scrolling was working, I chose to write a completely different version of the draw_scrolls function just for per-line scrolling. I could integrate them later if possible but it would be easier to completely rewrite it without breaking what I already had.
I was still hoping to use the pattern iterator I had written, but I would need to change it to take the line number on initialization, so that I could output only one line of a pattern at a time. I then used another loop inside the horizontal and vertical cell loops to iterate over each of the 8 lines in a pattern, using a different offset for each line of the pattern.
My first attempt used the same loop to draw both scrolls at once, but the results were this:
There is clearly an issue caused by the scrolling since moving until the screen is on a cell boundary shows the foreground plane (Scroll A) completely on top of the background plane (Scroll B), but when the offset is between cells, Scroll B is getting drawn on top. Separating the drawing of each scroll (at the cost of duplicating the loops) fixed this problem, but there is still an issue with these strange black artifacts showing on the screen.
It took me a while of fussing around with the code before I realized that I had the line and column coordinates backwards when passing them to the scroll fetching functions. That's a little embarrassing. I was sending the cell_x value to the horizontal scroll offset when it should have been getting the cell_y value (i.e. the horizontal offset is based on what line is currently being drawn, so you give it the line number and it gives you the x offset). Swapping these around and reorganizing the loops fixed this. Now the scrolling is smooth!
Rewriting
There were still some issues with the left hand and bottom edges of the screen where the foreground is not drawn to the edge because the scroll offset is not on a cell boundary. Changing the existing code to add an extra cell was not as trivial as it would appear. Shifting the cells over caused the sprites to be misaligned with the background, and starting the iterators one cell early would mean starting at -1, which would require changing to signed numbers, and possibly calculating an invalid offset due to the presence of negatives, or adding many checks to prevent that.
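One way to sidestep the negative-offset checks, offered only as a sketch and not as what Moa does, is to do the subtraction in signed arithmetic and wrap the result with `rem_euclid`, which always returns a non-negative remainder. The function name `wrap_to_plane` is made up for this example.

```rust
/// Maps a screen coordinate to a coordinate inside a scroll plane that is
/// `plane_size` pixels across, wrapping around in either direction.
fn wrap_to_plane(screen_pos: i32, scroll: i32, plane_size: i32) -> usize {
    // rem_euclid never returns a negative value for a positive modulus,
    // so the result is always a valid pixel position in the plane
    (screen_pos - scroll).rem_euclid(plane_size) as usize
}
```

A position of -8 (one cell to the left of the screen) on a 512-pixel plane maps to 504, so starting a loop one cell early stays inside the plane without any extra bounds checks.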
I also didn't have drawing priority working because I didn't have all the cell and sprite priority bits calculated at the same time, to determine which to display, and the code was awfully messy at this point. It was time to rewrite all the display code. I had learned so much and run into so many issues by this point. I had a better understanding of how it was all supposed to work now, and I could incorporate all those lessons in the next version.
In order to recreate the video output more accurately, I opted to more faithfully simulate what the hardware VDP would be doing. Since it's generating a video signal on the fly, it draws the image pixel by pixel, line by line, exactly in step with the CRT. If I did it this way, it would also allow me to implement the priority bits to decide on which pixel from the different planes should be drawn to the screen, since everything would be in the same loop. There would be a lot more duplicated calculations and slower performance as a result, but since the existing performance wasn't an issue, it should still be fast enough to emulate at full speed.
To make it easier to debug in the short term, I duplicated the code to calculate the cell indices for the scrolls. Later, I can break this up into multiple functions to reuse code, and also store some of the calculated values across iterations to avoid recalculating, but I wanted everything in one loop to make it easier to adjust while I debugged it. I did break out the vertical drawing loop from the horizontal one, which will eventually be used to step through the drawing line by line, instead of drawing the whole frame before the vertical interrupt, but this isn't yet implemented.
pub fn draw_frame(&mut self, frame: &mut Frame) {
    self.build_sprites_lists();

    for y in 0..(self.screen_size.1 * 8) {
        self.draw_frame_line(frame, y);
    }
}

pub fn draw_frame_line(&mut self, frame: &mut Frame, y: usize) {
    let bg_colour = ((self.background & 0x30) >> 4, self.background & 0x0f);

    let (hscrolling_a, hscrolling_b) = self.get_hscroll(y / 8, y % 8);
    for x in 0..(self.screen_size.0 * 8) {
        let (vscrolling_a, vscrolling_b) = self.get_vscroll(x / 8);

        // Scroll B: map the screen pixel into the plane, then look up its cell
        let pixel_b_x = (x - hscrolling_b) % (self.scroll_size.0 * 8);
        let pixel_b_y = (y + vscrolling_b) % (self.scroll_size.1 * 8);
        let pattern_b_addr = self.get_pattern_addr(self.scroll_b_addr, pixel_b_x / 8, pixel_b_y / 8);
        let pattern_b_word = self.memory.read_beu16(Memory::Vram, pattern_b_addr);
        let priority_b = (pattern_b_word & 0x8000) != 0;
        let pixel_b = self.get_pattern_pixel(pattern_b_word, pixel_b_x % 8, pixel_b_y % 8);

        // Scroll A: same calculation using Scroll A's offsets and cell table
        let pixel_a_x = (x - hscrolling_a) % (self.scroll_size.0 * 8);
        let pixel_a_y = (y + vscrolling_a) % (self.scroll_size.1 * 8);
        let pattern_a_addr = self.get_pattern_addr(self.scroll_a_addr, pixel_a_x / 8, pixel_a_y / 8);
        let pattern_a_word = self.memory.read_beu16(Memory::Vram, pattern_a_addr);
        let mut priority_a = (pattern_a_word & 0x8000) != 0;
        let mut pixel_a = self.get_pattern_pixel(pattern_a_word, pixel_a_x % 8, pixel_a_y % 8);

        if self.window_addr != 0 && self.is_inside_window(x, y) {
            let pixel_win_x = x - self.window_pos.0.0 * 8;
            let pixel_win_y = y - self.window_pos.0.1 * 8;
            let pattern_win_addr = self.get_pattern_addr(self.window_addr, pixel_win_x / 8, pixel_win_y / 8);
            let pattern_win_word = self.memory.read_beu16(Memory::Vram, pattern_win_addr);

            // Scroll A is not displayed where ever the Window is displayed, so we replace Scroll A's data
            priority_a = (pattern_win_word & 0x8000) != 0;
            pixel_a = self.get_pattern_pixel(pattern_win_word, pixel_win_x % 8, pixel_win_y % 8);
        };

        // Sprites: stop at the first sprite on this line with a non-transparent pixel here
        let mut pixel_sprite = (0, 0);
        let mut priority_sprite = false;
        for sprite_num in self.sprites_by_line[y].iter() {
            let sprite = &self.sprites[*sprite_num];
            let offset_x = x as i16 - sprite.pos.0;
            let offset_y = y as i16 - sprite.pos.1;

            if offset_x >= 0 && offset_x < (sprite.size.0 as i16 * 8) {
                let pattern = sprite.calculate_pattern(offset_x as usize / 8, offset_y as usize / 8);
                priority_sprite = (pattern & 0x8000) != 0;
                pixel_sprite = self.get_pattern_pixel(pattern, offset_x as usize % 8, offset_y as usize % 8);
                if pixel_sprite.1 != 0 {
                    break;
                }
            }
        }

        // Order the candidate pixels from front to back based on the priority bits
        let pixels = match (priority_sprite, priority_a, priority_b) {
            (false, false, true) => [ pixel_b, pixel_sprite, pixel_a, bg_colour ],
            (true, false, true) => [ pixel_sprite, pixel_b, pixel_a, bg_colour ],
            (false, true, false) => [ pixel_a, pixel_sprite, pixel_b, bg_colour ],
            (false, true, true) => [ pixel_a, pixel_b, pixel_sprite, bg_colour ],
            _ => [ pixel_sprite, pixel_a, pixel_b, bg_colour ],
        };

        // Draw the first non-transparent pixel, falling back to the background colour
        for i in 0..pixels.len() {
            if pixels[i].1 != 0 || i == pixels.len() - 1 {
                let mode = if pixels[i] == (3, 14) {
                    ColourMode::Highlight
                } else if (!priority_a && !priority_b) || pixels[i] == (3, 15) {
                    ColourMode::Shadow
                } else {
                    ColourMode::Normal
                };

                frame.set_pixel(x as u32, y as u32, self.get_palette_colour(pixels[i].0, pixels[i].1, mode));
                break;
            }
        }
    }
}

#[inline(always)]
fn get_pattern_addr(&self, cell_table: usize, cell_x: usize, cell_y: usize) -> usize {
    cell_table + ((cell_x + (cell_y * self.scroll_size.0 as usize)) << 1)
}

fn get_pattern_pixel(&self, pattern_word: u16, x: usize, y: usize) -> (u8, u8) {
    let pattern_addr = (pattern_word & 0x07FF) << 5;
    let palette = ((pattern_word & 0x6000) >> 13) as u8;
    let h_rev = (pattern_word & 0x0800) != 0;
    let v_rev = (pattern_word & 0x1000) != 0;

    let line = if !v_rev { y } else { 7 - y };
    let column = if !h_rev { x / 2 } else { 3 - (x / 2) };
    let offset = pattern_addr as usize + line * 4 + column;
    let second = x % 2 == 1;

    // Each byte holds two 4-bit pixels, with the left one in the high nibble
    if (!h_rev && !second) || (h_rev && second) {
        (palette, self.memory.vram[offset] >> 4)
    } else {
        (palette, self.memory.vram[offset] & 0x0f)
    }
}

fn build_sprites_lists(&mut self) {
    let sprite_table = self.sprites_addr;
    let max_lines = self.screen_size.1 * 8;

    self.sprites.clear();
    self.sprites_by_line = vec![vec![]; max_lines];

    // Follow the link field from sprite to sprite until it points back to 0
    let mut link = 0;
    loop {
        let sprite = Sprite::new(&self.memory.vram[sprite_table + (link * 8)..]);
        let start_y = sprite.pos.1;

        for y in 0..(sprite.size.1 as i16 * 8) {
            let pos_y = start_y + y;
            if pos_y >= 0 && pos_y < max_lines as i16 {
                self.sprites_by_line[pos_y as usize].push(self.sprites.len());
            }
        }

        link = sprite.link as usize;
        self.sprites.push(sprite);

        if link == 0 {
            break;
        }
    }
}
Finally... It's working pretty good, it scrolls smoothly, it sorts out the priority so Sonic appears behind the trees. It works better than this gif even shows. I recorded it at 15 frames a second instead of 30 or 60, to keep the file size small, so when Sonic gets his fast boots, it seems like the sprite isn't animated, but it's actually just moving too fast to be recorded.
For those who are curious, out of each 16.6ms interval between updating a frame, the old display code was running in around 2ms, and the new code is running in around 6ms, so the new code is significantly slower (but still well within the time available). This is in part because I'm calculating which cell to draw for each of the planes, and fetching the scroll values, for each pixel on the screen. This could be improved upon by storing the pattern data for the current cells for each plane between iterations and only updating them when the cell changes. That said, doing so will only make a small improvement in performance, while also making the code harder to read.
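For illustration, the caching idea could look something like the sketch below. `CellCache` is a hypothetical type, not something that exists in Moa, and the `fetch` closure stands in for the cell table lookup done with get_pattern_addr and read_beu16 in the code above.

```rust
/// Remembers the pattern word of the most recently fetched cell so that the
/// cell table is only read again when the pixel moves into a different cell.
struct CellCache {
    cell: Option<(usize, usize)>,
    word: u16,
}

impl CellCache {
    fn new() -> Self {
        CellCache { cell: None, word: 0 }
    }

    /// Returns the pattern word for the cell containing pixel (x, y), calling
    /// `fetch(cell_x, cell_y)` only when the cell differs from the cached one.
    fn get(&mut self, x: usize, y: usize, mut fetch: impl FnMut(usize, usize) -> u16) -> u16 {
        let cell = (x / 8, y / 8);
        if self.cell != Some(cell) {
            self.word = fetch(cell.0, cell.1);
            self.cell = Some(cell);
        }
        self.word
    }
}
```

Each plane would need its own cache, since Scroll A and Scroll B cross cell boundaries at different pixels once their scroll offsets differ. That extra bookkeeping is part of why the gain may not be worth the loss in readability.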
Conclusion
This project definitely turned into more than I was expecting when I started. I had hoped to get some pretty graphics after only a few weeks of work (the initial implementation only took about that long), but that didn't happen and it quickly became my white whale. I had to finish it. The real journey was the eight weeks of switching between debugging and working on other projects while the problems percolated in the back of my brain. But I did it. I got it to a playable (albeit still buggy) state.
Special thanks to ComradeOj for the demo ROMs, and Mike Pavone and the other contributors for BlastEm (github mirror). Without these, it would have taken a lot more time to get this working.
There is still a lot to do, and I will likely work on this project on and off for a while to come. Audio needs to be added, and a lot of games don't quite run correctly for one reason or another. Thanks for joining me and I hope you learned something as well, or at least got to enjoy some nostalgic thoughts of the Sega Genesis. If there's anything you'd like me to write more about or you have any feedback about these posts, I'd love to hear it on twitter or by email. Happy Emulating!