Making a 68000 Emulator in Rust

Also available on dev.to

Written November 2021 by transistor_fet

A few months ago, I was looking for a project to do while recovering from a bout of illness. I needed something that was straight-forward enough to work on without getting stuck on tough decisions or the need to learn a lot before diving in, which was the case with all the other projects on my plate at the time. My girlfriend suggested writing an emulator, and my first thought was to try emulating the computer I made last year, Computie, since I already had a fair amount of code for it that I knew well, and the 68000 architecture and assembly language was still fresh in my mind. Naturally I chose to write it in Rust.

I've worked on different projects that have some vague similarities to emulators, such as programming language interpreters and artificial life simulations but I haven't actually tried to make an emulator before. I'm not aiming to make a fast emulator, and since this is meant to be fun, I'm not as concerned about accuracy (specifically instruction execution time accuracy). I would, however, like something that's flexible enough to experiment with different hardware designs. Perhaps I could use this with some of my future hardware designs to test out hardware configurations and to develop and test the software that runs on them. It would also be nice to emulate some vintage computers that use the 68000, which each have their own I/O devices that would need simulating. With that in mind, we should work towards making independent components for each device, which interact in regular ways and can be combined in different configurations without the need to modify the components themselves (ie. no tight coupling of components)

I chose Rust because it's currently my favorite systems language. I've used C and C++ quite a bit as well, but Rust's advanced type system is much more pleasant to work with, and the compile-time checks means I can focus less on simple bugs and more on the problem at hand, getting more done in the same amount of time. I'm assuming here some familiarity with Rust, as well as the basic principles of how a computer works, but not necessarily that much about the 68000 or emulators. We will start with some code to simulate the computer memory, creating an abstract way of accessing data. We'll then implement the NOP (no operation) instruction, the simplest possible instruction, for the 68000, and expand the implementation from there. Once we've created a way of handling the passage of time for the CPUs and I/O devices, we'll implement a simple serial port controller which will act as our primary I/O device. From there we'll make all of our simulated devices, represented as struct objects in Rust, look the same so that we can treat them the same, regardless of what they represent internally. We'll then be able to package them up into a single working system and set them in motion to run the Computie software from binary images.

The Computer
The 68000
The 68681 Serial Port Controller
The Memory
Simulating The CPU
Adding Instructions
Abstracting Time
Some I/O
Box It Up
An Addressable of Addressables (The Data Bus)
A Happy Little System
Tying It All Together
Now What

The Computer

Computie is a single board computer with a Motorola 68010 CPU connected to 1MB of RAM, some flash, and an MC68681 dual serial port controller, which handles most of the I/O. (The 68010 is almost identical to the 68000 but with some minor fixes which won't affect us here. I'll mostly be referring to the common aspects of both processors). One of the serial connections is used as a TTY to interact with either the unix-like Computie OS, or the monitor software that's normally stored in the flash chip. It also supports a CompactFlash card and SLIP connection over the other serial port for internet access, but we won't cover those here. In order to get a working emulator, we'll focus on just the CPU, memory, and MC68681 controller.

The 68000

The 68000 is 32-bit processor with a 16-bit data bus and 16-bit arithmetic logic unit. It was used on many early computers including the early Macintosh series, the early Sun Microsystems workstations, the Amiga, the Atari ST, and the Sega Genesis/Mega Drive, just to name a few. It was almost chosen for the IBM PC as well, but IBM wanted to use an inexpensive 8-bit data bus in the PC, and the 68008 (with an 8-bit data bus) wasn't available at the time. The 8088, with it's 8-bit bus, was available however, and we have been stuck with that decision ever since.

The 68000 has 8 32-bit general purpose data registers, and 7 32-bit general purpose address registers plus two stack registers which can be accessed as the 8th address register depending on whether the CPU is in Supervisor mode or User mode. Internally the address registers use a separate bus and adder unit from the main arithmetic logic unit, which only operates on the data registers. This affects which instructions can be used with which registers, and operations on address registers don't always affect the condition flags (which has caused me many troubles both when writing the Computie software, and with the implementation of instructions here).

A 16-bit status register is used for condition codes and for privileged flags like the Supervisor mode enable flag and the interrupt priority level. Only the lower 8 bits can be modified from User mode. The conditions flags are set by many of the instructions based on their results, and can be checked by the conditional jump instructions to branch based on the results of comparisons.

The program counter register keeps track of the next instruction to be executed. Instructions are always a multiple of 2-byte words, and the CPU uses big endian byte order, so the most significant byte will always be in the lower byte address in memory. As an example, the NOP instruction uses the opcode 0x4E71, where 0x4E would be the first byte in memory followed by 0x71. A longer instruction like ADDI can have instruction words after the opcode (in this case the immediate data to add). For example, addil #0x12345678, %d0 which adds the hex number 0x12345678 to data register 0 would be encoded as the sequence of words [0x0680, 0x1234, 0x5678]. The opcode word, 0x0680, has encoded in it that it's an ADDI instruction, that the size of the operation is a long word (32-bit), and that it should use data register 0 as both the number to add to, and the destination where the result will be stored. (Note: 68000 assembly language instructions move data from the left hand operand to the right hand operand, unlike Intel x86 assembly language which uses the reverse).

A vector table is expected at address 0 which contains an array of up to 256 32-bit addresses. The first address contains the value to be loaded into the Supervisor stack register upon reset, and the remaining addresses are the value loaded into the program counter when a given exception occurs. This table cannot be relocated on the 68000, but it can be changed after reset on the 68010. Computie uses the 68010 for this exact feature, so that the OS can put the vector table in RAM, but the monitor software doesn't use interrupts. We won't cover interrupts here so this feature isn't needed for now and we can focus only on emulating the 68000. As for the vector table, we just need to simulate how the processor starts up on power on or after a reset, in which case the vector table is always at address 0, and the first two long words are the stack pointer and initial program counter respectively.

The 68681 Serial Port Controller

The MC68681 is a peripheral controller designed specifically for use with the 68000 series. It has two serial channels, A and B, as well as an onboard 16-bit timer/counter, 8 general purpose output pins, and 6 general purpose input pins. Internally it has 16 registers which can be read or written to from the CPU. The behaviour of the 16 registers is sometimes different between reading and writing, such as the status registers (SRA & SRB) which indicate the current status of the serial ports when read, and the clock select registers (CSRA & CSRB) which configure the serial port clocks when writing to the same address as the status registers. It can also generate an interrupt for one of 8 different internal conditions, such as data ready to read on a given channel, or the timer reaching zero. It uses it's own clock signal to generate the serial clocks and to count down with, so it will need to run some code along side the CPU to simulate its internal state.

The Memory

Now that we have a bit of background on the devices we'll be emulating, lets start making the emulator. The first thing we'll need in order to emulate our computer is a way of accessing and addressing memory where instruction data can be read from. We need to eventually have a common way of reading and writing to either simulated ram, or simulated I/O devices. Since we want to keep things generic and interchangeable, using Rust enums for the different devices in the system would be too tightly-coupled. Traits it is, so we'll start with an Addressable trait.

type Address = u64;     // I (wishfully) chose u64 here in case
                        // I ever emulate a 64-bit system

pub trait Addressable {
    fn len(&self) -> usize;
    fn read(&mut self, addr: Address, data: &mut [u8]) -> Result<(), Error>;
    fn write(&mut self, addr: Address, data: &[u8]) -> Result<(), Error>;
    ...
}

I went through a few different iterations of this, especially for the .read() method. At first I returned an iterator over the data starting at the address given, which works well for simulated memory, but reading from an I/O device could return data that is unique to when it was read. For example, when reading the next byte of data from a serial port, the data will be removed from the device's internal FIFO and returned, and since at that point it won't be stored anywhere, we can't have a reference to it. We need the data (of variable length) to be owned by the caller when the method returns, and passing in a reference to a mutable array to hold that data is a simple way to do that.

We can also add some default methods to the trait which will make it easier to access multi-byte values. The example here only shows methods to read and write 16 bit numbers in big endian byte order but there are also similar methods for 32-bit numbers and for little endian numbers.

pub trait Addressable {
    ...
    fn read_beu16(&mut self, addr: Address) -> Result<u16, Error> {
        let mut data = [0; 2];
        self.read(addr, &mut data)?;
        Ok(read_beu16(&data))
    }

    fn write_beu16(&mut self, addr: Address, value: u16) -> Result<(), Error> {
        let mut data = [0; 2];
        write_beu16(&mut data, value);
        self.write(addr, &data)
    }
    ...
}

#[inline(always)]
pub fn read_beu16(data: &[u8]) -> u16 {
    (data[0] as u16) << 8 |
    (data[1] as u16)
}

#[inline(always)]
pub fn write_beu16(data: &mut [u8], value: u16) -> &mut [u8] {
    data[0] = (value >> 8) as u8;
    data[1] = value as u8;
    data
}

Now for some simulated ram that implements our trait. (I'm leaving out the .new() methods for most of the code snippets because they are pretty straight-forward, but you can assume they exist, and take their field values as arguments, or set their fields to 0).

pub struct MemoryBlock {
    pub content: Vec<u8>,
}

impl Addressable for MemoryBlock {
    fn len(&self) -> usize {
        self.contents.len()
    }

    fn read(&mut self, addr: Address, data: &mut [u8]) -> Result<(), Error> {
        for i in 0..data.len() {
            data[i] = self.contents[(addr as usize) + i];
        }
        Ok(())
    }

    fn write(&mut self, addr: Address, data: &[u8]) -> Result<(), Error> {
        for i in 0..data.len() {
            self.contents[(addr as usize) + i] = data[i];
        }
        Ok(())
    }
}

Simulating The CPU

With just the Addressable trait, we can start simulating the CPU. Each cycle of the CPU involves reading in instruction data, decoding it, executing the instruction, modifying the stored state of the CPU, and finally checking for interrupts or breakpoints before looping again. We don't need all of this to start though, so first lets make some CPU state so that we at least have a PC (program counter) to keep track of the instruction data we've read.

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum M68kStatus {
    Init,
    Running,
    Stopped,
}

#[derive(Clone, Debug, PartialEq)]
pub struct M68kState {
    pub status: M68kStatus,

    pub pc: u32,            // Program Counter
    pub sr: u16,            // Status Register
    pub d_reg: [u32; 8],    // Data Registers
    pub a_reg: [u32; 7],    // Address Registers
    pub ssp: u32,           // Supervisor Stack Pointer
    pub usp: u32,           // User Stack Pointer
}

pub struct M68k {
    pub state: M68kState,
    ...
}

(I've separated the state into its own struct, separate from the M68k struct, in part to make it cleaner, but mostly to make it easier to test. We can get a complete known state using just M68kState::new(), and we can clone, modify, and compare states because we've derived PartialEq, so after running a test on a given instruction, we only need one assert_eq!(cpu.state, expected_state) to check if the resulting state is what we expected).

Next we need to initialize the CPU. When the 68000 is reset, it first reads in the stack pointer and initial value of the PC register from the beginning of memory where the vector table is located.

impl M68k {
    ...
    pub fn init(&mut self, memory: &mut dyn Addressable) -> Result<(), Error> {
        self.state.ssp = memory.read_beu32(0)?;
        self.state.pc = memory.read_beu32(4)?;
        self.state.status = M68kStatus::Running;
        Ok(())
    }
    ...
}

Since the decoding of instructions can be a bit convoluted, and since speed is not our utmost goal, it's cleaner to decode the instructions fully into some kind of internal representation, and then execute them based on that representation. For now, lets just add a NOP instruction.

#[derive(Clone, Debug)]
pub enum Instruction {
    NOP,
}

impl M68k {
    pub fn read_instruction_word(&mut self, memory: &mut dyn Addressable) -> Result<u16, Error> {
        let ins = memory.read_beu16(self.state.pc as Address)?;
        self.state.pc += 2;
        Ok(ins)
    }

    pub fn decode(&mut self, memory: &mut dyn Addressable) -> Result<Instruction, Error> {
        let ins = self.read_instruction_word(memory)?;

        match ins {
            0x4E71 => Ok(Instruction::NOP),
            _ => panic!("instruction not yet supported: {:#04X}", ins),
        }
    }
}

We can then make a function to execute the decoded instruction

impl M68k {
    pub fn execute(&mut self, memory: &mut dyn Addressable, instruction: Instruction) -> Result<(), Error> {
        match instruction {
            Instruction::NOP => {
                // Do Nothing
                Ok(())
            },
            _ => panic!("Instruction not implemented: {:?}", instruction),
        }
    }
}

And finally a function to wrap these stages together into a single step. For debugging purposes we'll print out what instruction we're about to execute after decoding but before executing.

impl M68k {
    pub fn step(&mut self, memory: &mut dyn Addressable) -> Result<(), Error> {
        match self.state.status {
            M68kStatus::Init => self.init(memory),
            M68kStatus::Stopped => Err(Error::new("cpu has stopped")),
            M68kStatus::Running => {
                let addr = self.state.pc;
                let instruction = self.decode(memory)?;
                println!("{:08x}: {:?}", addr, instruction);
                self.execute(memory, instruction)?;
                Ok(())
            },
        }
    }
}

At this point we have enough pieces to loop over a series of NOP instructions. Our main function looks like the following

const ROM: &[u16] = &[
    0x0010, 0x0000,     // Initial stack address is at 0x00100000
    0x0000, 0x0008,     // Initial PC address is at 0x8, which is the word
                        //  that follows this

    0x4e71,             // 4 NOP instructions
    0x4e71,             
    0x4e71,
    0x4e71,             

    0x4e72, 0x2700      // The STOP #0x2700 instruction, which would normally
                        //  stop the CPU but it's unsupported, so it will
                        //  cause a panic!()
];

let mut cpu = M68k::new();
let mut memory = MemoryBlock::from_u16(ROM);
loop {
    cpu.step(&mut memory).unwrap();
}

Our output should look something like this:

00000008: NOP
0000000a: NOP
0000000c: NOP
0000000e: NOP
thread 'main' panicked at 'instruction not yet supported: 0x4E72', src/main.rs:184:18

Adding Instructions

Since the 68000 has a reasonably orthogonal instruction set, we can break down the opcode word into sub-components, and build up instructions by separately interpreting those sub-components, rather than having a match arm for each of the 65536 combinations. There is a really helpful chart by GoldenCrystal which shows the full breakdown of opcodes for the 68000. We can look at the first 4 bits of the instruction word to separate it into 16 broad categories of instruction, and then further break it down from there. The full code can be seen here

We can extend our Instruction type to contain more instructions, including the addressing modes that the 68000 supports. A MOVE instruction for example can move data to or from a data or address register, or to memory indirectly using an address in an address register (optionally pre-decrementing or post-incrementing the address), or indirectly using an offset added to an address register, as well as a few others. Since these different addressing modes are available for most instructions, we can wrap them up into a Target type, and use that as the arguments of instructions in the Instruction type. For example, the instruction addil #0x12345678, %d0 would be represented as Instruction::ADD(Target::Immediate(0x12345678), Target::DirectDReg(0), Size::Long). We will maintain the same operand order as the 68k assembly language so data will move from the left hand operand (0x12345678) to the right hand operand (%d0).

pub type Register = u8;

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Size {
    Byte,
    Word,
    Long,
}

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Condition {
    NotEqual,
    Equal,
    ...
}

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Target {
    Immediate(u32),                     // A u32 literal
    DirectDReg(Register),               // Contents of a data register
    DirectAReg(Register),               // Contents of an address register
    IndirectAReg(Register),             // Contents of a memory location given by an address register
    IndirectARegInc(Register),          // Same as IndirectAReg but increment the address register
                                        // by the size of the operation *after* reading memory
    IndirectARegDec(Register),          // Same as IndirectAReg but decrement the address register
                                        // by the size of the operation *before* reading memory
    IndirectARegOffset(Register, i32),  // Contents of memory given by an address register plus the
                                        // signed offset (address register will *not* be modified)
    IndirectPCOffset(i32),              // Same as IndirectARegOffset but using the PC register
    IndirectMemory(u32),                // Contents of memory location given by literal u32 value
}

#[derive(Clone, Debug, PartialEq)]
pub enum Instruction {
    ADD(Target, Target, Size),          // Addition
    ADDA(Target, Register, Size),       // Adding to an address register
                                        //  doesn't affect flags

    Bcc(Condition, i32),                // Branch Conditionally
    BRA(i32),                           // Branch to PC + offset
    BSR(i32),                           // Branch to Subroutine
                                        //  (Push PC; PC = PC + offset)

    JMP(Target),                        // Set the PC to the given value
    JSR(Target),                        // Push PC to the stack and then
                                        //  set PC to the given value

    LEA(Target, Register),              // Load effective address into
                                        //  address register

    MOVE(Target, Target, Size),
    MOVEA(Target, Register, Size),
    NOP,

    RTS,                                // Return from subroutine (Pop PC)
    STOP(u16),                          // Load word into SR register and
                                        //  stop until an interrupt

    SUB(Target, Target, Size),          // Subtraction
    SUBA(Target, Register, Size),
    ...
}

This is just an example of the instruction type definitions. The full code can be found here. Note: it's possible to express more combinations with these instruction types than there are legal instruction combinations. Some combinations are illegal on the 68000 but allowed on the 68020 and up, which the emulator supports using an enum to represent the different 68000 model numbers. Most of the error checking and version difference checking is done during the decode phase (and an illegal instruction exception raised if necessary). Some instructions have special behaviours, so they've been given their own variant in the enum, like ADDA to add a value to an address register which doesn't affect the condition codes the way the ADD instruction does, or CMPA to compare an address register which sign extends byte or word operands to long words before comparing them.

Abstracting Time

In the case of a real CPU, the clock signal is what drives the CPU forward, but it's not the only device in the system that is driven by a clock. I/O devices, such as the MC68681 serial port controller also take an input clock which affects their state over time. In the case of the serial port controller, it has a timer which needs to count down periodically. We're going to need some way of running code for different devices based on the clock, so lets use another trait for this.

pub type Clock = u64;
pub type ClockElapsed = u64;

trait Steppable {
    fn step(&mut self, memory: &mut dyn Addressable, clock: Clock) -> Result<ClockElapsed, Error>;
}

Now we can call the step() method each cycle and pass it the current clock value. It will return the number of clock ticks that should elapse before the next call to that device's step() method.

In the case of Computie, the CPU runs at 10 MHz, but the serial port controller runs on a separate clock at 3.6864 MHz. In order to handle these different clock speeds, we can arbitrarily decide that our Clock value will be the number of nanoseconds from the start of the simulation, and ClockElapsed will be a difference in nanoseconds from the start of the start of the step. We use a u64 here so that we can keep track of simulation time in nanoseconds for approximately 584 which should be enough. Keeping track of the time will allow us to later limit how much time passes (either speeding up or slowing down the execution relative to real-time). With this, we can get a somewhat accurate count when simulating the timer in the serial controller chip. That said, we won't worry about simulating CPU execution times, which varies quite a bit based on each instruction and it's operands, so can add a lot of complexity.

Some I/O

It's time to implement the MC68681 serial port controller. Since it's both an Addressable device and a Steppable device, we'll need to implement both traits. The registers we'll need to support first are the serial port channel A registers. REG_TBA_WR is the transmit write buffer, which will output the character written to it over serial channel A. We can just print the character written to this register to the screen for now. We also need to implement the REG_SRA_RD register, which is a status register. Bits 3 and 2 of the status register indicate if the transmit buffer for channel A is ready and empty. The software for Computie checks the status register before writing data because the real MC68681 can't transmit fast enough to avoid a buffer overrun, so as long as we return a value with those bits set, the software will write characters to the REG_TBA_WR register. This example also shows the REG_RBA_RD register for reading serial data in, but setting the port_a_input field is not shown for simplicity. The full code has be viewed here

// Register Addresses (relative to mapped address)
const REG_SRA_RD: Address = 0x03;       // Ch A Status Register
const REG_TBA_WR: Address = 0x07;       // Ch A Byte to Transmit
const REG_RBA_RD: Address = 0x07;       // Ch A Received Byte
const REG_ISR_RD: Address = 0x0B;       // Interrupt Status Register

// Status Register Bits (SRA/SRB)
const SR_TX_EMPTY: u8  = 0x08;
const SR_TX_READY: u8  = 0x04;
const SR_RX_FULL: u8   = 0x02;
const SR_RX_READY: u8  = 0x01;

// Interrupt Status/Mask Bits (ISR/IVR)
const ISR_TIMER_CHANGE: u8 = 0x08;

struct MC68681 {
    pub port_a_status: u8,
    pub port_a_input: u8,

    pub is_timing: bool,
    pub timer: u16,
    pub int_status: u8,
}

impl Addressable for MC68681 {
    fn len(&self) -> usize {
        0x10
    }

    fn read(&mut self, addr: Address, data: &mut [u8]) -> Result<(), Error> {
        match addr {
            REG_SRA_RD => {
                data[0] = SR_TX_READY | SR_TX_EMPTY;
            },
            REG_RBA_RD => {
                data[0] = self.port_a_input;
                self.port_a_status &= !SR_RX_READY;
            },
            REG_ISR_RD => {
                data[0] = self.int_status;
            },
            _ => { },
        }
        debug!("{}: read from {:0x} of {:0x}", DEV_NAME, addr, data[0]);
        Ok(())
    }

    fn write(&mut self, addr: Address, data: &[u8]) -> Result<(), Error> {
        debug!("{}: writing {:0x} to {:0x}", DEV_NAME, data[0], addr);
        match addr {
            REG_TBA_WR => {
                // Print the character
                println!("{}", data[0] as char);
            },
            _ => { },
        }
        Ok(())
    }
}

impl Steppable for MC68681 {
    fn step(&mut self, memory: &mut dyn Addressable, clock: Clock) -> Result<ClockElapsed, Error> {
        if self.is_timing {
            // Count down, wrapping around from 0 to 0xFFFF
            self.timer = self.timer.wrapping_sub(1);

            if self.timer == 0 {
                // Set the interrupt flag
                self.int_status |= ISR_TIMER_CHANGE;
            }
        }

        // Delay for the number of nanoseconds of our 3.6864 MHz clock
        Ok(1_000_000_000 / 3_686_400)
    }
}

This implementation is enough to print the Welcome to the 68k Monitor! message that the monitor software prints at boot, but we can't accept input yet. Since Computie is meant to be connected via TTY, we could open a new pseudoterminal on the host computer (Linux in this case), and then connect to that pseudoterminal using miniterm like we would with the real Computie. This will also free the console window for debugging and log messages.

I originally implemented this using non-blocking input to check for a new character received on the machine-end of the pseudoterm, and then storing it in a single byte in the MC68681 object, but I later changed this to use a separate thread for polling, and mpsc channels to communicate with the simulation thread. It's a bit too much code to include here but you can see the full tty implementation here

Box It Up

We have 3 different devices objects at this point: the CPU (M68k), the memory (MemoryBlock), and the serial port controller (MC68681). The CPU implements the Steppable trait, the memory implements the Addressable trait, and the controller implements both. That last one is a problem because we can't have two mutable references to both the Addressable and Steppable trait objects of the controller at the same time, and rust doesn't yet support trait objects implementing multiple traits under these conditions. Generics won't work either because we need to store the Addressables in some kind of list, and items in a list must have the same type, so we need some other way of getting a reference to one of the trait objects only when we need to use it. If we weren't so adamant about making this flexible and configurable, we could possibly keep this simpler, but alas, we are that adamant. So lets introduce another trait, and some reference counted boxes!

pub trait Transmutable {
    fn as_steppable(&mut self) -> Option<&mut dyn Steppable> {
        None
    }

    fn as_addressable(&mut self) -> Option<&mut dyn Addressable> {
        None
    }
}

The Transmutable trait can be implemented for each device object, and it has a method for each of our traits, which returns the trait object reference we need. The default implementations return None to indicate that the trait is not implemented for that device object. Device objects that do implement a given trait can redefine the Transmutable function to return Some(self) instead. That means that any Transmutable trait object can be turned into either an Addressable or a Steppable, if supported by the underlying device object, and we have a way of checking that support at runtime.

We top it all off with TransmutableBox which will allow us to have multiple references to any Transmutable objects, so we can pass them around to build our machine configuration. wrap_transmutable is a helper function to box up the device object during startup. The device objects will be set up once at start up and then not changed until the program terminates, so we're not too concerned about reference cycles.

pub type TransmutableBox = Rc<RefCell<Box<dyn Transmutable>>>;

pub fn wrap_transmutable<T: Transmutable + 'static>(value: T) -> TransmutableBox {
    Rc::new(RefCell::new(Box::new(value)))
}

Using this method does introduce some performance penalties, but they are marginal relative to the decode and execution times of each instruction and we can still achieve pretty decent instruction cycle speeds despite the overhead. If speed was a concern, we might try to avoid the Transmutable trait by also eliminating our Steppable trait and using a static machine configuration where the device object types are known at compile time and the top level function for each machine configuration would drive the simulation forward. This latter style architecture is used by many other emulators, such as the ClockSignal (CLK) emulator (which has a very well organized and easy to read codebase if you're looking for another emulator project to learn from). The downside is that each machine requires more machine-specific top level code to tie the pieces together.

If anyone knows of a better way to organize this so that we can get the best of both worlds, speed and the ability to abstract away all the devices without tight coupling, I'd love to heard about it. I'm always looking for new ways to design things in Rust.

An Addressable of Addressables (The Data Bus)

With this new trait, all of our devices look the same regardless of the combinations of traits they implement. We can store them all in the same list, as well as pass them to other objects to be turned into their respective trait references when needed. We can now implement a Bus to hold our Addressables so that we can access them all through the same address space

pub struct Block {
    pub base: Address,
    pub length: usize,
    pub dev: TransmutableBox,
}

pub struct Bus {
    pub blocks: Vec<Block>,
}

impl Bus {
    /// Insert the devices in the correct address order
    pub fn insert(&mut self, base: Address, length: usize, dev: TransmutableBox) {
        let block = Block { base, length, dev };
        let i = self.blocks.iter().position(|cur| cur.base > block.base).unwrap_or(self.blocks.len());
        self.blocks.insert(i, block);
    }

    /// Find the device that's mapped to the given address range, and return
    /// the device, as well as the given address minus the device's base address
    pub fn get_device_at(&self, addr: Address, count: usize) -> Result<(TransmutableBox, Address), Error> {
        for block in &self.blocks {
            if addr >= block.base && addr < (block.base + block.length as Address) {
                let relative_addr = addr - block.base;
                if relative_addr as usize + count <= block.length {
                    return Ok((block.dev.clone(), relative_addr));
                } else {
                    return Err(Error::new(&format!("Error reading address {:#010x}", addr)));
                }
            }
        }
        return Err(Error::new(&format!("No segment found at {:#010x}", addr)));
    }
}

impl Addressable for Bus {
    fn len(&self) -> usize {
        let block = &self.blocks[self.blocks.len() - 1];
        (block.base as usize) + block.length
    }

    fn read(&mut self, addr: Address, data: &mut [u8]) -> Result<(), Error> {
        let (dev, relative_addr) = self.get_device_at(addr, data.len())?;
        let result = dev.borrow_mut().as_addressable().unwrap().read(relative_addr, data);
        result
    }

    fn write(&mut self, addr: Address, data: &[u8]) -> Result<(), Error> {
        let (dev, relative_addr) = self.get_device_at(addr, data.len())?;
        let result = dev.borrow_mut().as_addressable().unwrap().write(relative_addr, data);
        result
    }
}

Our Bus can act like an Addressable, so anything that's mapped to an address on the bus can be accessed through the same &mut Addressable allowing us to pass it to the CPU's .step() method.

A Happy Little System

The last piece we need is a way to call all of our .step() methods, but only the top level loop needs to be able to call them, so we don't need to package them up the same as Addressable. We do, however, need to keep track of the passage of time. We can make a System to hold our pieces, which can also hold our event tracking code, and our Bus.

pub struct NextStep {
    pub next_clock: Clock,
    pub device: TransmutableBox,
}

struct System {
    pub clock: Clock,
    pub bus: Bus,
    pub event_queue: Vec<NextStep>,
}

impl System {
    /// Insert into the queue such that the last device is the next step to execute
    fn queue_device(&mut self, device_step: NextStep) {
        for i in (0..self.event_queue.len()).rev() {
            if self.event_queue[i].next_clock > device_step.next_clock {
                self.event_queue.insert(i + 1, device_step);
                return;
            }
        }
        self.event_queue.insert(0, device_step);
    }

    /// Add a device to the system, and if it's `Steppable` then queue it
    pub fn add_device(&mut self, device: TransmutableBox) -> Result<(), Error> {
        if device.borrow_mut().as_steppable().is_some() {
            self.queue_device(NextStep::new(device));
        }
        Ok(())
    }

    /// Add a device to the system while also mapping it to the given address on the bus
    pub fn add_addressable_device(&mut self, addr: Address, device: TransmutableBox) -> Result<(), Error> {
        let length = device.borrow_mut().as_addressable().unwrap().len();
        self.bus.insert(addr, length, device.clone());
        self.add_device(device)
    }

    /// Execute one step function from the queue
    pub fn step(&mut self) -> Result<(), Error> {
        // Remove the last item in the queue which is the next step to execute
        let mut next_device = self.event_queue.pop().unwrap();

        self.clock = next_device.next_clock;
        let memory = self.bus.as_addressable().unwrap();

        // Execute the step function
        let diff = next_device.device.borrow_mut().as_steppable().unwrap().step(memory, self.clock)?;
        // Adjust the device's next scheduled step
        next_device.next_clock = self.clock + diff;

        // Re-insert into the queue in order of next event
        self.queue_device(next_device);
        Ok(())
    }

    /// Run step functions until the system clock has changed by the given time in nanoseconds
    pub fn run_for(&mut self, elapsed: ClockElapsed) -> Result<(), Error> {
        let target = self.clock + elapsed;

        while self.clock < target {
            self.step()?;
        }
        Ok(())
    }
}

In later iterations, I pass the whole immutable reference to System into the step functions, and put Bus inside a RefCell which can then be accessed if needed by the .step() function for the device object. I've also added an interrupt controller to System which can be used for hardware interrupts from devices (MC68681 in the case of Computie).

Tying It All Together

Finally we can tie all our pieces together and set them in motion.

fn main() -> Result<(), Error> {
    let mut system = System::new();

    // Create our Flash memory, pre-loaded with our Computie monitor firmware
    let rom = MemoryBlock::load("binaries/computie/monitor.bin")?;
    system.add_addressable_device(0x00000000, wrap_transmutable(rom))?;

    // Create a RAM segment at the 1MB address
    let mut ram = MemoryBlock::new(vec![0; 0x00100000]);
    system.add_addressable_device(0x00100000, wrap_transmutable(ram))?;

    let mut serial = MC68681::new();
    // Open a new PTY and launch a terminal emulator that in turn
    // launches miniterm and connects to the other end of the PTY
    launch_terminal_emulator(serial.port_a.connect(Box::new(SimplePty::open()?))?);
    system.add_addressable_device(0x00700000, wrap_transmutable(serial))?;

    let mut cpu = M68k::new(M68kType::MC68010);
    system.add_device(wrap_transmutable(cpu))?;

    // Run forever
    system.run_for(u64::MAX)?;
    Ok(())
}

pub fn launch_terminal_emulator(name: String) {
    use std::thread;
    use std::time::Duration;
    use std::process::Command;

    Command::new("x-terminal-emulator").arg("-e").arg(&format!("pyserial-miniterm {}", name)).spawn().unwrap();
    thread::sleep(Duration::from_secs(1));
}

We can now boot our monitor firmware and once interrupts are added, we can even boot the OS!

alt text

Now What

After only a couple weeks I was able to get all of the Computie software running, including the OS, albeit with a buggy and incomplete 68000 implementation. That didn't seem too hard. What else can I get running inside this emulator. How about I try emulating the Sega Genesis! How hard can it be? Well... after a few months it's damn near broken me, but I'm determined to complete it eventually. I've also taken detours into emulating the original Macintosh (simpler than the Genesis so easier to debug the 68k implementation with) and the TRS-80 (in order to test the Z80 implementation I made to eventually use in the Genesis). I'm hoping to write more about those other efforts, once I get them somewhat working. Until then, happy emulating!

Projects

Posts