Sunday, February 27, 2011

Porting bsnes to C#: Part Two

As I explained in my last post, I recently worked on a project, SnesBox, whose goal was to port the bsnes Super Nintendo (SNES) emulator so I could play emulated games on an Xbox 360.  It didn't succeed, for performance reasons, but I learned more about the differences between C++ and C# in the process.  This post will detail some of the challenges I had to overcome to complete this port.  All bsnes code samples I include are from bsnes 0.72.

Type Safety

C# is a type safe language, meaning it has type enforcement for objects.  This requires reexamination of the operation taking place whenever a type in C++ is implicitly cast to another type.  For instance, in this function, several implicit casts are taking place:

void CPUcore::op_ror_imm_b() 
{
 last_cycle();
 op_io_irq();
 bool carry = regs.p.c;
 regs.p.c = (regs.a.l & 0x01);
 regs.a.l = (carry << 7) | (regs.a.l >> 1);
 regs.p.n = (regs.a.l & 0x80);
 regs.p.z = (regs.a.l == 0);
}

The variable carry is a boolean, which is bit-shifted as an integer.  The result of the bitwise-or operation is then assigned to a byte value, regs.a.l.  Additionally, regs.p.c and regs.p.n are both boolean values which take assignments from integer values produced by the bitwise-and operations.  C# is type safe, so these sorts of cases must be handled using explicit type casts and static methods in System.Convert.

public void op_ror_imm_b(CPUCoreOpArgument args)
{
    last_cycle();
    op_io_irq();
    bool carry = regs.p.c;
    regs.p.c = Convert.ToBoolean(regs.a.l & 0x01);
    regs.a.l = (byte)((Convert.ToInt32(carry) << 7) | (regs.a.l >> 1));
    regs.p.n = Convert.ToBoolean(regs.a.l & 0x80);
    regs.p.z = (regs.a.l == 0);
}

The .NET Framework also contains the BitConverter class, which can be used to convert from byte arrays to primitive types, and vice-versa.  This is demonstrated in the following method.

private void echo_write(int ch)
{
 if (!Convert.ToBoolean(m.t_echo_enabled & 0x20))
 {
  Array.Copy(BitConverter.GetBytes((ushort)m.t_echo_out[ch]), 0, m.ram, m.t_echo_ptr + ch * 2, 2);
 }
 m.t_echo_out[ch] = 0;
}

Here, the lower two bytes in the integer value indexed by ch in the integer array m.t_echo_out are converted to an array of bytes, then copied into the byte array m.ram.
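The copy described above can be isolated into a small helper.  This is only a sketch: the class and method names are hypothetical and not part of SnesBox, and the endianness check is an added assumption, since BitConverter returns bytes in the host's byte order.

```csharp
using System;

public static class EchoWriteSketch
{
    // Hypothetical helper: stores the low 16 bits of 'sample' into
    // ram[offset] and ram[offset + 1], low byte first.
    public static void WriteUInt16(byte[] ram, int offset, int sample)
    {
        // Truncate to 16 bits explicitly, then split into two bytes.
        byte[] bytes = BitConverter.GetBytes(unchecked((ushort)sample));

        // BitConverter uses the host byte order; flip on big-endian hosts
        // so the low byte always lands first.
        if (!BitConverter.IsLittleEndian)
        {
            Array.Reverse(bytes);
        }

        Array.Copy(bytes, 0, ram, offset, 2);
    }
}
```

Calling EchoWriteSketch.WriteUInt16(ram, 1, 0x12345) leaves 0x45 in ram[1] and 0x23 in ram[2]; the bits above the low sixteen are discarded.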

I wanted my code to stay as close as possible to bsnes code, for debugging purposes.  Even if a design was not necessary from a C# perspective, I went with it to prevent any ambiguity or confusion later on.  Type safety is one of the most obvious places this was observed, particularly with boolean types being converted to numerical values, and vice-versa.  Lines such as these:

Input.input.port_set_device(Convert.ToBoolean(0), Configuration.config.controller_port1);
Input.input.port_set_device(Convert.ToBoolean(1), Configuration.config.controller_port2);

could be easily converted to use true and false, but their meaning is preserved as is.

Union Types

Union types are still lurking out there.  To be fair to bsnes, whose objects duplicate behavior of hardware registers, a union makes sense for accessing different chunks of register bits.  That being said, coming across unions in bsnes was probably my biggest, "oh crap," moment.

struct reg24_t 
{
 union 
 {
  uint32 d;
  struct
  { 
   uint16 order_lsb2(w, wh); 
  };
  struct 
  { 
   uint8 order_lsb4(l, h, b, bh); 
  };
 };

...

I thought to myself, "C# doesn't support something like Feature X, it must not support unions, right?"  Wrong...sort of.  Enter: the FieldOffset attribute.

[StructLayout(LayoutKind.Explicit)]
public class Reg24
{
    [FieldOffset(0)]
    public uint d;

    [FieldOffset(0)]
    public ushort w;
    [FieldOffset(2)]
    public ushort wh;

    [FieldOffset(0)]
    public byte l;
    [FieldOffset(1)]
    public byte h;
    [FieldOffset(2)]
    public byte b;
    [FieldOffset(3)]
    public byte bh;

...

In this piece of code (which assumes little endian architecture), the StructLayout attribute with an argument of LayoutKind.Explicit tells the compiler, "I'm going to explicitly lay out the memory for the fields in this object."  Without it, the C# compiler applies LayoutKind.Sequential to structs and LayoutKind.Auto to classes.

The FieldOffset attribute is applied to fields you want to be part of your "union," where the offset is the number of bytes from the start of memory allocated to the object.  By offsetting the byte variable l to the zeroth byte in the Reg24 class, writing to that field will write to the low byte of both w and d.  Congratulations, you've created a union and earned the curses of anyone who has to maintain your code.
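To see the overlap in action, here is a minimal sketch using a cut-down struct.  Reg24Sketch and Reg24Demo are hypothetical names for illustration, and the result noted afterwards assumes a little-endian host.

```csharp
using System.Runtime.InteropServices;

// Cut-down, hypothetical version of the register union.
[StructLayout(LayoutKind.Explicit)]
public struct Reg24Sketch
{
    [FieldOffset(0)] public uint d;    // all four bytes
    [FieldOffset(0)] public ushort w;  // bytes 0 and 1
    [FieldOffset(0)] public byte l;    // byte 0
    [FieldOffset(1)] public byte h;    // byte 1
    [FieldOffset(2)] public byte b;    // byte 2
}

public static class Reg24Demo
{
    // Writes 'low' through the byte field and reads the result back
    // through the overlapping 32-bit field.
    public static uint WriteLowByte(uint initial, byte low)
    {
        var reg = new Reg24Sketch();
        reg.d = initial;
        reg.l = low;
        return reg.d;
    }
}
```

On a little-endian machine, Reg24Demo.WriteLowByte(0x00123456, 0xFF) returns 0x001234FF: only the least significant byte of d changes.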

Array Pointers

In C#, arrays are objects which encapsulate managed memory.  Consequently, there's no way to iterate through the elements of C# arrays using pointer arithmetic.  This also means you can't pass a pointer to an arbitrary location in array memory.  This presents a problem when translating logic such as this:

void Video::draw_cursor(uint16_t color, int x, int y) 
{
 uint16_t *data = (uint16_t*)ppu.output;
 if(ppu.interlace() && ppu.field()) 
 {
  data += 512;
 }

...

if(hires == false) 
 {
  *((uint16_t*)data + vy * 1024 + vx) = pixelcolor;
 } 
 else 
 {
  *((uint16_t*)data + vy * 1024 + vx * 2 + 0) = pixelcolor;
  *((uint16_t*)data + vy * 1024 + vx * 2 + 1) = pixelcolor;
 }

...

Here, the variable data points to the beginning of the array ppu.output.  If a conditional is met, the pointer is advanced by 512 elements (1,024 bytes, since each element is a uint16_t).  Later on in the method, the memory address is incremented using additional arithmetic to create a temporary address, which is dereferenced to write the pixelcolor value.

The .NET Framework contains a generic struct, ArraySegment, which can be used as a substitute for the functionality of an array pointer.

private void draw_cursor(ushort color, int x, int y)
{
    var data = PPU.ppu.output;
    if (PPU.ppu.interlace() && PPU.ppu.PPUCounter.field())
    {
        data = new ArraySegment<ushort>(data.Array, data.Offset + 512, data.Count - 512);
    }

...

    if (hires == false)
    {
        data.Array[data.Offset + (vy * 1024 + vx)] = pixelcolor;
    }
    else
    {
        data.Array[data.Offset + (vy * 1024 + vx * 2 + 0)] = pixelcolor;
        data.Array[data.Offset + (vy * 1024 + vx * 2 + 1)] = pixelcolor;
    }

...

ArraySegments are created using an array, the offset into the array, and the number of elements in the array past the offset point.

Use of ArraySegment is one of the areas of the code I'm disappointed with.  I would revisit it if I were to continue work on the project.  The struct is painfully simple; there isn't even an indexer to access array elements directly.  Additionally, ArraySegment is immutable: if you wish to change the offset, you have to create a new struct.  These shortcomings could be easily solved by writing my own ArraySegment-style struct.
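As a rough idea of what such a struct might look like, here is a minimal sketch.  ArrayPointer is a hypothetical name, this is not code from SnesBox, and it deliberately leaves out a Count, so bounds are only checked against the underlying array.

```csharp
using System;

// Hypothetical stand-in for ArraySegment<T>: an array plus an offset,
// with an indexer and pointer-style addition.
public struct ArrayPointer<T>
{
    private readonly T[] array;
    private readonly int offset;

    public ArrayPointer(T[] array, int offset)
    {
        if (array == null) throw new ArgumentNullException("array");
        this.array = array;
        this.offset = offset;
    }

    // Element access relative to the current offset, like data[i] in C++.
    public T this[int index]
    {
        get { return array[offset + index]; }
        set { array[offset + index] = value; }
    }

    // Returns a new view shifted by 'count' elements, so that
    // "data += 512" reads the same as the C++ original.
    public static ArrayPointer<T> operator +(ArrayPointer<T> p, int count)
    {
        return new ArrayPointer<T>(p.array, p.offset + count);
    }
}
```

With this in place, the interlace adjustment becomes data += 512 and a pixel write becomes data[vy * 1024 + vx] = pixelcolor.  The struct itself is still immutable; the += simply rebinds the local variable to a new view.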

Reference Type Primitives

In C#, primitives are actually represented by structs in the System namespace, making them value types.  You can pass a value type by reference in C#, using the ref keyword, but there's no language feature which lets you explicitly declare a variable to be of reference type.  This became a problem when I came across the following struct in bsnes code:

struct regs_t 
{
 uint16_t pc;
 uint8_t r[4], &a, &x, &y, &sp;
 regya_t ya;
 flag_t p;
 regs_t() : a(r[0]), x(r[1]), y(r[2]), sp(r[3]), ya(r[2], r[0]) {}
};

The variables a, x, y, and sp are references to the individual elements of the array r.  There were several ways I could have approached solving this.  The first was to write a class wrapper for byte.  Classes are passed by reference in C#, so any wrapper around a byte value would pull the member byte with it when passed around.  But with this approach, the array r would also have to be an array of byte wrappers.  Arrays are instances of a class in C#, so they're passed by reference.  Thus, an array of byte wrappers would be redundant.

Instead, since this was an isolated occurrence of primitive type references, I opted to solve the problem by using an ArraySegment struct as an offset to each of the elements of r.  

public class Regs
{
    public ushort pc;
    public byte[] r = new byte[4];
    public ArraySegment<byte> a, x, y, sp;
    public RegYA ya;
    public Flag p = new Flag();

    public Regs()
    {
        a = new ArraySegment<byte>(r, 0, 1);
        x = new ArraySegment<byte>(r, 1, 1);
        y = new ArraySegment<byte>(r, 2, 1);
        sp = new ArraySegment<byte>(r, 3, 1);
        ya = new RegYA(new ArraySegment<byte>(r, 2, 1), new ArraySegment<byte>(r, 0, 1));
    }
}

The ArraySegments themselves are passed by value, but the array they wrap is passed by reference.  Therefore, the value each of the variables indexes will remain the same wherever one of the structs is referenced.
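A small sketch makes the aliasing concrete.  RegsSketch and its Read and Write helpers are hypothetical names for illustration; they are not taken from the SnesBox source.

```csharp
using System;

public class RegsSketch
{
    public readonly byte[] r = new byte[4];
    public readonly ArraySegment<byte> a;

    public RegsSketch()
    {
        // 'a' is a one-element view onto r[0].
        a = new ArraySegment<byte>(r, 0, 1);
    }

    // Hypothetical helpers: a segment is used as a "reference"
    // to the single element at its offset.
    public static byte Read(ArraySegment<byte> reg)
    {
        return reg.Array[reg.Offset];
    }

    public static void Write(ArraySegment<byte> reg, byte value)
    {
        reg.Array[reg.Offset] = value;
    }
}
```

Copying regs.a into another variable and calling RegsSketch.Write on the copy still changes regs.r[0], because both structs hold a reference to the same underlying array.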

Template Metaprogramming

C++ and C# templates are very different beasts.  I still have much learning to do to comprehend the differences between the two.  In C#, templates are called generics, and are a simplified version of C++ templates.  Most importantly for my port, C# generics do not allow non-type parameters.  In C++, a compile time constant can be specified as a template argument, allowing for methods in bsnes like this:

template<int bits> inline unsigned uclip(const unsigned x) 
{
 enum { m = (1U << bits) - 1 };
 return (x & m);
}

where uclip can have the desired number of bits generated at compile time by being called as

uclip<2>(data + 1);

Since C# does not allow generics to be used like this, functions such as these could be changed to:

uclip(2, data + 1);

However, this does not generate the same type of function call as C++, where the number 2 has been compiled into the function itself.
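A minimal sketch of the runtime-parameter version might look like the following.  The class name BitSketch is a placeholder, and the comment about the valid range reflects how C# masks shift counts, which is a caveat the C++ template does not share in the same form.

```csharp
public static class BitSketch
{
    // Keeps only the low 'bits' bits of x.
    // Intended for 1 <= bits <= 31: C# masks the shift count of a uint
    // to five bits, so bits == 32 would produce a mask of zero.
    public static uint uclip(int bits, uint x)
    {
        uint mask = (1u << bits) - 1;
        return x & mask;
    }
}
```

Here the mask is computed on every call rather than baked into a specialized function, which is the difference from the C++ version described above.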

For some template methods, I had to get a little more creative, such as initializing the opcode tables for the processor cores.  In bsnes, non-type parameters were used to initialize the table's function calls at compile time:

op[0x34] = &SMPcore::op_read_a_dpx<&SMPcore::op_and>;
op[0x35] = &SMPcore::op_read_a_addrr<&SMPcore::op_and, X>;
op[0x36] = &SMPcore::op_read_a_addrr<&SMPcore::op_and, Y>;
op[0x37] = &SMPcore::op_read_a_idpy<&SMPcore::op_and>;
op[0x38] = &SMPcore::op_read_dp_const<&SMPcore::op_and>;
op[0x39] = &SMPcore::op_read_ix_iy<&SMPcore::op_and>;

Since all arguments were generated at compile time using template metaprogramming, the signature of the function pointers is void function(void), allowing the table to be initialized uniformly.

In C#, I pass my method arguments manually.  Since the delegate signatures of the opcode table entries have to match each other, it is necessary for each method to match a common signature.  I created an argument class, where I could initialize the parameters needed for a particular method:

opcode_table[0x34] = new SMPCoreOperation(op_read_a_dpx, new SMPCoreOpArgument() { op_func = op_and });
opcode_table[0x35] = new SMPCoreOperation(op_read_a_addrr, new SMPCoreOpArgument() { op_func = op_and, i = (int)OpCode.X });
opcode_table[0x36] = new SMPCoreOperation(op_read_a_addrr, new SMPCoreOpArgument() { op_func = op_and, i = (int)OpCode.Y });
opcode_table[0x37] = new SMPCoreOperation(op_read_a_idpy, new SMPCoreOpArgument() { op_func = op_and });
opcode_table[0x38] = new SMPCoreOperation(op_read_dp_const, new SMPCoreOpArgument() { op_func = op_and });
opcode_table[0x39] = new SMPCoreOperation(op_read_ix_iy, new SMPCoreOpArgument() { op_func = op_and });
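For comparison, another way to give every table entry the same signature is to let lambdas capture their arguments.  The following is a sketch of that alternative, not the approach SnesBox uses; the class, the operations, and the operand values are invented for illustration and do not model real SPC700 behavior.

```csharp
using System;

public class OpcodeTableSketch
{
    public byte a = 0xF0;
    private readonly Action[] table = new Action[256];

    public OpcodeTableSketch()
    {
        // Each lambda closes over its own arguments, so every entry
        // has the same parameterless signature.
        table[0x34] = () => ReadA(And, 0x3C);
        table[0x35] = () => ReadA(Or, 0x0F);
    }

    public void Execute(int opcode)
    {
        table[opcode]();
    }

    // Applies 'op' to the accumulator and a made-up operand.
    private void ReadA(Func<byte, byte, byte> op, byte operand)
    {
        a = op(a, operand);
    }

    private static byte And(byte x, byte y) { return (byte)(x & y); }
    private static byte Or(byte x, byte y) { return (byte)(x | y); }
}
```

The arguments are still bound at run time when the table is built, so this does not recover the compile-time specialization of the C++ templates either.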

The beauty of template metaprogramming is definitely lost in the translation.  However, this lack of functionality is most frustrating when porting the variant data types, such as 2-bit, 3-bit, and 17-bit unsigned integers, found in SNES hardware.  bsnes contains an elegant, although somewhat cryptic, template class to generate these types:

template<unsigned bits> class uint_t 
{
private:
 enum { bytes = (bits + 7) >> 3 };
 typedef typename static_if<
  sizeof(int) >= bytes,
  unsigned int,
  typename static_if<
  sizeof(long) >= bytes,
  unsigned long,
  typename static_if<
  sizeof(long long) >= bytes,
  unsigned long long,
  void
  >::type
  >::type
 >::type T;
 static_assert(!std::is_same<T, void>::value, "");
 T data;

...

and a new variant data type can be defined with the line

typedef uint_t<2> uint2;

As a TDD developer and all-around fan of simple development, I have quite a pet peeve for duplicated code.  It increases the number of things I have to keep in my brain at any given time, and when I make a change to duplicated code, I have to change it everywhere the code has been duplicated.  It therefore came as a huge disappointment when I had to copy/paste the entire class of every variant data type I wanted to generate:

public struct uint2
{
    private uint data;
    private const int bits = 2;

...

public struct uint9
{
    private uint data;
    private const int bits = 9;

...

Gross.  If someone knows a way around this, please come forward, as the solution eludes me.  After this experience, non-type parameter generic methods are a feature I anxiously await in a future version of C#.
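One possible workaround is to carry the bit width in a type parameter instead of a value, using small marker structs.  This is a sketch under that assumption: IBitWidth, Width2, Width9, and UIntN are all hypothetical names, and no claim is made here about how it would perform in an emulator's inner loop.

```csharp
// Marker types: each one exists only to report a bit width.
public interface IBitWidth { int Bits { get; } }
public struct Width2 : IBitWidth { public int Bits { get { return 2; } } }
public struct Width9 : IBitWidth { public int Bits { get { return 9; } } }

// One generic struct replaces the copy/pasted uint2, uint9, ...
public struct UIntN<TWidth> where TWidth : struct, IBitWidth
{
    // Computed once per closed generic type (UIntN<Width2>, UIntN<Width9>, ...).
    private static readonly uint Mask = (1u << default(TWidth).Bits) - 1;

    private readonly uint data;

    public UIntN(uint value)
    {
        data = value & Mask;
    }

    public static explicit operator uint(UIntN<TWidth> number)
    {
        return number.data;
    }

    public static UIntN<TWidth> operator +(UIntN<TWidth> number, uint i)
    {
        return new UIntN<TWidth>(number.data + i);
    }
}
```

A new width then costs one marker struct rather than a whole copied class, and a using alias such as using uint2 = UIntN<Width2>; can stand in for the C++ typedef within a single file.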

Overloaded Operators

There was some confusion on my part when I first started using overloaded operators in C#.  I had never worked on a project that required them, so I expected them to behave like overloaded operators in C++.

...
inline operator T() const { return data; }
inline T operator ++(int) { T r = data; data = uclip<bits>(data + 1); return r; }
inline T operator --(int) { T r = data; data = uclip<bits>(data - 1); return r; }
inline T operator ++() { return data = uclip<bits>(data + 1); }
inline T operator --() { return data = uclip<bits>(data - 1); }
inline T operator  =(const T i) { return data = uclip<bits>(i); }
inline T operator |=(const T i) { return data = uclip<bits>(data  | i); }
inline T operator ^=(const T i) { return data = uclip<bits>(data  ^ i); }
inline T operator &=(const T i) { return data = uclip<bits>(data  & i); }
inline T operator<<=(const T i) { return data = uclip<bits>(data << i); }
inline T operator>>=(const T i) { return data = uclip<bits>(data >> i); }
inline T operator +=(const T i) { return data = uclip<bits>(data  + i); }
inline T operator -=(const T i) { return data = uclip<bits>(data  - i); }
inline T operator *=(const T i) { return data = uclip<bits>(data  * i); }
inline T operator /=(const T i) { return data = uclip<bits>(data  / i); }
inline T operator %=(const T i) { return data = uclip<bits>(data  % i); }

The first operator overloaded in this example, the cast operator, converts the variant struct to a primitive data type.  This can also be accomplished in C#, using the static explicit operator overload:

public static explicit operator uint(uint2 number)
{
    return number.data;
}

Continuing through the list of overloaded operators, the assignment operator cannot be overloaded in C#.  Instead, I created an Assign method.

public uint Assign(uint i)
{
    return data = Bit.uclip(bits, i);
}

At first I was concerned I wouldn't remember to use this function when converting an expression where a uint is assigned to the uint2 class.  However, because of type safety in C#, this situation was avoided by a compile error.

Finally, the overloaded arithmetic and bit-wise operators.  These operators caused me the most confusion, since all the arithmetic operators I overload are actually assignment operators.  In C++, this is addressed by assigning the result of the operation to the internal unsigned integer, data.

In C#, you cannot overload the operator/assignment operations.  For example, you can overload +, but not +=.  I learned that this is because C# handles the += operator for you automatically, once the + operator has been overloaded.  Rather than assign the result of an operation back to the object itself, you return the value only.  C# determines whether the assignment should take place based on the context:

public static uint2 operator +(uint2 number, uint i)
{
    return new uint2(Bit.uclip(bits, number.data + i));
}

The same is true of the pre- and post-increment and decrement operators.  Rather than handle both cases, you overload ++ and -- once each and return the value incremented or decremented by 1.  C# handles the ordering of the prefix and postfix forms automatically.
 
public static uint2 operator ++(uint2 number)
{
    return number + 1;
}
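The following sketch pulls these pieces together to show what the compiler does with a single + and a single ++ overload.  UInt2Sketch and UInt2Demo are hypothetical names, and the clipping is inlined here instead of calling a shared helper.

```csharp
public struct UInt2Sketch
{
    private const int bits = 2;
    private readonly uint data;

    public UInt2Sketch(uint value)
    {
        data = value & ((1u << bits) - 1);
    }

    public static explicit operator uint(UInt2Sketch number)
    {
        return number.data;
    }

    public static UInt2Sketch operator +(UInt2Sketch number, uint i)
    {
        return new UInt2Sketch(number.data + i);
    }

    public static UInt2Sketch operator ++(UInt2Sketch number)
    {
        return number + 1;
    }
}

public static class UInt2Demo
{
    public static uint[] Run()
    {
        var n = new UInt2Sketch(2);
        n += 1;          // compiled as n = n + 1, so n is now 3
        var post = n++;  // post receives the old value 3; n wraps to 0
        var pre = ++n;   // n becomes 1 and pre receives the new value 1
        return new[] { (uint)post, (uint)pre, (uint)n };
    }
}
```

UInt2Demo.Run() returns { 3, 1, 1 }: the one ++ overload serves both the postfix and prefix forms, and += comes for free from +.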

Multiple Inheritance

One of the first things a C++ developer discovers when learning C# is the lack of multiple class inheritance, to prevent the ambiguity that can arise from abuse of this language feature.  This immediately posed a problem for SnesBox, since bsnes uses extensive multiple inheritance, such as in the CPU class:

class CPU : public Processor, public CPUcore, public PPUcounter, public MMIO
{

...

There is no way to get around the lack of inheritance from multiple classes in C#, and no way to fully simulate the behavior of using more than one base class, such as overriding methods and accessing protected members.  Any technique which presumes to do so is, in the end, a trick or a hack.

C# does, however, allow the implementation of multiple interfaces.  Many developers have used this to their advantage in as many different ways when attempting to program multiple inheritance "into" the C# language.  Since I was writing all the SnesBox code myself and could define how the code interacted with itself, using multiple interfaces was an acceptable solution.  The SnesBox implementation of CPU looks like this:

partial class CPU : CPUCore, IPPUCounter, IProcessor, IMMIO
{

...

In the code for SnesBox, Processor still exists as an explicit class:

class Processor
{
    public Thread thread;
    public uint frequency;
    public long clock;

    ...

    public Processor()
    {
        thread = null;
    }
}

However, instead of making any class inherit directly from Processor, I use an IProcessor interface.

interface IProcessor
{
    Processor Processor { get; }
}

The Processor class is never used as an expected base class anywhere in the code.  Instead, any method which would have expected an instance of a Processor instead takes an instance of an IProcessor.  Processor functionality is then accessed through the Processor property in the interface.

For example, in the bsnes C++ code, the method "step" uses the clock and frequency fields it has inherited from the Processor base class.

void CPU::step(unsigned clocks) 
{
 smp.clock -= clocks * (uint64)smp.frequency;
 ppu.clock -= clocks;
 for(unsigned i = 0; i < coprocessors.size(); i++) 
 {
  Processor &chip = *coprocessors[i];
  chip.clock -= clocks * (uint64)chip.frequency;
 }
}

Inside the for loop, each element in the array of coprocessors is accessed as a Processor.  In the SnesBox C# code, the function looks mostly the same:

public void step(uint clocks)
{
    SMP.smp.Processor.clock -= (long)(clocks * (ulong)SMP.smp.Processor.frequency);
    PPU.ppu.Processor.clock -= clocks;
    for (uint i = 0; i < coprocessors.Count; i++)
    {
        IProcessor chip = coprocessors[(int)i];
        chip.Processor.clock -= (long)(clocks * (ulong)chip.Processor.frequency);
    }
}

Each piece of Processor functionality is accessed using the Processor property in IProcessor.  A method can still access an object as an IProcessor, rather than its derived type.  The step function does this inside the for loop.

There is not much "syntactic sugar" to be found in an implementation of multiple inheritance such as this.  Each time the interface-implemented functionality is used, the property which contains the actual object must come before any fields or methods.  CPUCore is the largest and most complicated of all the base classes used by CPU, and it is from this class that CPU inherits a significant amount of its functionality.  Because of the relative awkwardness of using interfaces as a source of multiple inheritance, I chose CPUCore as my one explicit base class.
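Reduced to its essentials, the pattern is one real base class plus interfaces that expose the other "bases" as properties.  The sketch below uses placeholder names (ProcessorState, IHasProcessor, CoreBase, CpuSketch, ClockSketch) rather than the actual SnesBox types.

```csharp
// Stand-in for the Processor class: plain state, never used as a base class.
public class ProcessorState
{
    public uint frequency;
    public long clock;
}

// Stand-in for IProcessor: exposes the state through a property.
public interface IHasProcessor
{
    ProcessorState Processor { get; }
}

// Stand-in for CPUCore, the single explicit base class.
public class CoreBase
{
}

public class CpuSketch : CoreBase, IHasProcessor
{
    private readonly ProcessorState processor = new ProcessorState();

    public ProcessorState Processor
    {
        get { return processor; }
    }
}

public static class ClockSketch
{
    // Works on anything that exposes a ProcessorState, whatever its base class.
    public static void Step(IHasProcessor chip, uint clocks)
    {
        chip.Processor.clock -= (long)(clocks * (ulong)chip.Processor.frequency);
    }
}
```

Every access goes through chip.Processor, which is the extra hop described above.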

Coroutines and Fibers

Much can be said about coroutines, since they were, ultimately, the issue that defeated this project.  In bsnes, each emulated processor is run on a fiber.  Fibers, and thus, the processors in bsnes, operate using cooperative multitasking.  This means that the fibers themselves control when they stop work, and when another fiber starts up after it.  This is where fibers differ from threads, which are scheduled by the OS.  For more on the subject from a gaming perspective, I suggest Ben Carter's post on #AltDevBlogADay.

As I mentioned in my last post, I got around this issue for debugging purposes by using deprecated methods in .NET threads which allow them to cooperatively multitask.  Since the feature is deprecated and the threads were heavyweight, the performance was abysmal.  This led me to explore different means of preserving a stack at an arbitrary point during execution and resuming execution in a place where it last left off.

A coroutine is a function that has multiple entry and exit points.  The stack is effectively preserved each time a coroutine returns, allowing the coroutine to resume execution at the point it returned from when it is called again.  Coroutines are possible in C#, using the yield statement.  For a fantastic use of coroutines in C#, see Rob Eisenberg's presentation from MIX 2010, Build Your Own MVVM Framework (don't worry, the title makes it sound more intimidating than it actually is).  If you're interested in some of the finer details of what's happening under the hood of a C# coroutine, check out Jeremy Likness's blog on sequential asynchronous workflows.


When the yield keyword is used in a method, the C# compiler generates a class behind the scenes to implement an iterator block for that method.  This class is a state machine which allows iteration through the states in the method.  Each yield statement signifies that a new state in the iterator has been reached, and execution returns to the method doing the iterating.  The state inside the iterator is preserved until the next iteration.
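A tiny iterator shows the state preservation at work.  IteratorSketch is a hypothetical example, unrelated to the SnesBox code.

```csharp
using System.Collections.Generic;

public static class IteratorSketch
{
    public static IEnumerable<string> Steps()
    {
        // 'counter' looks like a local, but the compiler hoists it into a
        // field of the generated class so it survives between iterations.
        int counter = 0;

        counter++;
        yield return "first:" + counter;

        // Execution resumes here on the next MoveNext() call.
        counter++;
        yield return "second:" + counter;
    }
}
```

Iterating over IteratorSketch.Steps() produces "first:1" and then "second:2".  Nothing in the method body runs until the first MoveNext() call, and each later call picks up immediately after the previous yield.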

For instance, in the threaded version of SnesBox, the CPU may yield its execution to the SMP at any time by calling:

public void synchronize_smp()
{
    if (SMP.smp.Processor.clock < 0)
    {
        Libco.Switch(SMP.smp.Processor.thread);
    }
}

In the Switch function, the current thread (the CPU) is paused, and the SMP thread is resumed.  In this way, processors hand off execution to one another thousands of times each frame.

Using coroutines, I can preserve state at the same point using a yield statement:

public IEnumerable synchronize_smp()
{
    if (SMP.smp.Processor.clock < 0)
    {
        yield return SMP.smp.Processor.thread;
    }
}

Here, the enumerator from the SMP iterator block is returned as the result of the iteration.  For nested coroutines to work properly, a method must become an enumerable method if it nests another enumerable method.  For instance, a method which calls synchronize_smp must also yield return any enumerable results of synchronize_smp, and so on:

private IEnumerable scanline()
{
    foreach (var e in synchronize_smp())
    {
        yield return e;
    }
    foreach (var e in synchronize_ppu())
    {
        yield return e;
    }
    foreach (var e in synchronize_coprocessor())
    {
        yield return e;
    }
    foreach (var e in System.system.scanline())
    {
        yield return e;
    }

...

Much like const-correctness in C++, nested coroutines in C# are "catching," and spread up the callstack.  At the top of the enumerable processors, the scheduler iterates through all returned enumerators until an exit code is received, indicating it is time to draw a frame.
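The shape of that top-level loop can be sketched as follows.  This is a simplified illustration, not the SnesBox scheduler: SchedulerSketch, its methods, and the string signals are all invented, and a real scheduler would switch to the yielded processor instead of just counting.

```csharp
using System.Collections;

public static class SchedulerSketch
{
    // Hypothetical exit code meaning "a frame is ready to draw".
    public const string Frame = "frame";

    // Innermost level: asks for a switch to another processor.
    private static IEnumerable Synchronize()
    {
        yield return "switch";
    }

    // Middle level: must re-yield everything the nested method yields.
    private static IEnumerable Scanline()
    {
        foreach (var e in Synchronize())
        {
            yield return e;
        }
    }

    // Top of the processor: two scanlines, then the frame signal.
    private static IEnumerable RunProcessor()
    {
        for (int line = 0; line < 2; line++)
        {
            foreach (var e in Scanline())
            {
                yield return e;
            }
        }
        yield return Frame;
    }

    // The scheduler drains yielded values until it sees the exit code.
    public static int RunUntilFrame()
    {
        int switches = 0;
        foreach (object signal in RunProcessor())
        {
            if (Frame.Equals(signal))
            {
                break;
            }
            switches++;
        }
        return switches;
    }
}
```

SchedulerSketch.RunUntilFrame() returns 2, one for each switch request that bubbled up through both levels of foreach before the frame signal arrived.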

I created a test project that had a class which used nested coroutines, four layers deep, and a corresponding class that did the same thing, but used .NET threads to preserve the stack.  I performed 10,000,000 switches back and forth and got the following timings:


Threading Start
Time: 00:01:52.0351499
Yielding Start
Time: 00:00:07.8278730

Coroutines produced a 14x speed increase.  This made me very optimistic about doing the same thing to SnesBox, so I began work in a separate branch.

I was quite disappointed when the version of SnesBox using coroutines ran more slowly than the version with threads.  I didn't investigate the cause of the performance loss thoroughly.  Since the processor switching can occur at nearly any place during execution of bsnes, it was necessary to place an iteration loop around almost every function in the codebase.  I suspect that doing an iteration at every level in the callstack produces so much overhead that the benefits of moving from heavyweight threads are lost.

Conclusions

As I asserted in my previous post, even though the project was a failure, I was glad I gave it a try.  It was exciting to try to make a contribution to the emulation community, and to learn a little more about what's going on inside the machine when I start up a cartridge.  Mostly, it was a fun experiment in porting code from C++ to C#.

I'll be keeping an eye open for any significant changes in bsnes related to fibers.  I also hold misplaced hope that a future release of the .NET Framework may contain support for lightweight threads.

Sunday, February 6, 2011

Porting bsnes to C#: Part One

Browse the code repository for this post here.

An Idea and a Motivation

As I mentioned in my previous post, I'm a lifelong Super Nintendo (SNES) fan, and I miss the simple fun of being able to sit on my couch and play my old games.  There are excellent emulators available, but the experience of going through the PC isn't the same as powering up a good ol' fashioned console in the living room.

With that in mind, I began an experiment that I thought would be fun, in the spirit of bringing bsnes to a larger audience.  Since I'm interested in XNA game development, I wanted to port bsnes to run using XNA.  It was my hope to have an SNES emulator that would be playable on the Xbox, and perhaps even an nth generation Windows Phone.  It's easy to imagine all the fun things that could be accomplished using these devices, such as network play for multi-player games.  Neither of these platforms allow deployment of indie games written in native code, so to achieve this goal, I had to port the entire bsnes codebase from C++ to C#.

Lost in Translation

Although the finished product contains well over 10,000 LOC, this task sounds more daunting than it actually was.  Development was eased due to the clean, well-designed code that byuu and his peers have written.  I approached the port from the top-down, fleshing out the member function signatures and variables before implementing each function itself.  In the process, I learned quite a bit about the differences in C++ and C#, as well as a few things I didn't know about C++.  I'll discuss these in more detail in an upcoming post.

After a month's work translating functions and classes, I got my C# codebase - which I'm calling SnesBox (yes, I missed my calling in marketing) - to the point where it could be run without immediately throwing a System.NotImplementedException.  At this point, it was necessary to compare execution of bsnes to SnesBox, preferably in an IDE.  This was the most effective way to see where I had made the inevitable errors in such a large port.  To accomplish this, I installed Ubuntu in VirtualBox on my Windows machine, where I could debug bsnes in NetBeans, side-by-side with SnesBox in Visual Studio 2010.

bsnes already implements custom state serialization, which I had originally opted to forgo in favor of .NET serialization.  I later realized this serialization was my key to the most effective inter-OS, inter-IDE, inter-language debugging tool I could think of: by writing identically serialized states to VirtualBox's shared folders at key points during execution, I could use a full-featured diff tool to track down the points at which the states of bsnes and SnesBox began to diverge.  Usually this meant placing a state file "write" each time the emulated processors would yield to one another, then seeing which variable in the diff contained the offending difference.  By knowing which variable contained the difference, it was usually as simple as seeing where it was used in code and finding which of several "usual suspect" translation errors (bad cast, wrong primitive type, incorrect logic, etc.) had taken place.

After another month's work, I had a running XNA-based bsnes emulator, written in C#.

SnesBox running Super Mario World at 30 FPS on my laptop.
Now, if you don't think this game is the greatest game ever: I will fight you.  That's no lie.

Broken Threads

If I had discussed the idea of a managed solution with a hardware-level emulation developer before I began, I'm sure I would have been laughed at immediately for naïvely attempting such a computationally intense task with the .NET framework.  And they would have been correct...but for different reasons than I initially suspected.  Ultimately, my C# emulation does not run at the necessary speed of 60 FPS, for reasons I'll explain.

Early in my porting effort, I encountered libco, a cooperative multithreading library that byuu had developed.  Multithreading is used in bsnes to provide each emulated processor with its own execution context.  When a processor has determined it is time to hand off work to the next processor, its stack is preserved until execution is manually returned to the point where the processor was suspended.  bsnes contains three execution profiles: accuracy, compatibility, and performance.  One of the ways the performance profile runs faster is by eliminating the use of a separate thread for one of the emulated processors.

I knew the port for libco would not be straightforward, since it contained OS-specific fiber routines, some of which were written in machine code; talk about obfuscation!  Unfortunately, the .NET framework does not support fibers at this time.  To complicate things further, the manual threading functions which .NET does contain, Suspend and Resume, have been deprecated.  This left me to use heavyweight threads in a manner that is avoided on Windows and impossible on the Xbox.  My horribly awkward thread switching function makes use of these features, implemented as:


public static void Switch(Thread thread)
{
    var previous = _active;
    _active = thread;
    if (_active.ThreadState == ThreadState.Unstarted)
    {
        _active.Start();
    }
    else
    {
        WaitForCurrentThreadToFinishSuspending();
        _active.Resume();
    }
    previous.Suspend();
}

private static void WaitForCurrentThreadToFinishSuspending()
{
    while (_active.ThreadState != ThreadState.Suspended) { }
}

It was sufficient for debugging purposes, but not for a deliverable.  After running Visual Studio performance analysis on SnesBox, I found that the share of execution time spent switching threads ranged from 57% for the performance profile to 81% for the accuracy profile.

I did make an attempt to circumvent the threading issue, using an underutilized feature of the C# language: the yield statement.  I'll discuss this in much greater detail in a future post, but the results of the experiment can be found in the "yielded" branch of the SnesBox code repository.  Bottom line: it removes the usage of .NET threads, but generates so much additional code for state preservation in the process that execution speed is further reduced.

Conclusions

I have ported bsnes to C# and can run SNES games (which don't have any on-board co-processors) using the .NET framework and XNA.  Like so many projects that are completed on a developer's whim, I had an absolute blast working on this, even if it did not produce a shippable product.  In the process, I learned a lot about the two programming languages involved, and would feel confident taking on another C++ to C# porting project.  I even uncovered a couple of minor bugs in bsnes code, which I've identified in the repository with TODO comments.  C++ may not care if you overrun an array, but C# certainly throws an exception or two.

I have made the code repository available on Google Code, distributed under the MIT License.  I have been led to believe this is the most "public domain-y" of the licenses Google Code offers.  Under the project issues, I have listed and prioritized some remaining problems I previously identified.  If anyone can solve the .NET fiber issue, I would be excited to work with them to create something we can all have fun playing.

My thanks to byuu and the other developers working on bsnes: you guys have created an amazing product, and it shows.  Look for my next post concerning the SnesBox effort, where I identify some of the issues I had to overcome in my C++ to C# port, and further explain co-routines as "fibers" in the .NET framework.