As I explained in my last post, I recently worked on a project, SnesBox, whose goal was to port the bsnes Super Nintendo (SNES) emulator to C# so I could play emulated games on an Xbox 360. It didn't succeed, for performance reasons, but I learned more about the differences between C++ and C# in the process. This post will detail some of the challenges I had to overcome to complete this port. All bsnes code samples I include are from bsnes 0.72.
Type Safety
C# is a type-safe language: the compiler enforces type conversions rather than letting them happen silently. That means every place the C++ code relies on an implicit cast from one type to another has to be reexamined. For instance, in this function, several implicit casts are taking place:
void CPUcore::op_ror_imm_b() {
  last_cycle();
  op_io_irq();
  bool carry = regs.p.c;
  regs.p.c = (regs.a.l & 0x01);
  regs.a.l = (carry << 7) | (regs.a.l >> 1);
  regs.p.n = (regs.a.l & 0x80);
  regs.p.z = (regs.a.l == 0);
}
The variable carry is a boolean, which is bit-shifted as an integer. The result of the bitwise-or operation is then assigned to a byte value, regs.a.l. Additionally, regs.p.c and regs.p.n are both boolean values which take assignments from integer values produced by the bitwise-and operations. C# is type safe, so these sorts of cases must be handled using explicit type casts and static methods in System.Convert.
public void op_ror_imm_b(CPUCoreOpArgument args) {
    last_cycle();
    op_io_irq();
    bool carry = regs.p.c;
    regs.p.c = Convert.ToBoolean(regs.a.l & 0x01);
    regs.a.l = (byte)((Convert.ToInt32(carry) << 7) | (regs.a.l >> 1));
    regs.p.n = Convert.ToBoolean(regs.a.l & 0x80);
    regs.p.z = (regs.a.l == 0);
}
The .NET Framework also contains the BitConverter class, which can be used to convert from byte arrays to primitive types, and vice-versa. This is demonstrated in the following method.
private void echo_write(int ch) {
    if (!Convert.ToBoolean(m.t_echo_enabled & 0x20)) {
        Array.Copy(BitConverter.GetBytes((ushort)m.t_echo_out[ch]), 0, m.ram, m.t_echo_ptr + ch * 2, 2);
    }
    m.t_echo_out[ch] = 0;
}
Here, the lower two bytes in the integer value indexed by ch in the integer array m.t_echo_out are converted to an array of bytes, then copied into the byte array m.ram.
I wanted my code to stay as close as possible to the bsnes code, for debugging purposes. Even where a design choice wasn't necessary from a C# perspective, I kept the bsnes approach to prevent ambiguity or confusion later on. Type safety is one of the most obvious places this shows up, particularly where boolean values are converted to numbers and vice-versa. Lines such as these:
Input.input.port_set_device(Convert.ToBoolean(0), Configuration.config.controller_port1);
Input.input.port_set_device(Convert.ToBoolean(1), Configuration.config.controller_port2);
could easily be converted to use true and false, but leaving them alone preserves their one-to-one correspondence with the bsnes original.
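For comparison, the fully converted form would read:

// Shown for comparison only; SnesBox keeps the Convert.ToBoolean form above
// so each line still maps one-to-one onto the bsnes original.
Input.input.port_set_device(false, Configuration.config.controller_port1);
Input.input.port_set_device(true, Configuration.config.controller_port2);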
Union Types
Union types are still lurking out there. To be fair to bsnes, whose objects duplicate the behavior of hardware registers, a union makes sense for accessing different chunks of register bits. That being said, coming across unions in bsnes was probably my biggest "oh crap" moment.
struct reg24_t {
  union {
    uint32 d;
    struct { uint16 order_lsb2(w, wh); };
    struct { uint8 order_lsb4(l, h, b, bh); };
  };
  ...
I thought to myself, "C# doesn't support something like Feature X, so it must not support unions either, right?" Wrong... sort of. Enter: the FieldOffset attribute.
[StructLayout(LayoutKind.Explicit)]
public class Reg24 {
    [FieldOffset(0)] public uint d;
    [FieldOffset(0)] public ushort w;
    [FieldOffset(2)] public ushort wh;
    [FieldOffset(0)] public byte l;
    [FieldOffset(1)] public byte h;
    [FieldOffset(2)] public byte b;
    [FieldOffset(3)] public byte bh;
    ...
In this piece of code (which assumes a little-endian architecture), the StructLayout attribute with an argument of LayoutKind.Explicit tells the compiler, "I'm going to explicitly lay out the memory for the fields in this object." Structs default to LayoutKind.Sequential; classes like this one default to LayoutKind.Auto.
The FieldOffset attribute is applied to fields you want to be part of your "union"; the offset is the number of bytes from the start of the memory allocated to the object's fields. By placing the byte variable l at the zeroth byte of the Reg24 class, writing to that field writes to the low byte of both w and d. Congratulations, you've created a union and earned the curses of anyone who has to maintain your code.
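A minimal illustration (not SnesBox code) of what the overlap buys you:

// Writes through one field are immediately visible through the others,
// because they all occupy the same bytes (little endian assumed).
var reg = new Reg24();
reg.d = 0x00ABCDEF;
// reg.l now reads 0xEF (the low byte of d) and reg.w reads 0xCDEF.
reg.h = 0x12;
// Writing the h field changes the same storage, so reg.d is now 0x00AB12EF.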
Array Pointers
In C#, arrays are objects that encapsulate managed memory. Consequently, short of unsafe code, there's no way to iterate through the elements of an array using pointer arithmetic, and no way to pass a pointer to an arbitrary location in the array's memory. This presents a problem when translating logic such as this:
void Video::draw_cursor(uint16_t color, int x, int y) {
  uint16_t *data = (uint16_t*)ppu.output;
  if(ppu.interlace() && ppu.field()) {
    data += 512;
  }
  ...
  if(hires == false) {
    *((uint16_t*)data + vy * 1024 + vx) = pixelcolor;
  } else {
    *((uint16_t*)data + vy * 1024 + vx * 2 + 0) = pixelcolor;
    *((uint16_t*)data + vy * 1024 + vx * 2 + 1) = pixelcolor;
  }
  ...
Here, the variable data points to the beginning of the array ppu.output. If a conditional is met, the pointer is advanced by 512 uint16_t elements. Later on in the method, additional arithmetic produces a temporary address, which is dereferenced to write the pixelcolor value.
The .NET Framework contains a generic struct, ArraySegment, which can be used as a substitute for the functionality of an array pointer.
private void draw_cursor(ushort color, int x, int y) {
    var data = PPU.ppu.output;
    if (PPU.ppu.interlace() && PPU.ppu.PPUCounter.field()) {
        data = new ArraySegment<ushort>(data.Array, data.Offset + 512, data.Count - 512);
    }
    ...
    if (hires == false) {
        data.Array[data.Offset + (vy * 1024 + vx)] = pixelcolor;
    } else {
        data.Array[data.Offset + (vy * 1024 + vx * 2 + 0)] = pixelcolor;
        data.Array[data.Offset + (vy * 1024 + vx * 2 + 1)] = pixelcolor;
    }
    ...
ArraySegments are created from an array, an offset into the array, and the number of elements past that offset that the segment covers.
Use of ArraySegment is one of the areas of the code I'm disappointed with. I would revisit it if I were to continue work on the project. The struct is painfully simple; there isn't even an indexer to access array elements directly. Additionally, ArraySegment is immutable: if you wish to change the offset, you have to create a new struct. These shortcomings could be easily solved by writing my own ArraySegment-style struct.
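A replacement wouldn't need much. Something along these lines (a sketch that never made it into SnesBox; the type name is mine) would cover the indexer and the mutable offset:

// Hypothetical ArraySegment-style wrapper with the two missing conveniences:
// an indexer relative to the offset, and an offset that can be re-pointed.
public struct ArrayWindow<T> {
    private readonly T[] array;
    public int Offset;

    public ArrayWindow(T[] array, int offset) {
        this.array = array;
        Offset = offset;
    }

    public T this[int index] {
        get { return array[Offset + index]; }
        set { array[Offset + index] = value; }
    }
}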
Reference Type Primitives
In C#, primitives are actually represented by structs in the System namespace, making them value types. You can pass a value type by reference in C#, using the ref keyword, but there's no language feature which lets you explicitly declare a variable to be of reference type. This became a problem when I came across the following struct in bsnes code:
struct regs_t {
  uint16_t pc;
  uint8_t r[4], &a, &x, &y, &sp;
  regya_t ya;
  flag_t p;
  regs_t() : a(r[0]), x(r[1]), y(r[2]), sp(r[3]), ya(r[2], r[0]) {}
};
The variables a, x, y, and sp are references to individual elements of the array r. There were several ways I could have approached this. The first was to write a class wrapper for byte: classes are reference types in C#, so a wrapper object would carry its byte with it wherever it was passed. But with this approach, the array r would also have to become an array of byte wrappers, and since arrays are themselves reference types, an array of byte wrappers would be a redundant layer of indirection.
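For reference, the rejected wrapper approach would have looked something like this sketch:

// A sketch of the rejected approach: a reference-type wrapper, so everything
// holding the same ByteRef instance reads and writes the same byte.
public class ByteRef {
    public byte Value;
}
// r would then have to become a ByteRef[] so that a, x, y and sp could alias
// its elements, which is the redundant layer of indirection described above.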
Instead, since this was an isolated occurrence of primitive type references, I opted to solve the problem by using an ArraySegment struct as an offset to each of the elements of r.
public class Regs {
    public ushort pc;
    public byte[] r = new byte[4];
    public ArraySegment<byte> a, x, y, sp;
    public RegYA ya;
    public Flag p = new Flag();

    public Regs() {
        a = new ArraySegment<byte>(r, 0, 1);
        x = new ArraySegment<byte>(r, 1, 1);
        y = new ArraySegment<byte>(r, 2, 1);
        sp = new ArraySegment<byte>(r, 3, 1);
        ya = new RegYA(new ArraySegment<byte>(r, 2, 1), new ArraySegment<byte>(r, 0, 1));
    }
}
The ArraySegment structs themselves are passed by value, but the array they wrap is a reference type. As a result, a, x, y, and sp all index into the same underlying bytes of r, no matter where copies of the structs end up.
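A quick illustration of that sharing (hypothetical usage, not a snippet from the emulator):

var regs = new Regs();
regs.a.Array[regs.a.Offset] = 0x12;   // writes r[0] through the 'a' segment
// regs.r[0] is now 0x12, and any copy of regs.a sees the same byte, because
// every segment wraps the same byte[] instance.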
Template Metaprogramming
C++ templates and C# generics are very different beasts, and I still have a lot to learn about the differences between the two. Generics are far more restricted than C++ templates; most importantly for my port, they do not allow non-type parameters. In C++, a compile-time constant can be specified as a template argument, allowing for methods in bsnes like this:
template<int bits> inline unsigned uclip(const unsigned x) {
  enum { m = (1U << bits) - 1 };
  return (x & m);
}
where uclip can have the desired number of bits generated at compile time by being called as
uclip<2>(data + 1);
Since C# does not allow generics to be used like this, functions such as these could be changed to:
uclip(2, data + 1);
However, this does not generate the same type of function call as C++, where the number 2 has been compiled into the function itself.
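The runtime replacement is just an ordinary static helper; a minimal sketch of the idea (not necessarily the exact SnesBox helper) looks like this:

public static class Bit {
    // Runtime counterpart of the uclip<bits> template: the bit count is an
    // ordinary argument evaluated on every call rather than at compile time.
    public static uint uclip(int bits, uint x) {
        uint mask = (1U << bits) - 1;
        return x & mask;
    }
}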
For some template methods, I had to get a little more creative, such as initializing the opcode tables for the processor cores. In bsnes, non-type parameters were used to initialize the table's function calls at compile time:
op[0x34] = &SMPcore::op_read_a_dpx<&SMPcore::op_and>;
op[0x35] = &SMPcore::op_read_a_addrr<&SMPcore::op_and, X>;
op[0x36] = &SMPcore::op_read_a_addrr<&SMPcore::op_and, Y>;
op[0x37] = &SMPcore::op_read_a_idpy<&SMPcore::op_and>;
op[0x38] = &SMPcore::op_read_dp_const<&SMPcore::op_and>;
op[0x39] = &SMPcore::op_read_ix_iy<&SMPcore::op_and>;
Since all arguments were generated at compile time using template metaprogramming, the signature of the function pointers is void function(void), allowing the table to be initialized uniformly.
In C#, I pass my method arguments manually. Since the delegate signatures of the opcode table entries have to match each other, it is necessary for each method to match a common signature. I created an argument class, where I could initialize the parameters needed for a particular method:
opcode_table[0x34] = new SMPCoreOperation(op_read_a_dpx, new SMPCoreOpArgument() { op_func = op_and });
opcode_table[0x35] = new SMPCoreOperation(op_read_a_addrr, new SMPCoreOpArgument() { op_func = op_and, i = (int)OpCode.X });
opcode_table[0x36] = new SMPCoreOperation(op_read_a_addrr, new SMPCoreOpArgument() { op_func = op_and, i = (int)OpCode.Y });
opcode_table[0x37] = new SMPCoreOperation(op_read_a_idpy, new SMPCoreOpArgument() { op_func = op_and });
opcode_table[0x38] = new SMPCoreOperation(op_read_dp_const, new SMPCoreOpArgument() { op_func = op_and });
opcode_table[0x39] = new SMPCoreOperation(op_read_ix_iy, new SMPCoreOpArgument() { op_func = op_and });
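The argument class and the wrapper that pairs a handler with its argument amount to something like the following sketch; the delegate and field types here are assumptions reconstructed from the table entries, not the exact SnesBox declarations:

public class SMPCoreOpArgument {
    public Func<byte, byte, byte> op_func;   // e.g. op_and (signature assumed)
    public int i;                            // extra integer parameter, e.g. (int)OpCode.X
}

public delegate void SMPCoreOp(SMPCoreOpArgument args);

public class SMPCoreOperation {
    public SMPCoreOp Method { get; private set; }
    public SMPCoreOpArgument Args { get; private set; }

    public SMPCoreOperation(SMPCoreOp method, SMPCoreOpArgument args) {
        Method = method;
        Args = args;
    }

    // The dispatcher can invoke every table entry the same way.
    public void Invoke() {
        Method(Args);
    }
}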
The beauty of template metaprogramming is definitely lost in the translation. However, this lack of functionality is most frustrating when porting the variant data types, such as 2-bit, 3-bit, and 17-bit unsigned integers, found in SNES hardware. bsnes contains an elegant, although somewhat cryptic, template class to generate these types:
template<unsigned bits> class uint_t {
private:
  enum { bytes = (bits + 7) >> 3 };
  typedef typename static_if<
    sizeof(int) >= bytes, unsigned int,
    typename static_if<
      sizeof(long) >= bytes, unsigned long,
      typename static_if<
        sizeof(long long) >= bytes, unsigned long long,
        void
      >::type
    >::type
  >::type T;
  static_assert(!std::is_same<T, void>::value, "");
  T data;
  ...
and a new variant data type can be defined with the line
typedef uint_t<2> uint2;
As a TDD developer and all-around fan of simple development, I have quite a pet peeve for duplicated code. It increases the number of things I have to keep in my brain at any given time, and when I make a change to duplicated code, I have to change it everywhere the code has been duplicated. It therefore came as a huge disappointment when I had to copy/paste the entire class of every variant data type I wanted to generate:
public struct uint2 {
    private uint data;
    private const int bits = 2;
    ...

public struct uint9 {
    private uint data;
    private const int bits = 9;
    ...
Gross. If someone knows a way around this, please come forward, as the solution eludes me. After this experience, non-type parameter generic methods are a feature I anxiously await in a future version of C#.
Overloaded Operators

There was some confusion on my part when I first started using overloaded operators in C#. I had never worked on a project that required them, so I expected them to behave like overloaded operators in C++.
...
inline operator T() const { return data; }
inline T operator ++(int) { T r = data; data = uclip<bits>(data + 1); return r; }
inline T operator --(int) { T r = data; data = uclip<bits>(data - 1); return r; }
inline T operator ++() { return data = uclip<bits>(data + 1); }
inline T operator --() { return data = uclip<bits>(data - 1); }
inline T operator =(const T i) { return data = uclip<bits>(i); }
inline T operator |=(const T i) { return data = uclip<bits>(data | i); }
inline T operator ^=(const T i) { return data = uclip<bits>(data ^ i); }
inline T operator &=(const T i) { return data = uclip<bits>(data & i); }
inline T operator<<=(const T i) { return data = uclip<bits>(data << i); }
inline T operator>>=(const T i) { return data = uclip<bits>(data >> i); }
inline T operator +=(const T i) { return data = uclip<bits>(data + i); }
inline T operator -=(const T i) { return data = uclip<bits>(data - i); }
inline T operator *=(const T i) { return data = uclip<bits>(data * i); }
inline T operator /=(const T i) { return data = uclip<bits>(data / i); }
inline T operator %=(const T i) { return data = uclip<bits>(data % i); }
The first operator overloaded in this example, the cast operator, converts the variant struct to a primitive data type. This can also be accomplished in C#, using the static explicit operator overload:
public static explicit operator uint(uint2 number) { return number.data; }
Continuing through the list of overloaded operators, the assignment operator cannot be overloaded in C#. Instead, I created an Assign method.
public uint Assign(uint i) { return data = Bit.uclip(bits, i); }
At first I was concerned I wouldn't remember to use this method when converting an expression that assigns a uint to a uint2. Type safety came to the rescue: every such assignment is a compile error, so there was no way to miss one.
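For illustration (this snippet isn't from the emulator):

uint2 value = new uint2(0);
// value = 3u;       // compile error: cannot implicitly convert type 'uint' to 'uint2'
value.Assign(3u);    // the Assign method stands in for the C++ operator=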
Finally, the overloaded arithmetic and bit-wise operators. These caused me the most confusion, because every arithmetic operator the bsnes class overloads is actually a compound assignment operator. In C++, each one assigns the result of the operation back to the internal unsigned integer, data.
In C#, you cannot overload the compound assignment operators: you can overload +, but not +=. This is because C# synthesizes += for you automatically once + has been overloaded. So rather than assigning the result of the operation back to the object itself, you return the new value, and C# decides whether an assignment should take place based on the context:
public static uint2 operator +(uint2 number, uint i) { return new uint2(Bit.uclip(bits, number.data + i)); }
The same is true of the pre- and post-increment and decrement operators. Rather than writing separate prefix and postfix versions as in C++, you overload ++ and -- once, reusing the + and - overloads to change the value by 1, and C# handles the prefix/postfix ordering automatically.
public static uint2 operator ++(uint2 number) { return number + 1; }
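With that single overload in place, both forms work; the usage below is illustrative:

uint2 counter = new uint2(1);
counter++;   // the compiler rewrites this as: counter = <operator ++>(counter)
++counter;   // same overload; pre vs. post only changes which value an
             // enclosing expression would observe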
Multiple Inheritance
One of the first things a C++ developer discovers when learning C# is that multiple class inheritance isn't allowed; the language omits it to prevent the ambiguity that can arise from abusing the feature. This immediately posed a problem for SnesBox, since bsnes makes extensive use of multiple inheritance, such as in the CPU class:
class CPU : public Processor, public CPUcore, public PPUcounter, public MMIO { ...
There is no way to get around the lack of inheritance from multiple classes in C#, and no way to fully simulate the behavior of using more than one base class, such as overriding methods and accessing protected members. Any technique which presumes to do so is, in the end, a trick or a hack.
C# does, however, allow a class to implement multiple interfaces, and developers have used this in all sorts of ways to approximate multiple inheritance in C#. Since I was writing all the SnesBox code myself and could define how the code interacted with itself, multiple interfaces were an acceptable solution. The SnesBox implementation of CPU looks like this:
partial class CPU : CPUCore, IPPUCounter, IProcessor, IMMIO { ...
In the code for SnesBox, Processor still exists as an explicit class:
class Processor {
    public Thread thread;
    public uint frequency;
    public long clock;
    ...

    public Processor() {
        thread = null;
    }
}
However, instead of making any class inherit directly from Processor, I use an IProcessor interface.
interface IProcessor { Processor Processor { get; } }
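Each class that would have derived from Processor owns a Processor instance instead and hands it out through the interface property. A sketch of the pattern (the backing field is illustrative, not necessarily how SnesBox declares it):

partial class CPU : CPUCore, IPPUCounter, IProcessor, IMMIO {
    // Composition in place of inheritance: the CPU owns its Processor...
    private readonly Processor processor = new Processor();

    // ...and exposes it through the property that IProcessor requires.
    public Processor Processor {
        get { return processor; }
    }
}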
The Processor class is never used as a base class anywhere in the code. Instead, any method that would have expected an instance of Processor takes an instance of IProcessor, and Processor functionality is accessed through the Processor property on the interface.
For example, in the bsnes C++ code, the CPU's "step" method adjusts the clock of each of the other processors, using the clock and frequency fields they inherit from the Processor base class.
void CPU::step(unsigned clocks) {
  smp.clock -= clocks * (uint64)smp.frequency;
  ppu.clock -= clocks;
  for(unsigned i = 0; i < coprocessors.size(); i++) {
    Processor &chip = *coprocessors[i];
    chip.clock -= clocks * (uint64)chip.frequency;
  }
}
Inside the for loop, each element in the array of coprocessors is accessed as a Processor. In the SnesBox C# code, the function looks mostly the same:
public void step(uint clocks) {
    SMP.smp.Processor.clock -= (long)(clocks * (ulong)SMP.smp.Processor.frequency);
    PPU.ppu.Processor.clock -= clocks;
    for (uint i = 0; i < coprocessors.Count; i++) {
        IProcessor chip = coprocessors[(int)i];
        chip.Processor.clock -= (long)(clocks * (ulong)chip.Processor.frequency);
    }
}
Each piece of Processor functionality is accessed using the Processor property in IProcessor. A method can still access an object as an IProcessor, rather than its derived type. The step function does this inside the for loop.
There is not much "syntactic sugar" to be found in an implementation of multiple inheritance such as this. Each time the interface-implemented functionality is used, the property which contains the actual object must come before any fields or methods. CPUCore is the largest and most complicated of all the base classes used by CPU, and it is from this class that CPU inherits a significant amount of its functionality. Because of the relative awkwardness of using interfaces as a source of multiple inheritance, I chose CPUCore as my one explicit base class.
Coroutines and Fibers
Much can be said about coroutines, since this was, ultimately, the issue that defeated the project. In bsnes, each emulated processor runs on a fiber. Fibers, and therefore the processors in bsnes, operate using cooperative multitasking: a fiber itself decides when it stops working and which fiber starts up after it. This is where fibers differ from threads, which are scheduled by the OS. For more on the subject from a gaming perspective, I suggest Ben Carter's post on #AltDevBlogADay.
As I mentioned in my last post, I got around this issue for debugging purposes by using deprecated methods on .NET threads that let them cooperatively multitask. Since the feature is deprecated and the threads were heavyweight, the performance was abysmal. This led me to explore other means of preserving a stack at an arbitrary point during execution and later resuming from where it left off.
A coroutine is a function that has multiple entry and exit points. The stack is effectively preserved each time a coroutine returns, allowing the coroutine to resume execution at the point it returned from when it is called again. Coroutines are possible in C#, using the yield statement. For a fantastic use of coroutines in C#, see Rob Eisenberg's presentation from MIX 2010, Build Your Own MVVM Framework (don't worry, the title makes it sound more intimidating than it actually is). If you're interested in some of the finer details of what's happening under the hood of a C# coroutine, check out Jeremy Likness's blog on sequential asynchronous workflows.
When the yield keyword is used in a method, the C# compiler generates a class behind the scenes to implement an iterator block for that method. This class is a state machine which allows iteration through the states in the method. Each yield statement signifies that a new state in the iterator has been reached, and execution returns to the method doing the iterating. The state inside the iterator is preserved until the next iteration.
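A tiny standalone example, unrelated to the emulator, shows the mechanics:

// A minimal iterator block: each yield return hands control back to the
// caller, and the local state (i, and the position in the loop) survives
// until the next MoveNext call on the enumerator.
static IEnumerable<int> CountTo(int limit) {
    for (int i = 1; i <= limit; i++) {
        yield return i;   // execution pauses here between iterations
    }
}

// foreach drives the compiler-generated state machine one step at a time:
// foreach (var n in CountTo(3)) { Console.WriteLine(n); }   // prints 1, 2, 3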
For instance, in the threaded version of SnesBox, the CPU may yield its execution to the SMP at any time by calling:
public void synchronize_smp() {
    if (SMP.smp.Processor.clock < 0) {
        Libco.Switch(SMP.smp.Processor.thread);
    }
}
In the Switch function, the current thread (the CPU) is paused and the SMP thread is resumed. In this way, the processors hand off execution to one another thousands of times each frame.
Using coroutines, I can preserve state at the same point using a yield statement:
public IEnumerable synchronize_smp() {
    if (SMP.smp.Processor.clock < 0) {
        yield return SMP.smp.Processor.thread;
    }
}
Here, the enumerator from the SMP iterator block is returned as the result of the iteration. For nested coroutines to work properly, a method must become an enumerable method if it nests another enumerable method. For instance, a method which calls synchronize_smp must also yield return any enumerable results of synchronize_smp, and so on:
private IEnumerable scanline() {
    foreach (var e in synchronize_smp()) { yield return e; };
    foreach (var e in synchronize_ppu()) { yield return e; };
    foreach (var e in synchronize_coprocessor()) { yield return e; };
    foreach (var e in System.system.scanline()) { yield return e; };
    ...
Much like const-correctness in C++, nested coroutines in C# are "catching," and spread up the callstack. At the top of the enumerable processors, the scheduler iterates through all returned enumerators until an exit code is received, indicating it is time to draw a frame.
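Sketched with illustrative names (cpu.Run(), ExitCode.FrameEvent, and video.Refresh() are placeholders, not the actual SnesBox members), the top-level driver amounts to something like this:

// Pull on the outermost enumerator until a value arrives that signals a
// completed frame, then hand the frame to the renderer.
foreach (var result in cpu.Run()) {
    if (result is ExitCode && (ExitCode)result == ExitCode.FrameEvent) {
        video.Refresh();
        break;
    }
    // anything else yielded is just an inner coroutine handing control back
}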
I created a test project that had a class which used nested coroutines, four layers deep, and a corresponding class that did the same thing, but used .NET threads to preserve the stack. I performed 10,000,000 switches back and forth and got the following timings:
Threading Start
Time: 00:01:52.0351499
Yielding Start
Time: 00:00:07.8278730
Coroutines produced a 14x speed increase. This made me very optimistic about doing the same thing to SnesBox, so I began work in a separate branch.
I was quite disappointed when the version of SnesBox using coroutines ran more slowly than the version with threads. I didn't investigate the cause of the performance loss thoroughly. Since the processor switching can occur at nearly any place during execution of bsnes, it was necessary to place an iteration loop around almost every function in the codebase. I suspect that doing an iteration at every level in the callstack produces so much overhead that the benefits of moving from heavyweight threads are lost.
Conclusions
As I asserted in my previous post, even though the project was a failure, I was glad I gave it a try. It was exciting to try to make a contribution to the emulation community, and to learn a little more about what's going on inside the machine when I start up a cartridge. Mostly, it was a fun experiment in porting code from C++ to C#.
I'll be keeping an eye open for any significant changes in bsnes related to fibers. I also hold misplaced hope that a future release of the .NET Framework may contain support for lightweight threads.