Wednesday, May 11, 2011

Enhancing Reflection with IL Instructions



Download sample ILInstruction class
Download sample MethodBaseEx class

As I embarked on a quest to bring multi-inheritance to versions of the .NET Framework before .NET 4 (more on this later), I noticed that reflection is not really user-friendly when it comes to getting IL instructions. Before I can get my groove on and talk about multi-inheritance, I figured it would be worthwhile to extend the reflection classes a little to make it easier to work with IL.



The ILInstruction Class


The first thing we'll need is a simple class that will hold some useful information about IL instructions. Let's call it the "ILInstruction" class:


    using System.Reflection;
    using System.Reflection.Emit;

    public sealed class ILInstruction
    {
        // Position of the instruction within the method's IL byte array.
        public int Offset { get; set; }
        public OpCode OpCode { get; set; }
        // The resolved operand, if any (MethodBase, FieldInfo, string, ...).
        public object Arguments { get; set; }

        public bool IsMethodCall { get { return this.Arguments is MethodInfo; } }
        public bool IsConstructorCall { get { return this.Arguments is ConstructorInfo; } }

        public override string ToString()
        {
            return string.Format("{0} : {1}", this.Offset.ToString("X4"), this.OpCode);
        }
    }

Fairly straightforward. The most important properties here are OpCode and Arguments, which are all you really need. The rest are nice-to-haves.




IL byte array -> IL Instructions


Now we should move on to populating a bunch of these with IL data. Unfortunately, the only way to get at the IL is through MethodBase's GetMethodBody method. This method returns a MethodBody object with a couple of useful properties such as LocalVariables and MaxStackSize. More importantly, MethodBody lets you call GetILAsByteArray, which returns the IL as a byte array. This is not very useful as is, but it's what we need to populate instances of our newly defined ILInstruction class.
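As a minimal sketch of that first step, the helper below pulls the raw IL out of any MethodBase. The ILDump class and its method names are hypothetical, made up for illustration, and are not part of the downloadable samples:

```csharp
using System;
using System.Reflection;

public static class ILDump
{
    // Returns the raw IL of a method, or an empty array when the method has
    // no body (GetMethodBody returns null for abstract or extern methods).
    public static byte[] GetILBytes(MethodBase method)
    {
        if (method == null) throw new ArgumentNullException("method");
        MethodBody body = method.GetMethodBody();
        return body == null ? new byte[0] : body.GetILAsByteArray();
    }

    // Formats the bytes as space-separated hex pairs, e.g. "00 2A".
    public static string ToHex(byte[] il)
    {
        return BitConverter.ToString(il).Replace("-", " ");
    }
}
```

Calling ILDump.ToHex(ILDump.GetILBytes(someMethod)) gives you a hex dump you can compare side by side with the output of a disassembler.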


A couple of notes about IL byte arrays: OpCode values are 2 bytes long (I never understood this decision since there aren't more than 255 OpCodes and chances are there never will be), but to save space, OpCodes whose value fits in a single byte are written as 1 byte in the IL stream. Moreover, for some reason, even though there are only 226 OpCodes, some of them have a value greater than 255; these are written as 2 bytes, the first of which is always 0xFE (ceq, for example, is 0xFE 0x01). Therefore you cannot rely on a fixed number of bytes to represent an OpCode. The other important thing to note is that not all OpCodes take arguments. Those that do are followed by 1, 2, 4 or 8 bytes of operand data (a variable number for switch), and the meaning of that data differs from OpCode to OpCode.
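One way to deal with the variable-length encoding is a lookup table keyed by OpCode.Value, built by reflecting over the static fields of System.Reflection.Emit.OpCodes. The sketch below assumes that approach; OpCodeTable, Build and Read are hypothetical names, and the actual sample may build its table differently:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class OpCodeTable
{
    // Maps OpCode.Value (a short) to the OpCode, using every public static
    // OpCode field declared on System.Reflection.Emit.OpCodes.
    public static Dictionary<short, OpCode> Build()
    {
        var table = new Dictionary<short, OpCode>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (field.FieldType != typeof(OpCode)) continue;
            OpCode opCode = (OpCode)field.GetValue(null);
            table[opCode.Value] = opCode;
        }
        return table;
    }

    // Reads the opcode at 'offset' and advances 'offset' past it.
    // A first byte of 0xFE means a second byte follows.
    public static OpCode Read(Dictionary<short, OpCode> table, byte[] il, ref int offset)
    {
        short value = il[offset++];
        if (value == 0xFE)
            value = unchecked((short)(0xFE00 | il[offset++]));
        return table[value];
    }
}
```

OpCode.Size reports whether a given OpCode takes 1 or 2 bytes, so OpCodes.Nop.Size is 1 and OpCodes.Ceq.Size is 2.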


All of that said, now that we have the IL and we understand what it means, we can start looping through it byte by byte and interpreting the meaning of each byte like so:


                byte[] bytes = methodBody.GetILAsByteArray();

                int offset = 0;
                while (offset < bytes.Length)
                {
                    ILInstruction instruction = new ILInstruction();
                    instruction.Offset = offset;

                    // Two-byte opcodes start with the 0xFE prefix byte, so in those
                    // cases we take the following byte as well.
                    bool isTwoByteOpCode = bytes[offset] == 0xfe;
                    short opCodeValue = isTwoByteOpCode
                        ? unchecked((short)(bytes[offset + 1] | 0xfe00))
                        : (short)bytes[offset];

                    // _opCodes is assumed to be a lookup from OpCode.Value to OpCode.
                    instruction.OpCode = _opCodes[opCodeValue];
                    offset += isTwoByteOpCode ? 2 : 1;

                    switch (instruction.OpCode.OperandType)
                    {
                        // Operand decoding goes here.
                    }
                }

The code above illustrates how to figure out what the OpCode for an instruction is, but after reading an OpCode we must also figure out whether operand data follows, and if so, interpret that data like this:

                    switch (instruction.OpCode.OperandType)
                    {
                        case OperandType.InlineBrTarget:
                            offset += 4;
                            break;

                        case OperandType.InlineField:
                            instruction.Arguments = methodBase.Module.ResolveField(bytes.GetInt32(offset));
                            offset += 4;
                            break;

                        case OperandType.InlineI:
                            offset += 4;
                            break;

                        case OperandType.InlineI8:
                            offset += 8;
                            break;

                        case OperandType.InlineMethod:
                            int metaDataToken = bytes.GetInt32(offset);

                            Type[] genericMethodArguments = null;
                            if (methodBase.IsGenericMethod == true)
                                genericMethodArguments = methodBase.GetGenericArguments();

                            instruction.Arguments = methodBase.Module.ResolveMethod(metaDataToken, methodBase.DeclaringType.GetGenericArguments(), genericMethodArguments);
                            offset += 4;
                            break;

                        case OperandType.InlineNone:
                            break;

                        case OperandType.InlineR:
                            offset += 8;
                            break;

                        case OperandType.InlineSig:
                            offset += 4;
                            break;

                        case OperandType.InlineString:
                            instruction.Arguments = methodBase.Module.ResolveString(bytes.GetInt32(offset));
                            offset += 4;
                            break;

                        case OperandType.InlineSwitch:
                            int count = bytes.GetInt32(offset) + 1;
                            offset += 4 * count;
                            break;

                        case OperandType.InlineTok:
                            offset += 4;
                            break;

                        case OperandType.InlineType:
                            offset += 4;
                            break;

                        case OperandType.InlineVar:
                            offset += 2;
                            break;

                        case OperandType.ShortInlineBrTarget:
                            instruction.Arguments = typeof(Label);
                            offset += 1;
                            break;

                        case OperandType.ShortInlineI:
                            offset += 1;
                            break;

                        case OperandType.ShortInlineR:
                            offset += 4;
                            break;

                        case OperandType.ShortInlineVar:
                            offset += 1;
                            break;

                        default:
                            throw new NotImplementedException();
                    }
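Note that GetInt32 is not a method of byte[] in the framework, so the switch above presumably relies on an extension method from the sample code. A hypothetical stand-in could look like the sketch below; it reads the bytes by hand because IL operands are always little-endian, whereas BitConverter.ToInt32 follows the byte order of the machine:

```csharp
using System;

public static class ByteArrayExtensions
{
    // Hypothetical stand-in for the GetInt32 helper used in the switch:
    // reads a little-endian 32-bit integer starting at 'index'.
    public static int GetInt32(this byte[] bytes, int index)
    {
        return bytes[index]
             | (bytes[index + 1] << 8)
             | (bytes[index + 2] << 16)
             | (bytes[index + 3] << 24);
    }
}
```

With this in scope, bytes.GetInt32(offset) compiles as written in the switch.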

The only important thing to take away from this piece of code is that methods, fields and string constants are stored in the module's metadata, and the IL byte array contains only a metadata token that identifies the corresponding metadata entry. So what we must do is call the corresponding ResolveX method to get a meaningful object (depending on which OpCode we are dealing with, the argument has a different meaning and must be looked up in a different metadata table).
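To make the token idea concrete, here is a small sketch that takes a method's own MetadataToken and resolves it back through its module. TokenDemo and its members are hypothetical names used only for illustration:

```csharp
using System;
using System.Reflection;

public static class TokenDemo
{
    // The most significant byte of a metadata token identifies the kind of
    // entry, e.g. 0x06 for a method definition, 0x04 for a field definition,
    // 0x0A for a member reference and 0x70 for a user string.
    public static int TableOf(int token)
    {
        return (int)((uint)token >> 24);
    }

    // Resolves a method's own token back into a MethodBase via its module.
    // Tokens that refer to generic type or method parameters need the
    // ResolveMethod overload that also takes the generic type arguments.
    public static MethodBase RoundTrip(MethodBase method)
    {
        return method.Module.ResolveMethod(method.MetadataToken);
    }
}
```

The remaining three bytes of the token are the index of the entry, which is why the same 4 operand bytes can end up pointing at a method, a field or a string depending on the OpCode.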

So there you have it, folks. This is how you can turn IL byte arrays into meaningful object-oriented code. Unfortunately, this implementation does not account for every single scenario that you can encounter, and it does not implement all operand types. But stay tuned for a complete CodePlex project with all the bells and whistles needed to create your own Reflector-like software.