Dec 13, 2023

Avoiding Boxing

#internals | #lowlevel | #csharp | #boxing

Introduction

I've been working in software development for a long time, especially with Microsoft technologies, including, but not limited to, the C# language.

I had the privilege of following the evolution of the language from the beginning. I saw features like async/await being introduced, and as soon as they became available, I tried to adopt them in projects running in production, always aiming to offer a better experience to our customers.

But, like me, most programmers are essentially "users". Let me explain. Imagine you just bought a new car. You are very unlikely to think or worry about its mechanics. You just sit in the driver's seat and drive around. You only remember the mechanics when there is a problem.

This is how most programmers work. They follow standards, architectures, patterns, guidelines, and sometimes even templates, without worrying too much about the mechanics of things. But sometimes it's important to know a little about what is going on under the hood, because things may fail.

On November 12, 2014, Microsoft introduced .NET Core—an open-source, cross-platform successor to .NET Framework.

— Wikipedia [1]

Since then, it has become much easier to understand how things work, because we now have access to the source code. Microsoft has also greatly improved its official documentation with examples and video tutorials, and has even partnered with third parties to create a free course with certification. But for me, one of the most interesting resources is the technical blog posts, which are usually rich in details and comparisons.

I must confess that I have spent many hours enjoying posts by authors like Stephen Toub. Even though I sometimes don't read everything, because not every topic interests me, I have been very satisfied with what I've learned so far, without having to investigate everything myself as I used to some time ago.

Without further ado, let's get into the subject of this post.

Boxing

So, what is boxing? Boxing is the process of converting a value type (such as int, char, bool, etc.) to a reference type (such as object or an interface type implemented by the value type). The runtime allocates a new object on the managed heap and copies the value into it, which requires an extra memory allocation and, later, work for the garbage collector. Unboxing is the reverse process: extracting the value from the object back into a value type. In C#, it requires an explicit cast, and the runtime checks that the object really is a boxed value of that type, throwing an exception otherwise. That is why boxing is considered bad in .NET: it can affect the performance and memory usage of your application.
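A minimal sketch of both conversions follows (top-level statements, assuming .NET 6 or later; the variable names and the CanUnboxAsLong helper are hypothetical, for illustration only). It shows that the box holds a copy of the value and that unboxing checks the type at run time:

```csharp
using System;

int number = 42;

// Boxing: a new object is allocated on the managed heap and 42 is copied into it.
object boxed = number;

// Unboxing: an explicit cast; the runtime checks that 'boxed' really holds an int.
int unboxed = (int)boxed;
Console.WriteLine(unboxed); // 42

// The box is an independent copy, so changing the original variable does not affect it.
number = 100;
Console.WriteLine(boxed); // 42

// Unboxing must target the type that was boxed: a boxed int cannot be unboxed as long.
Console.WriteLine(CanUnboxAsLong(boxed)); // False

// Hypothetical helper: reports whether the object can be unboxed as a long.
static bool CanUnboxAsLong(object value)
{
    try
    {
        _ = (long)value; // throws InvalidCastException when value is a boxed int
        return true;
    }
    catch (InvalidCastException)
    {
        return false;
    }
}
```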

Here are some reasons to avoid unnecessary boxing:

  1. Performance: Boxing and unboxing operations involve memory allocations and type checks, which add overhead. According to this article [2], boxing can take up to 20 times longer than a simple reference assignment, and the cast required for unboxing is also expensive. Reducing these operations can lead to more efficient code.
  2. Memory Usage: Boxing involves allocating memory on the heap for the boxed object. In scenarios where memory usage is critical, minimizing unnecessary boxing helps in keeping memory consumption under control.
  3. Garbage Collection Overhead: Since boxed objects are allocated on the heap, they contribute to the workload of the garbage collector. Reducing unnecessary boxing can result in fewer objects to be collected, which can be beneficial for garbage collection performance.
  4. Error-prone code: Boxing and unboxing can introduce subtle bugs in your code, such as invalid casts, null reference exceptions, or incorrect equality comparisons. For example, if you compare two boxed values through object variables using the == operator, you are comparing their references, not their values, which can lead to unexpected results.
  5. Code Clarity: Avoiding unnecessary boxing can also lead to clearer and more explicit code. When the intent is to work with value types, using generics or other approaches to work directly with the value types can make the code more readable.
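To make the memory and garbage collection points more concrete, here is a rough measurement sketch (top-level statements, assuming .NET 6 or later). MeasureAllocatedBytes is a hypothetical helper built on GC.GetAllocatedBytesForCurrentThread; it is no substitute for a real benchmark, and the exact numbers depend on the runtime and platform. Both collections are pre-sized, so the boxes should be the only allocations left in the ArrayList loop:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

const int Count = 10_000;

// Pre-size both collections so growing the backing array is not measured.
var arrayList = new ArrayList(Count);
var list = new List<int>(Count);

long arrayListBytes = MeasureAllocatedBytes(() =>
{
    arrayList.Clear();
    for (int i = 0; i < Count; i++)
    {
        arrayList.Add(i); // Add(object): every int is boxed into a new heap object
    }
});

long listBytes = MeasureAllocatedBytes(() =>
{
    list.Clear();
    for (int i = 0; i < Count; i++)
    {
        list.Add(i); // Add(int): the value is stored directly in the internal int[]
    }
});

Console.WriteLine($"ArrayList: {arrayListBytes:N0} bytes allocated");
Console.WriteLine($"List<int>: {listBytes:N0} bytes allocated");

// Hypothetical helper: runs the action once to warm up, then reports the bytes
// allocated on the current thread during a second run.
static long MeasureAllocatedBytes(Action action)
{
    action();
    long before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}
```

On a 64-bit runtime each boxed int is a separate heap object (typically 24 bytes), so the ArrayList figure grows with the number of items, while the List<int> figure should stay close to zero.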

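Therefore, it's recommended to use generic collections and methods instead of non-generic ones, as they avoid boxing and unboxing of value types. For example, use List<int> instead of ArrayList, and be careful with methods that take object parameters, such as StringBuilder.AppendFormat(string, object), because every value type passed to them is boxed. Also keep in mind that structs can implement interfaces, but converting a struct to an interface type boxes it; to call interface members without boxing, pass the struct to a generic method constrained to that interface (for example, where T : IComparable<T>).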
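Structs and interfaces deserve a closer look, because converting a struct to an interface type does box it. The sketch below (top-level statements, assuming .NET 6 or later) uses List<int>.Enumerator, a mutable struct from the base class library that implements IEnumerator<int>, because mutating a boxed copy makes the boxing visible. The Advance helper is hypothetical, written only for this illustration:

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3 };

// List<int>.Enumerator is a struct, so this variable holds the enumerator itself.
List<int>.Enumerator enumerator = numbers.GetEnumerator();
enumerator.MoveNext();
Console.WriteLine(enumerator.Current); // 1

// Converting the struct to an interface type boxes a copy of it.
IEnumerator<int> boxed = enumerator;
boxed.MoveNext();                      // advances the boxed copy only
Console.WriteLine(boxed.Current);      // 2
Console.WriteLine(enumerator.Current); // 1 (the original was not touched)

// A generic method constrained to the interface calls MoveNext on the struct
// itself (a constrained call), so no box is created and the original advances.
Advance(ref enumerator);
Console.WriteLine(enumerator.Current); // 2

// Hypothetical helper: advances any IEnumerator<int> passed by reference.
static bool Advance<T>(ref T source) where T : IEnumerator<int> => source.MoveNext();
```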

Boxing examples

```csharp
using System;

int x = 10;     // a value type
int y = 10;     // another value type
object a = x;   // boxing x into a new object on the heap
object b = y;   // boxing y into another, separate object
Console.WriteLine(x == y); // True, comparing values
Console.WriteLine(a == b); // False, comparing references
```

To avoid this, use the Equals method instead of the == operator when comparing boxed value types. Equals is virtual, so on a boxed int the call is dispatched to Int32.Equals(object), which compares the values instead of the references. For example:

```csharp
Console.WriteLine(a.Equals(b)); // True, comparing values
```
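Equals has a couple of subtleties of its own with boxed values. In this short sketch (arbitrary values, top-level statements, assuming .NET 6 or later), Equals only returns true when both sides box the same type, and the static object.Equals(object, object) helper is a null-safe alternative:

```csharp
using System;

object boxedInt = 10;
object boxedLong = 10L;
object? nothing = null;

// Int32.Equals(object) returns true only for another boxed int with the same value.
Console.WriteLine(boxedInt.Equals(10));        // True
Console.WriteLine(boxedInt.Equals(boxedLong)); // False: a boxed long is a different type

// Calling nothing.Equals(...) would throw NullReferenceException;
// the static object.Equals checks for null before calling the virtual Equals.
Console.WriteLine(object.Equals(nothing, boxedInt)); // False
Console.WriteLine(object.Equals(boxedInt, 10));      // True
```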

Tip

Boxing is easy to spot in IL, where the compiler emits an explicit box instruction. The native assembly code generated by the Just-In-Time (JIT) compiler at run time has no such instruction: at that level, boxing shows up only indirectly, typically as a call to a runtime allocation helper followed by a copy of the value into the new object.

When you look at native assembly code, it has already gone through the translation and optimization performed by the JIT compiler for the target architecture, so high-level details, such as how value types are managed, are no longer spelled out.

The JIT can even remove some boxing operations when it can prove they are unnecessary, so a box instruction in the IL does not always become a heap allocation in the native code.

In summary, although the JIT compiler can optimize certain scenarios involving boxing, it's still a good practice to be aware of and minimize unnecessary boxing in your code, especially in performance-critical or memory-constrained scenarios.

Analyzing the problem

I was using a Roslyn analyzer from Microsoft, even though it is now archived. It can help you identify some boxing problems, but it can sometimes be wrong, since it is no longer actively maintained and the C# compiler has been evolving very quickly lately.

The other day I posted on Xwitter about a simple but interesting experience I had:

I know this xweet is in Brazilian Portuguese, but don't worry. Just take my hand and I'll walk you through the journey.

```csharp
using System.Text;

public string Encrypt(string encryptString)
{
    byte[] stringBytes = Encoding.Unicode.GetBytes(encryptString);
    StringBuilder sbBytes = new StringBuilder(stringBytes.Length * 2);
    foreach (byte b in stringBytes)
    {
        sbBytes.AppendFormat("{0:X2}", b); // boxing problem
    }
    return sbBytes.ToString();
}
```

The analyzer flagged the variable b, declared by the foreach loop, as being boxed. And indeed it was, as we will prove later: AppendFormat(string, object) receives its argument as object, so every byte is boxed on each iteration. The first thing most programmers would do is convert the byte into a string using the built-in ToString method.

```csharp
// ...
sbBytes.AppendFormat("{0:X2}", b.ToString()); // avoiding boxing
// ...
```

Soon after I applied this "improvement" locally, I started experiencing errors in different parts of the application. This method is currently used to encode passwords, voucher codes, and other values. Despite its name, it is a reversible hexadecimal encoding rather than real encryption, but it is an important method at the core of this application, valued for its simplicity.

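The problem is very naive, but not always evident when we are in a hurry with multiple pending tasks piling up. According to the official documentation [3], "If format is null or an empty string (""), the return value is formatted with the general numeric format specifier ("G")." The parameterless b.ToString() behaves the same way, which means the byte is formatted as a decimal (base 10) number. And the {0:X2} format item no longer helps: its argument is now a string, which does not implement IFormattable, so AppendFormat silently ignores the X2 specifier.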

For example, Encoding.Unicode encodes the character "4" as the byte array [0x34, 0x00], in little-endian format. This means that the least significant byte (LSB) is stored first, followed by the most significant byte (MSB). The LSB is 0x34, which is equal to 52 in decimal, and the MSB is 0x00, which is equal to 0. To convert the byte array back to a number, we multiply each byte by 256 raised to the power of its position, starting from 0. So, the formula is:

$$\texttt{0x34} \times 256^{0} + \texttt{0x00} \times 256^{1} = 52 \times 1 + 0 \times 256 = 52$$

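The result, 52, is the Unicode code point of the character "4" (U+0034). And this was the root cause of the problem: with b.ToString(), the byte 0x34 became "52" instead of "34", and 0x00 became "0" instead of "00", while the Decrypt method expects every byte as exactly two hexadecimal (base 16) digits, not as a decimal (base 10) number of variable length.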

```csharp
using System;
using System.Text;

public static string Decrypt(string cipherText)
{
    int numberChars = cipherText.Length;
    byte[] bytes = new byte[numberChars / 2];
    for (int i = 0; i < numberChars; i += 2)
    {
        // Each pair of characters is parsed as one byte in base 16.
        bytes[i / 2] = Convert.ToByte(cipherText.Substring(i, 2), 16);
    }
    return Encoding.Unicode.GetString(bytes);
}
```
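The whole chain can be reproduced with a small sketch (top-level statements, assuming .NET 6 or later). EncodeWithFormat mirrors the original AppendFormat("{0:X2}", b) loop, EncodeWithToString mirrors the broken "improvement", Decrypt is copied from the method above, and TryDecrypt is a hypothetical wrapper that reports the exception type instead of crashing. Encoding the character "4" gives "3400" the original way and "520" ("52" followed by "0") the broken way, and in this case the odd-length result cannot even be parsed:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Encoding.Unicode is UTF-16 little-endian: "4" (U+0034) becomes [0x34, 0x00].
byte[] bytes = Encoding.Unicode.GetBytes("4");
ushort codeUnit = BinaryPrimitives.ReadUInt16LittleEndian(bytes);
Console.WriteLine(codeUnit); // 52, i.e. 0x34 * 256^0 + 0x00 * 256^1

string original = EncodeWithFormat(bytes);   // "3400": two hex digits per byte
string broken = EncodeWithToString(bytes);   // "520": "52" + "0", decimal, no padding
Console.WriteLine(TryDecrypt(original)); // 4
Console.WriteLine(TryDecrypt(broken));   // ArgumentOutOfRangeException

// Mirrors the original loop: the byte is boxed, then formatted as two hex digits.
static string EncodeWithFormat(byte[] data)
{
    var sb = new StringBuilder(data.Length * 2);
    foreach (byte b in data)
    {
        sb.AppendFormat("{0:X2}", b);
    }
    return sb.ToString();
}

// Mirrors the broken "improvement": b.ToString() is decimal, and the X2
// specifier is ignored because the argument is now a string.
static string EncodeWithToString(byte[] data)
{
    var sb = new StringBuilder(data.Length * 2);
    foreach (byte b in data)
    {
        sb.AppendFormat("{0:X2}", b.ToString());
    }
    return sb.ToString();
}

// Same logic as Decrypt above: every two characters are parsed as one hex byte.
static string Decrypt(string cipherText)
{
    int numberChars = cipherText.Length;
    byte[] result = new byte[numberChars / 2];
    for (int i = 0; i < numberChars; i += 2)
    {
        result[i / 2] = Convert.ToByte(cipherText.Substring(i, 2), 16);
    }
    return Encoding.Unicode.GetString(result);
}

// Hypothetical wrapper: returns the decoded text, or the exception type name.
static string TryDecrypt(string cipherText)
{
    try
    {
        return Decrypt(cipherText);
    }
    catch (Exception ex)
    {
        return ex.GetType().Name;
    }
}
```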

The solution

The solution was very simple. I just wrote:

```csharp
// ...
sbBytes.Append(b.ToString("X2"));
// ...
```

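This way, we keep the hexadecimal representation of the bytes without boxing, because ToString("X2") is called directly on the byte value. It still creates a small string for each byte, but no boxed object. After this fix, I could continue working on the other tasks assigned to me.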
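The fixed loop can be checked with a quick round trip. In this sketch (top-level statements, assuming .NET 6 or later), EncryptFixed is the method from this post with the one-line change, and Decrypt is copied from above. The other two variants are alternatives that were not part of the original fix: Convert.ToHexString (available since .NET 5) produces the same uppercase hex in one call, and passing an interpolated string to StringBuilder.Append (supported since .NET 6) goes through a generic handler that is designed to format each byte directly into the builder, without boxing or a temporary string:

```csharp
using System;
using System.Text;

foreach (string text in new[] { "4", "Hi", "Hello" })
{
    string fixedHex = EncryptFixed(text);
    Console.WriteLine($"{text} -> {fixedHex} -> {Decrypt(fixedHex)}");
}

// The method from the post after the fix: no boxing, one small string per byte.
static string EncryptFixed(string encryptString)
{
    byte[] stringBytes = Encoding.Unicode.GetBytes(encryptString);
    var sbBytes = new StringBuilder(stringBytes.Length * 2);
    foreach (byte b in stringBytes)
    {
        sbBytes.Append(b.ToString("X2"));
    }
    return sbBytes.ToString();
}

// Alternative 1 (.NET 5+): the whole array is converted to uppercase hex in one call.
static string EncryptWithToHexString(string encryptString) =>
    Convert.ToHexString(Encoding.Unicode.GetBytes(encryptString));

// Alternative 2 (.NET 6+): the interpolated string binds to the
// StringBuilder.Append handler overload instead of an object parameter.
static string EncryptWithInterpolation(string encryptString)
{
    byte[] stringBytes = Encoding.Unicode.GetBytes(encryptString);
    var sbBytes = new StringBuilder(stringBytes.Length * 2);
    foreach (byte b in stringBytes)
    {
        sbBytes.Append($"{b:X2}");
    }
    return sbBytes.ToString();
}

// Decrypt from the post, used to confirm the round trip.
static string Decrypt(string cipherText)
{
    int numberChars = cipherText.Length;
    byte[] bytes = new byte[numberChars / 2];
    for (int i = 0; i < numberChars; i += 2)
    {
        bytes[i / 2] = Convert.ToByte(cipherText.Substring(i, 2), 16);
    }
    return Encoding.Unicode.GetString(bytes);
}
```

If you go the Convert.ToHexString route, Convert.FromHexString is its counterpart for the decoding side.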

But how to prove it?

The .NET ecosystem generally provides all the tools you need. Visual Studio has always been an excellent tool.

JetBrains Rider also provides very good tooling for code analysis and troubleshooting.

The secret is to investigate and measure whenever necessary, because both this blog post and the tools mentioned here can quickly become obsolete. You can also write your own analyzers or use third-party tools.

I have my own tools, because I like to analyze code with multiple tools to achieve more consistent results. However, one of the simplest options is an online tool, so you don't need to install anything. For example, the IL code below was generated using SharpLab.

```
...
IL_0025: ldloc.1
IL_0026: ldstr "{0:X2}"
IL_002b: ldloc.s 4
IL_002d: box [System.Runtime]System.Byte // boxing instruction here
IL_0032: callvirt instance class [System.Runtime]System.Text.StringBuilder [System.Runtime]System.Text.StringBuilder::AppendFormat(string, object)
IL_0037: pop
...
```

Thanks for your interest, your time, and your patience! 🤓

Disclaimer

It's worth noting that I'm not a Microsoft employee. All opinions in this blog post are my own. The information displayed here is not endorsed by Microsoft, the .NET Foundation, or any of their partners. This is not a sponsored post. All rights reserved.

  1. ".NET". Wikipedia. Retrieved December 10, 2023.
  2. "Boxing and Unboxing". Microsoft Learn. Retrieved December 10, 2023.
  3. "Byte.ToString Method". Microsoft Learn. Retrieved December 12, 2023.