Strings in .NET and C#

Interning, Literals and the Debugger

Interning

.NET has the concept of an "intern pool". It's basically just a set of strings, but it makes sure that every time you reference the same string literal, you get a reference to the same string. This is probably language-dependent, but it's certainly true in C# and VB.NET, and I'd be very surprised to see a language it didn't hold for, as IL makes it very easy to do (probably easier than failing to intern literals). As well as literals being automatically interned, you can intern strings manually with the Intern method, and check whether there is already an interned string with the same character sequence in the pool using the IsInterned method. Somewhat unintuitively, IsInterned returns a string rather than a boolean - if an equal string is in the pool, a reference to that string is returned. Otherwise, null is returned. Likewise, the Intern method returns a reference to an interned string - either the string you passed in if it was already in the pool, or a newly created interned string, or an equal string which was already in the pool.
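As a rough sketch of the behaviour described above, the following program builds two equal strings at run time and then interns them. The class name InternDemo, the Build helper and the sample text are made up for illustration. The strings are built from character arrays rather than written as literals so that the result doesn't depend on how literals in this particular assembly are treated.

```csharp
using System;

static class InternDemo
{
    // Hypothetical helper: builds a fresh string object at run time, so no
    // literal with this content appears anywhere in the program.
    public static string Build()
    {
        return new string(new char[] { 'p', 'o', 'o', 'l', '-', '4', '2' });
    }

    static void Main()
    {
        string a = Build();
        string b = Build();

        // Equal contents, but two separate objects.
        Console.WriteLine(a == b);                        // True
        Console.WriteLine(object.ReferenceEquals(a, b));  // False

        // IsInterned returns a string reference or null, not a bool.
        // This should be null here, assuming nothing else in the process
        // has already interned the same text.
        Console.WriteLine(string.IsInterned(a) == null);

        // After interning, equal strings map to a single pooled reference.
        string pooledA = string.Intern(a);
        string pooledB = string.Intern(b);
        Console.WriteLine(object.ReferenceEquals(pooledA, pooledB));               // True
        Console.WriteLine(object.ReferenceEquals(pooledA, string.IsInterned(b)));  // True
    }
}
```

Note that the comparison that matters is ReferenceEquals: the == operator compares contents, so it can't tell you anything about the pool.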

Literals

Literals are how you hard-code strings into C# programs. There are two types of string literals in C# - regular string literals and verbatim string literals. Regular string literals are similar to those in many other languages such as Java and C - they start and end with ", and various characters (in particular, " itself, \, and carriage return (CR) and line feed (LF)) need to be "escaped" to be represented in the string. Verbatim string literals allow pretty much anything within them, and end at the first " which isn't doubled. Even carriage returns and line feeds can appear in the literal! To obtain a " within the string itself, you need to write "". Verbatim string literals are distinguished by having an @ before the opening quote. Here are some examples of the two types of literal, and what they amount to:

Regular literal          Verbatim literal        Resulting string
"Hello"                  @"Hello"                Hello
"Backslash: \\"          @"Backslash: \"         Backslash: \
"Quote: \""              @"Quote: """            Quote: "
"CRLF:\r\nPost CRLF"     @"CRLF:                 CRLF:
                         Post CRLF"              Post CRLF

For other escape sequences, please see the relevant FAQ entry. Note that the difference is only for the compiler's sake. Once the string is in the compiled code, there's no such thing as a verbatim string literal vs a regular string literal.
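To see that the two forms really do end up as the same string, here is a small sketch that compares a regular literal with its verbatim counterpart. The class name LiteralDemo and its members are invented for this example. The multi-line case is left out on purpose, because the line break inside a verbatim literal is whatever the source file contains, so it may or may not match "\r\n".

```csharp
using System;

static class LiteralDemo
{
    // Each pair holds a regular literal and the verbatim literal for the same text.
    public static readonly string[][] Pairs =
    {
        new string[] { "Hello", @"Hello" },
        new string[] { "Backslash: \\", @"Backslash: \" },
        new string[] { "Quote: \"", @"Quote: """ }
    };

    // Returns true if every regular literal equals its verbatim counterpart.
    public static bool AllPairsEqual()
    {
        foreach (string[] pair in Pairs)
        {
            if (pair[0] != pair[1])
            {
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(AllPairsEqual());      // True
        // The escaped backslash counts as a single character in the string.
        Console.WriteLine(Pairs[1][0].Length);   // 12
    }
}
```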

Strings and the debugger

Numerous people run into problems when inspecting strings in the debugger, both with VS.NET 2002 and VS.NET 2003. Ironically, the problems are often generated by the debugger trying to be helpful, and either displaying the string as a regular string literal with backslash-escaped characters, or displaying it as a verbatim string literal complete with leading @. This leads to many questions asking how the @ can be removed, despite the fact that it's not really there in the first place - it's only how the debugger shows it. Also, some versions of VS.NET will stop displaying the contents of the string at the first null character, and evaluate its Length property incorrectly, calculating the value themselves instead of asking the managed code. In both cases the debugger treats the string as finishing at the first null character.

Given the confusion this has caused, I believe it's best to examine strings in a different way when debugging, at least if you think something odd is going on. I suggest using a method like the one below, which will print the contents of a string to the console in a safe way. Depending on what kind of application you're developing, you may want to write this information to a log file, to the debug or trace listeners, or pop it up in a message box.

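// Names of the ASCII control characters U+0000 to U+001F, indexed by character code.
static readonly string[] LowNames =
{
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"
};

// Prints the length of the string, then one line per character.
// Requires "using System;" and a non-null argument.
public static void DisplayString (string text)
{
    Console.WriteLine ("String length: {0}", text.Length);
    foreach (char c in text)
    {
        if (c < 32)
        {
            // Control character: show its ASCII name rather than the raw character.
            Console.WriteLine ("<{0}> U+{1:x4}", LowNames[c], (int)c);
        }
        else if (c >= 127)
        {
            // DEL (U+007F) or outside ASCII: may not display properly, so show the code only.
            Console.WriteLine ("(Possibly non-printable) U+{0:x4}", (int)c);
        }
        else
        {
            // Printable ASCII: show the character alongside its code.
            Console.WriteLine ("{0} U+{1:x4}", c, (int)c);
        }
    }
}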
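
If the console isn't the right destination, one option is a variant that returns the description as a string, leaving the caller to send it to a log file, a trace listener or a message box. The following is only a sketch of that idea: the StringDump class and the Describe method are hypothetical names, and unlike DisplayString it treats a null reference as a special case and doesn't print control character names.

```csharp
using System;
using System.Text;

static class StringDump
{
    // Hypothetical helper: the same idea as DisplayString, but the result is
    // returned so the caller decides where it goes.
    public static string Describe(string text)
    {
        if (text == null)
        {
            return "(null reference)";
        }
        StringBuilder sb = new StringBuilder();
        sb.AppendFormat("String length: {0}", text.Length).AppendLine();
        foreach (char c in text)
        {
            if (c < 32 || c >= 127)
            {
                // Control and non-ASCII characters: show the code only.
                sb.AppendFormat("U+{0:x4}", (int)c).AppendLine();
            }
            else
            {
                // Printable ASCII: show the character alongside its code.
                sb.AppendFormat("{0} U+{1:x4}", c, (int)c).AppendLine();
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // An embedded null is an ordinary character as far as .NET is
        // concerned, so the reported length is 3 and all three characters
        // are listed.
        Console.Write(Describe("a\0b"));
    }
}
```

Running this against a string with an embedded null is a quick way to check whether the debugger or the string itself is at fault when the two disagree about the length.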

Comments

  1. 05 Jan 2006 at 21:12

    Quote:
    Posted by eliassal on 10 Nov 2005 06:17 AM
    Hi, i read 2 of your articles they are interesting. However, I can not figure out what or where the contents come from for the variable
    readonly string[] LowNames and how it is used. I would appreciate a short description.


    Do you mean you don't know how the array is populated, or you don't know why I populated it with the names I did?


    To answer the first question - if you look at the code, you'll see there's a static field initializer:


    static readonly string[] LowNames =
    {
       "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
       "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
       "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
       "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"
    };


    The names come from the man page for ASCII on a Unix box.


    Jon

  2. 05 Jan 2006 at 21:11

    Quote:
    Posted by av_rocksu on 11 Sep 2005 06:00 AM
    I'm working on a project which converts different forms of temperature like Celsius, Fahrenheit and Kelvin etc.
    I also have to put up access keys to it for each conversion. Please help me with the conversion function and access keys. I also have to put up access keys for reset as well as exit buttons on the form. What is a double type variable?


    I'm not at all sure what this has to do with Strings, but the C# type for double precision binary floating point values is "double".


    See my article on floating point arithmetic for more information. It's at http://www.pobox.com/~skeet/csharp/floatingpoint.html
    (Sorry, the "insert link" button doesn't seem to work in Firefox.)


    Jon

  3. 10 Nov 2005 at 06:17

    Hi, i read 2 of your articles they are interesting. However, I can not figure out what or where the contents come from for the variable
    readonly string[] LowNames and how it is used. I would appreciate a short description.

  4. 11 Sep 2005 at 06:00

    I'm working on a project which converts different forms of temperature like Celsius, Fahrenheit and Kelvin etc.
    I also have to put up access keys to it for each conversion. Please help me with the conversion function and access keys. I also have to put up access keys for reset as well as exit buttons on the form. What is a double type variable?
