Strings in .NET and C#
- Introduction
- Interning, Literals and the Debugger
- Memory Usage, Encoding and Internationalization
Interning, Literals and the Debugger
Interning
.NET has the concept of an "intern pool". It's basically just a set of strings, but it makes sure that every time you reference the same string literal, you get a reference to the same string. This is probably language-dependent, but it's certainly true in C# and VB.NET, and I'd be very surprised to see a language it didn't hold for, as IL makes it very easy to do (probably easier than failing to intern literals).
As well as literals being automatically interned, you can intern strings manually with the Intern method, and check whether there is already an interned string with the same character sequence in the pool using the IsInterned method. Somewhat unintuitively, IsInterned returns a string rather than a boolean: if an equal string is in the pool, a reference to that string is returned; otherwise, null is returned. Likewise, the Intern method returns a reference to an interned string: the string you passed in if it was already in the pool, an equal string which was already in the pool, or a newly interned string if no equal string was there.
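The behaviour described above can be checked with a few reference comparisons. This is a minimal sketch rather than a definitive test: the class name `InternDemo` and the variable names are made up for illustration, and it assumes that a freshly generated GUID string is not already in the pool.

```csharp
using System;

public static class InternDemo
{
    public static void Main()
    {
        string a = "hello";
        string b = "hello";
        // Two occurrences of the same literal refer to one interned object.
        Console.WriteLine(object.ReferenceEquals(a, b));                    // True

        // A string built at execution time is equal to the literal,
        // but it is a separate object and is not interned automatically.
        string c = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        Console.WriteLine(a == c);                                          // True
        Console.WriteLine(object.ReferenceEquals(a, c));                    // False

        // IsInterned and Intern both hand back the pooled string, not c.
        Console.WriteLine(object.ReferenceEquals(a, string.IsInterned(c))); // True
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c)));     // True

        // Assumption: a fresh GUID string is not in the pool, so IsInterned
        // is expected to return null here.
        string unique = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(unique) == null);
    }
}
```

Note that `==` on strings compares character sequences, which is why the sketch uses `object.ReferenceEquals` to see whether two variables refer to the same object.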
Literals
Literals are how you hard-code strings into C# programs. There are two types of string literals in C# - regular string literals and verbatim string literals. Regular string literals are similar to those in many other languages such as Java and C - they start and end with ", and various characters (in particular, " itself, \, and carriage return (CR) and line feed (LF)) need to be "escaped" to be represented in the string. Verbatim string literals allow pretty much anything within them, and end at the first " which isn't doubled. Even carriage returns and line feeds can appear in the literal! To obtain a " within the string itself, you need to write "". Verbatim string literals are distinguished by having an @ before the opening quote. Here are some examples of the two types of literal, and what they amount to:
| Regular literal | Verbatim literal | Resulting string |
|---|---|---|
| `"Hello"` | `@"Hello"` | `Hello` |
| `"Backslash: \\"` | `@"Backslash: \"` | `Backslash: \` |
| `"Quote: \""` | `@"Quote: """` | `Quote: "` |
| `"CRLF:\r\nPost CRLF"` | `@"CRLF:`<br>`Post CRLF"` | `CRLF:`<br>`Post CRLF` |
For other escape sequences, please see the relevant FAQ entry. Note that the difference is only for the compiler's sake. Once the string is in the compiled code, there's no such thing as a verbatim string literal vs a regular string literal.
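As a rough check that the two forms of literal really do produce the same strings, the pairs from the table can be compared directly. This is only a sketch: the class name `LiteralDemo` is invented for the example, and the last comparison assumes the source file is saved with either CRLF or LF line endings, since a verbatim literal picks up whatever line break the file contains.

```csharp
using System;

public static class LiteralDemo
{
    public static void Main()
    {
        // Each pair is two spellings of the same character sequence.
        Console.WriteLine("Backslash: \\" == @"Backslash: \");        // True
        Console.WriteLine("Quote: \"" == @"Quote: """);               // True

        // After compilation both are the same literal, so they are also
        // interned to the same object.
        Console.WriteLine(object.ReferenceEquals("Hello", @"Hello")); // True

        // The line break inside a verbatim literal is taken from the source
        // file, so it is normalized here before comparing.
        string verbatim = @"CRLF:
Post CRLF";
        Console.WriteLine(verbatim.Replace("\r\n", "\n") == "CRLF:\nPost CRLF");
    }
}
```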
Strings and the debugger
Numerous people run into problems when inspecting strings in the debugger, with both VS.NET 2002 and VS.NET 2003. Ironically, the problems are often caused by the debugger trying to be helpful, and either displaying the string as a regular string literal with backslash-escaped characters, or displaying it as a verbatim string literal complete with leading @. This leads to many questions asking how the @ can be removed, despite the fact that it's not really there in the first place; it's only how the debugger is showing it. Also, some versions of VS.NET will stop displaying the contents of the string at the first null character, and evaluate its Length property incorrectly, calculating the value themselves instead of asking the managed code, so that the string again appears to finish at the first null character.
Given the confusion this has caused, I believe it's best to examine strings in a different way when debugging, at least if you think something odd is going on. I suggest using a method like the one below, which will print the contents of a string to the console in a safe way. Depending on what kind of application you're developing, you may want to write this information to a log file, to the debug or trace listeners, or pop it up in a message box.
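    using System;

    // The class name is arbitrary; it is only here so that the snippet compiles.
    public static class StringDisplay
    {
        // Names of the ASCII control characters U+0000 to U+001F, indexed by code point.
        static readonly string[] LowNames =
        {
            "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
            "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
            "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
            "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"
        };

        // Prints the length of the string, then one line per character with its code point.
        public static void DisplayString (string text)
        {
            Console.WriteLine ("String length: {0}", text.Length);
            foreach (char c in text)
            {
                if (c < 32)
                {
                    // Control character: print its name instead of the character itself.
                    Console.WriteLine ("<{0}> U+{1:x4}", LowNames[c], (int)c);
                }
                else if (c > 127)
                {
                    // Outside ASCII: the console may not be able to show it, so print only the code point.
                    Console.WriteLine ("(Possibly non-printable) U+{0:x4}", (int)c);
                }
                else
                {
                    Console.WriteLine ("{0} U+{1:x4}", c, (int)c);
                }
            }
        }
    }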
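For cases where the console is not the right destination, one possible variation is to return the per-character descriptions instead of printing them, so the caller can send them to a log file or a trace listener. The sketch below is an assumption about how that might look, not part of the original method: `StringDump` and `Describe` are hypothetical names, and the control character names are left out to keep it short. It uses a string with an embedded null character, the case where some debuggers truncate the display.

```csharp
using System;
using System.Collections.Generic;

public static class StringDump
{
    // Hypothetical helper: one description per character, returned rather than printed.
    public static List<string> Describe(string text)
    {
        var lines = new List<string>();
        foreach (char c in text)
        {
            if (c < 32 || c > 126)
            {
                // Control or non-ASCII character: report the code point only.
                lines.Add(string.Format("U+{0:x4}", (int)c));
            }
            else
            {
                lines.Add(string.Format("{0} U+{1:x4}", c, (int)c));
            }
        }
        return lines;
    }

    public static void Main()
    {
        string text = "a\0b";            // three characters, the middle one is U+0000
        Console.WriteLine(text.Length);  // 3
        foreach (string line in Describe(text))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // a U+0061
        // U+0000
        // b U+0062
    }
}
```

The same lines could be passed to `System.Diagnostics.Trace.WriteLine` or appended to a log file instead of being written to the console.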
Hi, I read two of your articles and they are interesting. However, I cannot figure out where the contents of the variable readonly string[] LowNames come from, or how it is used. I would appreciate a short description.
Do you mean you don't know how the array is populated, or you don't know why I populated it with the names I did?
To answer the first question - if you look at the code, you'll see there's a static field initializer:
    static readonly string[] LowNames =
    {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
        "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"
    };
To answer the second: the names come from the man page for ASCII on a Unix box.
Jon
I'm working on a project which converts between different temperature scales such as Celsius, Fahrenheit and Kelvin.
I also have to add access keys for each conversion, as well as for the reset and exit buttons on the form. Please help me with the conversion function and the access keys. Also, what is a double type variable?
I'm not at all sure what this has to do with Strings, but the C# type for double precision binary floating point values is "double".
See my article on floating point arithmetic for more information. It's at http://www.pobox.com/~skeet/csharp/floatingpoint.html
(Sorry, the "insert link" button doesn't seem to work in Firefox.)
Jon