Using AppDomains to Build Reliable Systems
Building reliable systems is very hard because it is an all-or-nothing game. If there exists even one unhandled exception, the entire system is considered unreliable. This is an impossible standard to meet for a large system. In fact, the first step to building reliable systems is to accept that perfection is impossible. Instead, we will attempt to build a more reliable system from a collection of smaller unreliable components. The idea is to manage failure rather than pursue an impossible perfection.
The motivation for this article is a programming language called Erlang. It was originally designed to build soft real-time systems at Ericsson, a telecommunications company. Erlang is a dynamically typed, functional, concurrent programming language. The primary feature of Erlang is one-way message passing between local and/or remote lightweight processes. A lightweight process in Erlang is a very efficient way to construct small, isolated, concurrent programs. Communication is handled by sending discrete messages asynchronously over a channel to another Erlang process. The runtime will move the messages very efficiently if the receiver process is on the same machine; otherwise, it will serialize and send the message over the network to a remote process. Forcing concurrent programs into this restrictive pattern reduces concurrency errors and promotes parallelism.
Erlang makes use of these lightweight processes to build reliable programs. These “processes” have isolation properties similar to those of normal OS processes: protected memory, concurrency, and the guarantee that a failure in one process will not bring down the whole system. That last property can be used to build “supervisor” processes, which are responsible for launching and monitoring one or more worker processes. If a worker process fails, the supervisor will restart the worker. Therefore, the entire system will continue to operate even if there are intermittent failures in some components. If a component fails repeatedly, the supervisor can launch a simpler process instead to hopefully keep the system up and running.
For each worker process that one wishes to load into a separate AppDomain, the main supervisor program creates an instance of a Channel which supports reliable messaging between AppDomains. Like Erlang, messages are enqueued concurrently but processed sequentially. The Channel loads the assembly into a new AppDomain, creates an instance of the receiver type, and offers an Enqueue method so other processes can send messages to this process.
// in assembly: supervisor
// do foreach worker process
Channel cq = new Channel("worker1", "Receiver1");
channels.Add("worker1", cq.Enqueue);
The Receiver1 class inherits from MarshalByRefObject because we use remoting to cross AppDomain boundaries. Remoting between AppDomains on the same CLR is very fast. This class will be in a different assembly. The class Receiver1 has a simple method to get an input message. It also has a method that initializes the worker and gets a mapping of process names to delegates, which send messages to the corresponding process. So if Receiver1 wants to send messages to Receiver7 or 8, it grabs those delegates from the dictionary.
// in assembly: worker1
public class Receiver1 : MarshalByRefObject, IReceiver
{
    public void OnStart(Dictionary<string, Send> channels)
    {
        // initialize system and grab the channels it needs
    }

    public void ReceiveMessage(object msg)
    {
        // process the message
    }
}
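To make the lookup in OnStart concrete, here is a minimal in-process sketch. It is an illustration under assumptions, not the article's implementation: the IReceiver interface is not shown in the article, so its shape here is inferred from Receiver1, and ForwardingReceiver, ForwardingDemo, and the peer name "worker2" are hypothetical. A plain list stands in for the peer's channel so that the example runs without creating any AppDomains.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Send delegate used by the article's channels.
public delegate void Send(object msg);

// Assumed interface: the article names IReceiver but does not show it.
public interface IReceiver
{
    void OnStart(Dictionary<string, Send> channels);
    void ReceiveMessage(object msg);
}

// Hypothetical worker that forwards every message to a peer named "worker2".
public class ForwardingReceiver : MarshalByRefObject, IReceiver
{
    private Send sendToPeer;

    public void OnStart(Dictionary<string, Send> channels)
    {
        // Grab only the delegate this worker needs and keep it for later.
        if (!channels.TryGetValue("worker2", out sendToPeer))
            throw new InvalidOperationException("no channel named worker2");
    }

    public void ReceiveMessage(object msg)
    {
        // Sending is a one-way call through the stored delegate.
        sendToPeer("forwarded: " + msg);
    }
}

public static class ForwardingDemo
{
    // Wires the worker to an in-memory list standing in for the peer's channel
    // and returns everything that was delivered to that peer.
    public static List<object> Run()
    {
        var delivered = new List<object>();
        var channels = new Dictionary<string, Send> { { "worker2", delivered.Add } };

        var worker = new ForwardingReceiver();
        worker.OnStart(channels);
        worker.ReceiveMessage("hello");
        return delivered;
    }
}
```

The worker never holds a reference to its peer, only a delegate keyed by name, which is what lets a channel swap in a fresh receiver without the sender noticing.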
The implementation of channels in .NET is an ad-hoc message queue within the same process. Essentially, it is a queue that is safe for concurrent writes, because many processes may write to the same queue concurrently. To make this a little easier to use, the Enqueue method on the queue class will be wrapped in a Send delegate.
public delegate void Send(object msg);

public class Channel
{
    string assemblyName, typeName;
    AppDomain ad;
    IReceiver r;
    Thread channelMonitor;

    public Channel(string a, string t)
    {
        assemblyName = a;
        typeName = t;
        LoadAppDomain();
        // a field initializer cannot refer to an instance method,
        // so the monitor thread is created and started here
        channelMonitor = new Thread(Dequeue);
        channelMonitor.Start();
    }

    public void Enqueue(object msg)
    {
        // carefully push the msg on the queue
    }

    void Dequeue()
    {
        while (true)
        {
            // internalDequeue (elided) blocks until the queue is nonempty
            object msg = internalDequeue();
            try
            {
                r.ReceiveMessage(msg);
            }
            catch (Exception)
            {
                // the worker failed: unload its AppDomain and load a fresh copy
                AppDomain.Unload(ad);
                LoadAppDomain();
            }
        }
    }

    void LoadAppDomain()
    {
        ad = AppDomain.CreateDomain(assemblyName);
        r = (IReceiver)ad.CreateInstanceFromAndUnwrap(assemblyName, typeName);
    }
}
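The queue behind Enqueue is left out of the listing above. One possible shape for it, assuming a single reader thread and any number of writers, is a lock-protected queue that uses Monitor.Wait and Monitor.Pulse to block the reader while the queue is empty. The names MessageQueue and MessageQueueDemo are hypothetical, and this is a sketch rather than the article's actual queue.

```csharp
using System.Collections.Generic;
using System.Threading;

public delegate void Send(object msg);

// Hypothetical helper: a FIFO queue that is safe for concurrent writers.
public class MessageQueue
{
    private readonly Queue<object> items = new Queue<object>();
    private readonly object gate = new object();

    public void Enqueue(object msg)
    {
        lock (gate)
        {
            items.Enqueue(msg);
            Monitor.Pulse(gate); // wake the reader if it is waiting
        }
    }

    public object Dequeue()
    {
        lock (gate)
        {
            // Re-check the condition after every wake-up.
            while (items.Count == 0)
                Monitor.Wait(gate);
            return items.Dequeue();
        }
    }
}

public static class MessageQueueDemo
{
    // Four writer threads push 100 messages each through a Send delegate;
    // the calling thread drains the queue and returns the number received.
    public static int Run()
    {
        var queue = new MessageQueue();
        Send send = queue.Enqueue; // the delegate handed out to other workers

        var writers = new List<Thread>();
        for (int w = 0; w < 4; w++)
        {
            var writer = new Thread(() =>
            {
                for (int i = 0; i < 100; i++) send(i);
            });
            writers.Add(writer);
            writer.Start();
        }

        int received = 0;
        for (int i = 0; i < 400; i++)
        {
            queue.Dequeue();
            received++;
        }

        foreach (var writer in writers) writer.Join();
        return received;
    }
}
```

Because messages are removed only by the single monitor thread, they are processed one at a time even though many senders enqueue concurrently.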
The channel creates another thread to monitor the queue for new messages. The queue code is elided here, but it basically blocks until new messages arrive on the channel. Then it dequeues a message and calls ReceiveMessage. The try/catch around that call is what makes the system more reliable: if ReceiveMessage fails with an unhandled exception, the Dequeue method catches the exception, unloads the “broken” AppDomain, and reloads a fresh copy of it. Unfortunately, this code also matches the semantics of Erlang in one less welcome respect: if the process fails while processing a message, the process is restarted but that particular message is lost.
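The restart-and-lose-the-message behavior can be seen in a small in-process sketch. It makes one loud assumption: replacing a plain object stands in for unloading and reloading an AppDomain, so the example runs anywhere. FragileWorker and RestartDemo are hypothetical names, and the worker is rigged to fail on the message "bad".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a worker: it throws on the message "bad".
public class FragileWorker
{
    public void ReceiveMessage(object msg)
    {
        if ("bad".Equals(msg))
            throw new InvalidOperationException("worker crashed");
    }
}

public static class RestartDemo
{
    // Feeds the messages to a worker one at a time. Messages that were
    // processed successfully are appended to handled; the return value
    // is the number of times the worker had to be replaced.
    public static int Run(IEnumerable<object> messages, List<object> handled)
    {
        int restarts = 0;
        var worker = new FragileWorker();

        foreach (object msg in messages)
        {
            try
            {
                worker.ReceiveMessage(msg);
                handled.Add(msg);
            }
            catch (Exception)
            {
                // Stand-in for AppDomain.Unload followed by LoadAppDomain:
                // drop the failed instance and continue with a fresh one.
                worker = new FragileWorker();
                restarts++;
                // msg is not retried, so it is lost.
            }
        }
        return restarts;
    }
}
```

Feeding it the messages "a", "bad", "b" leaves "a" and "b" handled with one restart. Retrying the failed message instead of dropping it would be a different design, and it risks an endless restart loop if the message itself is what triggers the failure.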
The purpose of this design is to build a system that roughly emulates the reliability feature in Erlang. The main program (aka supervisor) creates a channel for every worker process, allocating a new AppDomain and connecting to a Receiver object via remoting. Worker processes do not contact each other directly; instead, they are loosely connected by a “reliable” communication channel, similar to a message queue. If an AppDomain should fail catastrophically for any reason, the channel catches the error, unloads the errant code and reloads the AppDomain. Though this design will not magically make your application 100% reliable, it will help your application survive the inevitable errant, uncaught exceptions that plague all systems.
Very nice article. I came to the same conclusion while designing a reliable system. I haven't got to the part where I have to write it yet, but your approach seems simple enough to make it robust. The only part I would say is missing from the design, just to make it a more complete machine, is plug-in capabilities, perhaps by merging it with an "Inversion of Control" system to see what it could accomplish. Get me? Give the worker processes common classes at their disposal. Hope you get this message, Pinku, and let me know what you think.
Thanks.