In an alternate Universe, .NET code executes in parallel

Actually, it’s right here as the Microsoft Parallel Extensions to the .NET 3.5 Framework Dec. 2007 CTP (what a mouthful!). If you’re not familiar with these extensions, the brief summary is simple (straight from MSDN):

Parallel Extensions to the .NET Framework is a managed programming model for data parallelism, task parallelism, and coordination on parallel hardware unified by a common work scheduler. Parallel Extensions makes it easier for developers to write programs that scale to take advantage of parallel hardware—providing improved performance as the numbers of cores and processors increase—without having to deal with many of the complexities of today’s concurrent programming models. Parallel Extensions provides library-based support for introducing concurrency into applications written with any .NET language, including but not limited to C# and Visual Basic.

What that’s trying to say is that it makes it easier to write complex multi-threaded, divide-and-conquer-style code. If your code only uses threads occasionally, maybe via the ThreadPool (or my current favorite, the BackgroundWorker), you could switch to these extensions, but for now the benefits probably won’t outweigh the work of switching. If, however, you’ve got code that could take advantage of parallelism, where the same operation runs on multiple processors simultaneously over different pieces of data, then the extensions may be exactly what you’re looking for.

I’ve definitely written this kind of code before: a queue of work items, and multiple threads chugging through that queue. It’s messy, error-prone, and subject to performance woes if not written carefully. I hate writing it, because it’s always easier to get wrong (or not quite right) than it is to get right.

At its simplest, it could be used like this:

static void Main(string[] args)
{
    int[,] data = new int[1024, 512];
    Parallel.For(0, 1024, i =>
    {
        for (int j = 0; j < 512; j++)
        {
            data[i, j] = j * i;
        }

        Console.Write("[{0}]", i);
    });

    Console.WriteLine("Done!");
    Console.ReadKey();
}

A simple loop whose iterations run in parallel, each one executing the inner loop to fill one row of data. The Parallel.For call doesn’t return until the entire loop has completed. Super simple.

There’s no guarantee of order as you can see from these results:

[Image: console output of the program above, with the row indices printed out of order]

Clearly, some of the work is being done out of sequence. This is one of the key tenets of the Parallel Extensions as they stand: the work will be done when the work is done, in whatever order the Parallel Task Scheduler determines is best. You should never rely on any particular order. Period. Assume things happen in a semi-random sequence.
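One way to live with that rule is to have each iteration write only to a slot keyed by its own index, so the result is the same whatever order the scheduler picks. Here’s a minimal sketch, assuming the same Parallel.For shape as above. The using directive is the namespace from the released .NET 4 library (the CTP may differ), and the SquareAll helper is made up for illustration:

```csharp
using System;
using System.Threading.Tasks; // assumption: namespace of the released library, may differ in the CTP

static class OrderIndependentDemo
{
    // Hypothetical helper: squares every input value in parallel.
    public static int[] SquareAll(int[] input)
    {
        int[] output = new int[input.Length];

        // Each iteration touches only output[i], so no locking is needed
        // and the execution order cannot change the result.
        Parallel.For(0, input.Length, i =>
        {
            output[i] = input[i] * input[i];
        });

        // Parallel.For has returned, so every slot has been filled.
        return output;
    }

    static void Main()
    {
        int[] squares = SquareAll(new[] { 1, 2, 3, 4, 5 });
        Console.WriteLine(string.Join(", ", squares));
    }
}
```

Contrast this with the Console.Write call in the first example: the console is shared state, so the order of its output exposes the order of execution.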

There is also a handy ForEach method for iterating over any IEnumerable:

static void ComputeHashes(string startFolder)
{
    DirectoryInfo rootFolder = new DirectoryInfo(startFolder);
    FileInfo[] files = rootFolder.GetFiles("*.*");

    Parallel.ForEach<DirectoryInfo>(rootFolder.GetDirectories(), dir =>
    {
        ComputeHashes(dir.FullName);
    });

    Parallel.ForEach<FileInfo>(files, file =>
    {
        Console.WriteLine("{0} .... ", file.Name);
        MD5 md5 = MD5.Create();
        byte[] hash;
        using (FileStream stream = File.OpenRead(file.FullName))
        {
            hash = md5.ComputeHash(stream);
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hash.Length; i++)
        {
            // X2 pads each byte to two hex digits (0x0A prints as "0A", not "A")
            sb.AppendFormat("{0:X2}", hash[i]);
        }

        Console.WriteLine("{0} - {1}", file.Name, sb.ToString());
    });
}

The above example recursively walks the folder tree and prints the MD5 hash of every file it finds.

I’m not going to attempt to go through all the APIs and options, as there’s a lot there, including Parallel LINQ (PLINQ). There’s ample documentation available on the web, and a well-maintained blog by the team at Microsoft. This blog post in particular answers a bunch of questions that you might have about the Parallel Extensions.
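For a flavor of the LINQ side, here’s a minimal sketch of a parallel query. It assumes the AsParallel() extension method as it exists in the released .NET 4 library (in System.Linq); the CTP surface may differ slightly. The CountPrimes and IsPrime names are made up for illustration:

```csharp
using System;
using System.Linq;

static class PlinqDemo
{
    // Hypothetical helper: naive primality test, deliberately CPU-bound.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts the primes in [2, max] using a parallel query.
    public static int CountPrimes(int max)
    {
        return Enumerable.Range(2, max - 1)
                         .AsParallel()      // opt in to parallel execution
                         .Count(IsPrime);   // a count does not depend on ordering
    }

    static void Main()
    {
        Console.WriteLine(CountPrimes(100));
    }
}
```

A count is a good fit here because the answer is the same regardless of the order in which the elements are tested.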

One thing I’d like to see is an easier way to debug an application that uses the Parallel Extensions. When multiple threads are active, a multi-threaded application is really tough to debug (even on a single workstation). In my threaded development, I usually have a switch I can set to use only a single thread for testing and debugging, so that I can retain some part of my sanity. I’d suggest something like a Parallel.DebugMode = true … I don’t need to set the specific number of threads … just give me one. :)
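Here’s a sketch of what such a switch could look like. It assumes an options object with a MaxDegreeOfParallelism setting, which is how the released System.Threading.Tasks API in .NET 4 exposes it; I can’t say whether the CTP has an equivalent. The singleThreaded flag and the MaxConcurrency helper are hypothetical names:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks; // assumption: released .NET 4 API, not necessarily the CTP

static class DebugSwitchDemo
{
    // Hypothetical helper: runs a loop and returns the highest number of
    // iterations that were ever executing at the same moment.
    public static int MaxConcurrency(bool singleThreaded)
    {
        var options = new ParallelOptions
        {
            // 1 forces one iteration at a time; -1 means "no limit".
            MaxDegreeOfParallelism = singleThreaded ? 1 : -1
        };

        object gate = new object();
        int running = 0;
        int peak = 0;

        Parallel.For(0, 200, options, i =>
        {
            lock (gate)
            {
                running++;
                if (running > peak) peak = running;
            }

            Thread.SpinWait(10000); // stand-in for real work

            lock (gate)
            {
                running--;
            }
        });

        return peak;
    }

    static void Main()
    {
        Console.WriteLine("Debug mode peak: {0}", MaxConcurrency(true));
        Console.WriteLine("Normal mode peak: {0}", MaxConcurrency(false));
    }
}
```

With the flag set, only one iteration runs at a time, which makes stepping through the loop body in a debugger far less painful. The iterations may still not all run on the same thread, so thread-affine state deserves the same care as before.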

I’m not clear on when I’d want to use these extensions, especially on a web server — seems like the answer is never (or rarely)?

In any case, now’s the time to check this stuff out so that you can provide feedback before they ship.