Asynchronous pluggable protocols for .NET

Displaying arbitrary HTML content in an application is great for many kinds of tasks: reports, help pages, integrated help tips, about boxes, embedded Flash vector animation, and lots more.

By the way, if you're not interested in the background, you may want to skip to the “Using the library” section.

When the content that needs to be displayed is static, everything is simple. You create the web browser control, then navigate it to a “file://localhost/” URL to display content stored on the local file system. Or you navigate it to a “res://” URL to display content embedded in the executable's resources. And you can of course navigate it to real Internet URLs, if you need that in your application.

Dynamic Content Problem

Things get worse when you need to display dynamic content. When your content is small, like the “Hello World” string, you’re OK setting the DocumentText property to something like “<html><body>Hello World</body></html>”.

When the content is big and complex, however, the following problems arise:

  • If you want your user to be able to copy text out of the web browser, you’ll want to enable the browser’s hot keys and context menu. Both have refresh functionality. When you display content by setting the DocumentText property, the document URL is still “about:blank”, so any refresh will evaporate your whole TPS report, leaving you with a big white nothing.
  • If you want your user to be able to navigate back and forth between your custom pages, you’ll discover that the navigation functionality of the web browser is severely broken when the documents have no valid document URL.
  • The “DocumentText” property is a string. Performing many operations on long strings is inefficient from the memory manager’s perspective. Besides, .NET strings take 2 bytes per character.
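To put a number on the last point, here is a small sketch that compares the size of the same markup as a .NET string (UTF-16) and as UTF-8 bytes. The class and method names are made up for this illustration; only the Encoding calls come from the framework.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class StringFootprint
{
    // Bytes the text occupies as UTF-16, the in-memory format of System.String.
    public static int Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    // Bytes the same text occupies when encoded as UTF-8.
    public static int Utf8Bytes(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

For mostly-ASCII HTML the UTF-8 form is half the size of the string. Text with many non-ASCII characters narrows or reverses the gap, so treat this as a rough guide only.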

Workarounds

I see several obvious ways to work around the problem. But I don’t like any of them.

Stick to DocumentText or DocumentStream property, hack around the problems

Technically it’s possible to fix the web browser control by implementing a pair of COM interfaces & such. For an example, see the article “Extended .NET 2.0 WebBrowser Control” on codeproject.com.

  • Everything may break some day when IE gets updated.
  • Inevitable mess with window subclassing and window message filtering. Remember you’ll have to test against any software that meddles with other applications' window procedures (Yandex Punto Switcher, ABBYY Lingvo, etc).
  • This workaround feels like duct tape: nasty and unreliable.

Save generated content to the hard drive

Somewhere in the %TEMP% folder, for example. I've seen this approach a lot, in various software products here and there.

  • Unnecessary HDD load. In the modern era of fast CPUs, HDD seeks are among the main reasons why desktop software is perceived as “sluggish”. For 7200 RPM hard drives, the 13-14 milliseconds average random seek time is consistent across both brand new 2TB drives and 5-year-old 100GB ones.
  • Potential concurrent access issues at the file system level. Possible “Sharing violation” errors, possibly incomplete documents when the user happens to press the F5 key while the software is updating the file, etc.
  • Security. My database contains lots of sensitive data and is hence fully encrypted with the strong Rijndael cipher. It’s extremely undesirable to have my reports stored as plain text in the %TEMP% folder, waiting for someone to read them. If you just thought about erasing the data when it’s no longer required, think about the state of the hard drive after a BSoD or a power outage.

Implement a local web server.

And navigate the web browser to “http://localhost:31337/Something”. I saw this approach in the installer of some enterprise software that used Apache for that purpose; I don’t remember the exact name, though.

  • It’s not that easy. For example, you either need to implement a web server in your process, or marshal your data across processes.
  • Most likely you’ll encounter various issues when running on a PC with firewall software.

Use IIS on another machine.

This approach might work in some cases for enterprise intranet-based software, but the software I developed does not even require an Internet connection.

The Recommended, Microsoft Way

The problem statement “use embedded web browser to display custom content” sounds very common.

As with most general software engineering problems, Microsoft already has a solution, and it is built into their operating systems. The solution is called “Asynchronous Pluggable Protocols”. They made it over 13 years ago: as you can see by browsing the MSDN documentation, the technology has been around since Internet Explorer 4.0.

Some applications register their custom protocol system-wide. For my application, however, I wanted neither to modify the Windows registry, nor to compile one more executable module, nor to expose my custom protocol system-wide.

Well, it appears my specific requirements are common, too. Microsoft already implemented the functionality to register a temporary pluggable namespace handler, to be used by the current process exclusively: see the RegisterNameSpace method of the IInternetSession interface. This temporary handler requires no information to be added to the registry, nor does it need to be implemented as a separate module.

Downsides

Sometimes, great flexibility comes at the price of an over-engineered API.

In this case, the API is COM-based. No OLE automation is used (for performance reasons, I suppose: remember the API was designed in the Pentium MMX era). Basically what you’ve got is a set of binary COM interfaces to call and/or implement.

I was unable to implement a working solution in pure C# during the several hours I spent trying.

System.Runtime.InteropServices is good; however, in my case I needed to implement 2 native COM interfaces, namely IClassFactory and IInternetProtocol. IInternetProtocol has no type library at all. I was able to generate one, but then something was wrong (presumably) with the method signatures, so the application crashed.

Then I scratched my head and thought “It’s just 2 damn COM objects to implement. With my background, I can do it in C++/ATL really fast; the problem is interacting with the C# code of the rest of my application”.

I’ve heard about C++/CLI, but since I’m relatively new to the managed world, I never actually used it. I decided to give it a try.

Just as I suspected, I didn’t like the C++/CLI syntax at all. Nor did I like the restrictive compiler not allowing me to mix managed and native things the way I initially tried to. Speaking of downsides, just open the resulting DLL in Reflector and look at all that ATL, STL, and CRT stuff suddenly being visible to the outside world, for the sole reason that it was #included, even though it's never used.

The solution

I may not have liked the C++/CLI technology much; however, I was able to achieve what I needed.

Here’s the complete interface of my DLL, as exposed to the C# language.

Using the library.

  1. Download and build the DLL source code.
  2. Implement the iProtoPlug interface somewhere. I’ve implemented it in the form class hosting the web browser control. In your getStream implementation:
    • Construct a 'new StreamWriter( new MemoryStream(), Encoding.UTF8 )' object.
    • Parse the provided URL. I used prefixes like “fiscalReports”, “businessEvents”, followed by parameters. You might use regular expressions if you want. Generate the corresponding document, and write it to the stream writer.
    • Call StreamWriter.Flush()
    • Call StreamWriter.BaseStream.Seek( 0, SeekOrigin.Begin )
    • Return StreamWriter.BaseStream.Length
  3. Create an instance of the ProtoPlug class. I put it in the web browser owner class, too. By the way, I create only one instance of the web browser control, and reuse it as needed: the startup time is noticeable. Pass “myproto” as the strProtocol constructor argument.
  4. Navigate your web browser to “myproto://Something”.
  5. ???
  6. Profit!
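The getStream steps from item 2 can be sketched roughly as follows. This is only an illustration under assumptions: the interface below is my guess at the callback's shape (a URL in, a stream out, the length as the return value), so check the real iProtoPlug declaration in the DLL source before copying anything. The ReportPages class, the URL layout “myproto://prefix/parameters” and the generated markup are all hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// ASSUMPTION: a guessed shape for the callback, inferred from the steps above.
// The real iProtoPlug interface is declared in the DLL and may differ.
public interface IProtoPlugSketch
{
    long getStream(string url, out Stream stream);
}

// Hypothetical implementation with placeholder page content.
public sealed class ReportPages : IProtoPlugSketch
{
    private const string Scheme = "myproto://";

    public long getStream(string url, out Stream stream)
    {
        // Build the document in memory as UTF-8 text.
        StreamWriter writer = new StreamWriter(new MemoryStream(), Encoding.UTF8);

        // Split "myproto://prefix/parameters" into prefix and parameters.
        string rest = url.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase)
            ? url.Substring(Scheme.Length)
            : url;
        rest = rest.TrimEnd('/');
        int slash = rest.IndexOf('/');
        string prefix = slash < 0 ? rest : rest.Substring(0, slash);
        string parameters = slash < 0
            ? ""
            : Uri.UnescapeDataString(rest.Substring(slash + 1));

        // Generate the page for the requested prefix.
        writer.Write("<html><body>");
        if (prefix.Equals("fiscalReports", StringComparison.OrdinalIgnoreCase))
            writer.Write("<h1>Fiscal report</h1><p>" + WebUtility.HtmlEncode(parameters) + "</p>");
        else if (prefix.Equals("businessEvents", StringComparison.OrdinalIgnoreCase))
            writer.Write("<h1>Business events</h1><p>" + WebUtility.HtmlEncode(parameters) + "</p>");
        else
            writer.Write("<h1>Unknown page</h1>");
        writer.Write("</body></html>");

        // Push buffered text into the MemoryStream, then rewind it for the reader.
        writer.Flush();
        writer.BaseStream.Seek(0, SeekOrigin.Begin);

        stream = writer.BaseStream;
        return writer.BaseStream.Length;
    }
}
```

The prefix comparison is case-insensitive because a browser may normalize the part of the URL right after “://”. WebUtility.HtmlEncode requires .NET Framework 4.0 or later; on 3.5 you would need another encoder.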

Final words

Not every string is a valid URL: RTFM RFC 3986 for the reason why.
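As a hedged illustration of that point, the sketch below builds a URL for the custom protocol by percent-encoding the parameter part, and checks whether a candidate string parses as an absolute URI. The ProtoUrl class and the “myproto://prefix/parameter” layout are assumptions of mine; Uri.EscapeDataString and Uri.TryCreate are standard framework calls.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class ProtoUrl
{
    // Percent-encode the parameter so spaces and other reserved
    // characters do not end up raw in the URL.
    public static string Build(string prefix, string parameter)
    {
        return "myproto://" + prefix + "/" + Uri.EscapeDataString(parameter);
    }

    // True when the candidate parses as an absolute URI (scheme included).
    public static bool IsAbsolute(string candidate)
    {
        Uri parsed;
        return Uri.TryCreate(candidate, UriKind.Absolute, out parsed);
    }
}
```

For example, ProtoUrl.Build("fiscalReports", "Q1 2010") yields “myproto://fiscalReports/Q1%202010”, while a free-form string without a scheme is rejected by ProtoUrl.IsAbsolute.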

The managed C++ DLL that uses COM interop may be either 32-bit, or 64-bit, but it can't be "Any CPU".

Spare the memory manager: don’t keep large documents in MemoryStream objects. If you have to, consider reusing a single MemoryStream object between requests.
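One possible way to reuse the stream is sketched below. It is not taken from the component; the ReusableDocumentBuffer class is hypothetical. It relies on MemoryStream.SetLength(0), which empties the stream while keeping its already allocated buffer.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the article's DLL.
public sealed class ReusableDocumentBuffer
{
    private readonly MemoryStream _stream = new MemoryStream();
    private readonly StreamWriter _writer;

    public ReusableDocumentBuffer()
    {
        // UTF8Encoding(false) omits the byte order mark, so every document
        // written through the same long-lived writer starts the same way.
        _writer = new StreamWriter(_stream, new UTF8Encoding(false));
    }

    // Replaces the previous document with a new one and rewinds the stream.
    public MemoryStream Render(string html)
    {
        _stream.SetLength(0);               // drop old content, keep capacity
        _writer.Write(html);
        _writer.Flush();                    // move buffered text into the stream
        _stream.Seek(0, SeekOrigin.Begin);  // rewind for the reader
        return _stream;
    }
}
```

This only works if the previous document has been fully read before the next Render call, which fits an application where all the UI lives in a single thread. With concurrent requests each one would need its own stream.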

I didn’t test my component too much:

  • The application I’ve developed is essentially single-threaded. I did use threads; however, all the UI lived in the main thread of my application.
  • My minimum OS requirement was Vista. That’s why I’ve only tested with .NET Framework 3.5, on Windows 7 64-bit and 32-bit, and Windows Vista 32-bit.

If you’re going to use this component commercially, do it free of charge, and plan your QA carefully. The component is small and simple; however it’s my first experience with C++/CLI.

I’d appreciate it if you sent me your bug fixes so I can update this article and/or the source code.

Thank you for your time!

Code January 2010, article July 2010.