ESENT Serialization Class Library
This document describes my experience designing and using the ESENT Serialization Class Library. For the current feature set, downloads, and documentation, see the corresponding page in the "Source code" section.
When I started designing the SMS Control Center software, I immediately faced the question: where should the application store its data?
First I thought about using XML-serialized collections, which I had used successfully in several previous projects. Basically, XML serialization works OK: the data format is human-readable (and editable); if you’re concerned about file size, you can gzip the files on the fly; you can encrypt them on the fly; and you can implement a database schema upgrade with just a single XSL transform. This time, however, it had two problems:
- Unlike in my previous .NET projects, the SMS Control Center database is too large to keep in RAM.
- I needed full-text search indexing and concurrent access by several threads, and I’m too lazy to implement those features.
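The XML approach described above can be sketched as follows. This is a minimal illustration, not code from any of the projects mentioned: the `Contact` record type and the `XmlGzipStore` helper are hypothetical names, while `XmlSerializer` and `GZipStream` are the standard .NET classes. Note that the whole collection has to be in memory for both saving and loading, which is exactly the first limitation listed above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Serialization;

// Hypothetical record type, used only for this illustration.
public class Contact
{
    public string Name { get; set; }
    public string Phone { get; set; }
}

// Hypothetical helper: XML-serializes a collection and gzips it on the fly.
public static class XmlGzipStore
{
    public static byte[] Save(List<Contact> items)
    {
        var serializer = new XmlSerializer(typeof(List<Contact>));
        using (var buffer = new MemoryStream())
        {
            // Disposing the GZipStream flushes the compressed data into the buffer.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
                serializer.Serialize(gzip, items);
            return buffer.ToArray();
        }
    }

    public static List<Contact> Load(byte[] data)
    {
        var serializer = new XmlSerializer(typeof(List<Contact>));
        using (var gzip = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
            return (List<Contact>)serializer.Deserialize(gzip);
    }
}

public static class XmlGzipDemo
{
    public static void Main()
    {
        var items = new List<Contact> { new Contact { Name = "Alice", Phone = "555-0100" } };
        byte[] data = XmlGzipStore.Save(items);        // compressed XML bytes
        List<Contact> loaded = XmlGzipStore.Load(data); // the whole list is rebuilt in memory
        Console.WriteLine(loaded[0].Name);
    }
}
```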
Then I thought about SQL Server Express. Since SQL Server Express lives in another process, a small amount of extra latency is added each time the DB is queried. More importantly, I didn’t want to install, support, and maintain it. And I didn’t need 90% of its features.
I had one more reason against an RDBMS, a less logical one. Shortly before I started working on SMS Control Center, I evaluated the FrontlineSMS software. It’s written in Java and uses an embedded RDBMS, and the product quality is appalling. I spent about a week first trying to compile it, then trying to make it work, all the while looking at the UI ugliness traditionally associated with Windows desktop software written in Java. At that time, I was skeptical not only about the Java platform, but also about using an RDBMS for the task.
Then I discovered the Extensible Storage Engine (ESENT).
Here’s an excellent overview article by Artour Bakiev, which explains why ESENT fits the task perfectly.
Natively, ESENT only has a C API. Fortunately, Laurion Burchall, who works at Microsoft, created a managed wrapper over that API and published the project on CodePlex.
After reading the documentation and running a few quick tests, I was convinced that the technology is stable, reliable, and efficient.
However, the API is too low-level. You need to write quite a lot of code to use an ESENT database, and you need to write a lot of code every time you need RDBMS functionality such as "select * from TABLE where ".
Given the complexity of the product I was developing, it was clear to me that I needed to come up with some higher-level but still generic solution. By "generic" I mean that the same DB access code should be shared across every record type.
That’s why I designed and implemented the ESENT serialization library.
SMS Control Center
Up to version 1.0, the development of the ESENT serialization library was driven by the SMS Control Center project.
The decision to use ESENT significantly simplified some aspects of the software architecture, and eventually led to a better product.
Data Backend for UI Controls
The software is a Windows desktop application that uses Windows Forms for the UI. To evaluate scalability, I created a debug-only method that populates the database with 1 million messages, which is quite a realistic count after a year of using my software.
Of course, I don’t expect my users to scroll through a million messages: they are grouped into conversations, there are several ways to search, and there’s a way to "archive" a conversation. However, I did want to have an "All messages" list: I think I was inspired by the "All mail" folder in e-mail clients.
For a desktop application, one million is a lot. Most Windows controls allocate some memory for every item, and 1 million items is too many. That’s why I used a virtual-mode data grid view and a virtual list box: both scale pretty well.
Thanks to the remarkably low latency of the ESENT database, the user experience of ESENT-backed controls with 1 million records is "instant", even when tested on slow hardware: my target platform was a 1.6GHz Intel Atom CPU, 1GB RAM, and a 5400rpm HDD.
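The idea behind a virtual-mode control can be sketched without any UI code. In virtual mode, a Windows Forms `DataGridView` keeps no per-row data and instead asks the application for each visible cell through its `CellValueNeeded` event, so the application only has to answer "what is in row N?" quickly. The sketch below shows one possible way to do that with a small page cache. It is an assumption made for illustration, not the code used in SMS Control Center: `PagedRowCache`, its parameters, and the fake `fetchPage` delegate standing in for a database read are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps a few pages of rows in memory and loads the rest on demand.
public sealed class PagedRowCache
{
    private readonly Func<int, int, string[]> fetchPage; // (firstRow, count) -> rows
    private readonly int pageSize;
    private readonly int maxPages;
    private readonly Dictionary<int, string[]> pages = new Dictionary<int, string[]>();
    private readonly Queue<int> loadOrder = new Queue<int>();

    // Number of times the backing store was actually queried.
    public int FetchCount { get; private set; }

    public PagedRowCache(Func<int, int, string[]> fetchPage, int pageSize, int maxPages)
    {
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
        this.maxPages = maxPages;
    }

    // What a CellValueNeeded handler would call for the requested row index.
    public string GetRow(int rowIndex)
    {
        int page = rowIndex / pageSize;
        string[] rows;
        if (!pages.TryGetValue(page, out rows))
        {
            if (pages.Count >= maxPages)
                pages.Remove(loadOrder.Dequeue()); // evict the oldest loaded page
            rows = fetchPage(page * pageSize, pageSize);
            pages.Add(page, rows);
            loadOrder.Enqueue(page);
            FetchCount++;
        }
        return rows[rowIndex % pageSize];
    }
}

public static class VirtualListDemo
{
    // Stand-in for a database read: generates rows instead of loading them.
    public static string[] FakeFetch(int firstRow, int count)
    {
        var rows = new string[count];
        for (int i = 0; i < count; i++)
            rows[i] = "message #" + (firstRow + i);
        return rows;
    }

    public static void Main()
    {
        var cache = new PagedRowCache(FakeFetch, 100, 4);
        Console.WriteLine(cache.GetRow(999999)); // message #999999
        Console.WriteLine(cache.GetRow(999998)); // same page, no second fetch
        Console.WriteLine(cache.FetchCount);     // 1
    }
}
```

Memory use stays bounded by `pageSize * maxPages` rows no matter how many rows the list reports, which is why the approach depends on the backing store answering individual page reads with low latency.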
The software is multithreaded. Among other things, it uses threads to access potentially faulty pieces of hardware, and to access some web services over the potentially unstable Internet. In such an environment, the choice of ESENT proved to be great, due to the following 2 factors:
- Transactions. For example, if the GSM modem hangs in the middle of something, the lower-level code will throw a TimeoutException. The upper-level code, containing something like "using(var transaction = session.BeginTransaction())", will dispose the transaction, which in turn will roll it back, and all changes made to all tables within this transaction will be cancelled. Such things are very expensive to achieve without a full-blown RDBMS.
- Transaction isolation. Until a modem thread commits the transaction, the UI thread is unaware of the incoming message. There’s no chance the user will ever see an incomplete message. My software never needs to pass objects between threads: service threads periodically check for new records in some tables, and the UI thread only receives notifications containing just a single integer, the record ID. No locking is required, and no deadlocks are possible.
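The rollback-on-dispose pattern from the first point can be illustrated with a small self-contained sketch. It deliberately does not use ESENT: `DemoSession`, `DemoTransaction`, and `ModemDemo` are hypothetical in-memory stand-ins invented for this example, and they only model rollback, not isolation between threads.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a database session; NOT the ESENT API.
public sealed class DemoSession
{
    public readonly List<string> Messages = new List<string>();

    public DemoTransaction BeginTransaction()
    {
        return new DemoTransaction(this);
    }
}

// Rolls back on Dispose unless Commit was called first.
public sealed class DemoTransaction : IDisposable
{
    private readonly DemoSession session;
    private readonly List<string> snapshot;
    private bool finished;

    public DemoTransaction(DemoSession session)
    {
        this.session = session;
        snapshot = new List<string>(session.Messages); // state to restore on rollback
    }

    public void Commit()
    {
        finished = true;
    }

    public void Dispose()
    {
        if (finished) return;
        // Not committed: undo every change made since BeginTransaction.
        session.Messages.Clear();
        session.Messages.AddRange(snapshot);
        finished = true;
    }
}

public static class ModemDemo
{
    // Simulates storing a two-part message; optionally "hangs" after the first part.
    public static void Receive(DemoSession session, bool modemHangs)
    {
        using (var transaction = session.BeginTransaction())
        {
            session.Messages.Add("part 1");
            if (modemHangs)
                throw new TimeoutException("modem did not respond");
            session.Messages.Add("part 2");
            transaction.Commit();
        }
    }

    public static void Main()
    {
        var session = new DemoSession();
        try { Receive(session, true); }
        catch (TimeoutException) { /* the using block has already rolled back */ }
        Console.WriteLine(session.Messages.Count); // 0
        Receive(session, false);
        Console.WriteLine(session.Messages.Count); // 2
    }
}
```

The point of the pattern is that the error path needs no cleanup code of its own: leaving the `using` block without calling `Commit` is enough to discard the half-written message.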
Const.me Web Site
I developed version 1.0 of the ESENT serialization library while working for a client. The client generously allowed me to freely use and distribute the library, as long as I don’t use it to develop a clone of the product.
After the successful deployment of the SMS Control Center product, I had an extremely positive attitude towards the ESENT technology, so I decided to use it to implement the comments feature on my web site. My web site is hosted on Windows Server 2008 and is built using ASP.NET: thanks to the versatility of Microsoft’s platform, I only needed to upload a pair of managed DLLs.
Yes, I know that using an SQL backend is the traditional approach to designing a web application.
However, I see the following benefits in using ESENT instead of e.g. MS SQL for the task:
- Every inter-process communication adds extra latency to the system. ESENT works in the IIS process.
- Potentially, ESENT is capable of higher performance on the same hardware. I didn’t perform any measurements myself; however, the figures published by the creator of the managed ESENT library are quoted below.
- ESENT can hold up to 2GB of data in a single field of a single record. In some cases, this simplifies system maintenance. For example, if MediaWiki were implemented with ESENT, it would be much easier to back up: currently, you need to back up the SQL database, and you need to back up the file storage holding attachments and pictures. While doing that, you should either take the server offline, or at least deny any modifications: failing to do that will lead to an inconsistent backup. Using ESENT would allow storing everything in the same DB, with the backup being a single atomic operation.
- No administration is required.
Below are some performance measurements for a simple application that inserts 1,000,000 records. Each record has an 8 byte autoincrement key and 32 bytes of data. These test results are from a fast development machine with an SSD disk.
|Insert records|132,000 records/second|
|Update one record|157,000 updates/second|
|Read one record|1,149,000 retrieves/second|
|Scan all records (sequential)|794,000 records/second|
|Seek to all records (random order)|266,000 seeks/second|
I’m afraid only benefit #4 (no administration) applies to my web site. It has no latency requirements, its traffic is too low to need high performance, and currently it has no CMS allowing users to upload attachments or images (you can only upload your 100x100 pixel photo, which is limited to 64 KB).
However, I think the task is adequate as a proof of concept.
Anyway, everything’s up and running now.