A first test with OpenLDAP and the MDB backend

I recently read about a new MDB backend for OpenLDAP, and when I found out it does memory-mapped I/O and is all written in C, I got pretty interested. Once upon a time (I'm an old C hacker) I wrote some very rudimentary hash-indexed DB stuff of my own using memory-mapped I/O, and I knew that approach to be very, very fast.

And when I saw some of the benchmarks that were done on different filesystems, I was even more interested. This looks promising. A few years ago I banged together a quick bit of Java that watches my postfix logs from a bunch of different relays and updates an LDAP back-end, plus a few PHP scripts for searching it. It makes it easy-peasy for junior admins to trace what happened to an email - they can search by (for instance) a To or From address, then in the resulting page click on the QueueID or the message-ID of a given email and see where it came from, what servers it traversed, and who all received it. That sorta thing.

Yeah, I know, LDAP is s'posed to be write-rarely, read-mostly, and here I am doing tons of writes/updates (anywhere from 5 to 10 per second). So sue me. Part of the reason for choosing LDAP as the back-end was that it gave me a good excuse to edjimicate myself on the Java LDAP API. :-) It also, coincidentally, was an interesting exercise in making OpenLDAP perform well even with lots of write/modify operations going on - I traded away some consistency/reliability (in the case of crashes or power outages) for speed, but since the DB is built from logs anyway, who cares - I can always rebuild from the logs if I have to. :-)

Anyway, today I compiled a newer OpenLDAP with MDB support and set up my first server using the MDB back-end. Setup was agreeably simple. No tweaking of DB_CONFIG options. I went looking for info on how to make sure log files get cleaned up periodically (after a commit), only to find... that's not an issue with MDB. Just set up your checkpoints to occur occasionally and you're good ta go. Nice!
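For reference, a minimal slapd.conf fragment for an mdb database might look something like this (directive names are from the slapd-mdb(5) man page; the suffix, path, and numbers here are just placeholder values, not my actual config):

```
database        mdb
suffix          "dc=example,dc=com"
directory       /var/lib/ldap

# Maximum size (in bytes) the memory map - and thus the DB - can grow to.
maxsize         10737418240

# Flush to disk after this many KiB written or this many minutes elapsed.
checkpoint      1024 15

# Optional: trade crash durability for write speed - fine for data
# that can be rebuilt from logs.
envflags        nosync
```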

I've only done some rudimentary testing so far, but it's looking good. One hurdle I ran into was that my syslog system is only a 32-bit Linux box, and so OpenLDAP's MDB back-end won't let me pick a maximum DB size of anything bigger than 4G. But my log DB has been as big as 95 gigs before (whoops), so that's not gonna do. I'll have to fire it up on a 64-bit system. I did confirm this afternoon that if it's compiled on a 64-bit OS (where an unsigned long in C is 8 bytes, not 4), then it will let me specify a big DB size like, say, 200 gigs.

And from what I understand, it's important to pick a big DB size initially. Note that this doesn't mean the DB will instantly be that big - it's just the maximum size it'll ever be. The MDB code creates "holey" or "sparse" files. What that means is that it seeks to various offsets within the file and writes data there, and the gaps between those offsets can be left as empty, unallocated space. Think of it this way... Say my hashing mechanism says the first record I'm writing, which is only 1k, should go at an offset in the middle of a 200G file. The OS happily writes the 1k to disk at that offset, but the file still only consumes 1k of actual disk space. What's confusing to someone who's never dealt with sparse files is that if you do ls -l, the file appears to be 100g+1k big. In reality, it's only consuming 1k of disk. Weird. So why care? Because if you cp the file, cp reads all that unallocated space back as zeros and writes a new file containing all those zeros - the new file really will consume 100g+1k. Oops.

Some apps are smarter about sparse files and only copy/read the allocated space. For instance, if you run xfsdump or dump, they only dump the allocated space - 1k in this case. If memory serves, GNU tar is smart about dealing with sparse files too, but last I checked, the tar on most Unixes (Solaris, for instance) wasn't. 'Course, the last time I looked at the tar on Solaris was something like 7 years ago, so it might be smarter now.

rsync can be smart about sparse files (see the --sparse option).

Anyway, I have high hopes for this new MDB back-end. I'll be testing it with my postfix logs real soon now and probably with a DNS server using LDAP for the back-end also. Good stuff!