A first test with OpenLDAP and the MDB backend
I recently read about the new MDB backend
for OpenLDAP, and when I found out it does memory-mapped I/O and that
it's all written in C, I got pretty interested. Once upon a time (I'm an
old C hacker) I wrote some very rudimentary hash-indexed
DB stuff of my own using memory-mapped I/O and knew it to be very, very fast.
And when I saw some of the benchmarks that were done on different filesystems,
I was even more interested. This looks promising.

A few years ago I banged
together a quick bit of Java that watches my postfix logs from a bunch of
different relays and updates an LDAP back-end, and a few PHP scripts for
searching through it all. It makes it easy-peasy for junior admins
to trace what happened to an email - they can search by (for instance)
a To or From address, then in the resulting page click on the QueueID or
the message-ID of a given email and see where it came from, what
servers it traversed and who all received it. That sorta thing.

Yeah, I know, LDAP is s'posed to be write-rarely, read-mostly and here I
am doing tons of writes/updates (anywhere from 5 to 10 per second).
So sue me. Part of the reason for deciding to use LDAP as the back-end was
because it gave me a good excuse to edjimicate myself on the Java LDAP API.
:-) It also, coincidentally, was an interesting exercise in how to make
OpenLDAP perform well even when there are lots of write/modify operations
going on - I traded off some consistency/reliability (in the case of
crashes or power outages) for speed but since the DB is based on logs
anyway, who cares - I can always rebuild from the logs if I have to. :-)

Anyway, today I compiled a newer OpenLDAP with MDB support and set up my
first server using the MDB back-end. Setup was agreeably simple - no
tweaking of DB_CONFIG options. I went looking for info on how to make
sure log files get cleaned up periodically (after a commit), only to find
that's not an issue with MDB. Just set up your checkpoints to occur
occasionally and you're good ta go. Nice!

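For reference, a minimal mdb stanza in slapd.conf can look something like this - the suffix, paths and numbers below are just illustrative, not lifted from my actual config:

```
# hypothetical minimal back-mdb config (values are examples only)
database    mdb
suffix      "dc=example,dc=com"
directory   /var/lib/ldap
maxsize     214748364800    # 200 GiB maximum DB (map) size, in bytes
checkpoint  1024 15         # flush after 1024 KiB written or 15 minutes
```

No DB_CONFIG file, no transaction-log housekeeping - the maxsize and checkpoint lines are about all the tuning there is.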
I've only done some rudimentary testing so far, but it's looking good. One
hurdle I ran into was that my syslog system is only a 32-bit Linux box, and so
OpenLDAP's MDB backend won't let me pick a maximum DB size of anything bigger
than 4G. But my log DB has been as big as 95 gigs before (whoops), so that's
not gonna work. I'll have to fire it up on a 64-bit system. I did confirm
this afternoon that if it's compiled on a 64-bit OS (where an unsigned long in C is
8 bytes, not 4), it will let me specify a big DB size like, say, 200 gigs.
And from what I understand, it's important to pick a big DB size initially.
Note that this doesn't mean the DB will instantly be that big - just that it's
the maximum size it'll ever be. The MDB code creates "holey" or "sparse"
files. What that means is that it seeks to various offsets within
the file and writes data, and the gaps between those offsets may be empty,
unallocated space. Think of it this way... Say my hashing mechanism says
this first record I'm writing, which is only 1k, should be written at
an offset in the middle of a 200G file. The OS happily writes the 1k to
disk at that offset, but the file is still only consuming 1k. What's confusing
to someone who's never dealt with sparse files is that if you do ls -l, the file
appears to be 100G+1k big. In reality, it's only consuming 1k of disk.
Weird. So why care? Because if you cp the file, cp reads all that unallocated
space as zeros and writes a new file containing all those zeros - and the new
file really will consume 100G+1k. Oops.

Some apps are smarter about sparse files and only copy/read the allocated
space. For instance, xfsdump and dump only dump the allocated
space - 1k in this case. If memory serves, GNU tar is smart about dealing
with sparse files too, but last I checked, the tar on most Unixes (Solaris,
for instance) wasn't. 'Course, the last time I looked at the tar on Solaris
was something like 7 years ago, and it might be smarter now.
rsync can also be smart about sparse files (see the --sparse option).

Anyway, I have high hopes for this new MDB back-end. I'll be testing it
with my postfix logs real soon now and probably with a DNS server using
LDAP for the back-end also. Good stuff!