New(ish) LMDBstore KeyValueDB backend and testing

Hi guys,

A month or so ago I started rewriting Xinxin Shu's old LMDB kvdb PR to work with master. I've now got it to the point where it no longer segfaults during performance tests as the kvstore for bluestore.

The new PR is here:

https://github.com/ceph/ceph/pull/16257

The original version needed some work to get into a functional state. Some of that was general housekeeping, such as adding cmake support and filling in functionality to conform to the current KeyValueDB interface. Beyond that, the biggest issue was reworking it so that it does not keep multiple concurrent write transactions open (LMDB allows only one write transaction at a time). Other, more minor changes include removing temporary string conversions and memory copies where possible.
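To illustrate the single-writer constraint, here is a minimal sketch in Python. It is a toy model, not code from the PR and not the LMDB API: `SingleWriterStore`, `write_txn` and `submit_batch` are hypothetical names, and the lock only stands in for the rule that one write transaction may be open at a time. Each caller buffers its changes in a short-lived transaction and applies them on commit.

```python
import threading
from contextlib import contextmanager


class SingleWriterStore:
    """Toy key/value store that allows only one open write transaction."""

    def __init__(self):
        self._data = {}
        self._write_lock = threading.Lock()
        self.open_writers = 0      # instrumentation only
        self.max_open_writers = 0  # instrumentation only

    @contextmanager
    def write_txn(self):
        # Block until no other write transaction is open.
        with self._write_lock:
            self.open_writers += 1
            self.max_open_writers = max(self.max_open_writers, self.open_writers)
            pending = {}
            try:
                yield pending
                # Commit: reached only if the caller's block did not raise.
                self._data.update(pending)
            finally:
                self.open_writers -= 1

    def get(self, key, default=None):
        return self._data.get(key, default)


def submit_batch(store, ops):
    """Apply a list of (key, value) pairs as one short-lived write txn."""
    with store.write_txn() as txn:
        for key, value in ops:
            txn[key] = value
```

If the body of the `with` block raises, the buffered changes are dropped, which plays the role of an abort. The point of the sketch is only that writers queue up behind each other instead of holding several write transactions open at once.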

One remaining issue is that Bluestore's BitmapFreelistManager keeps a read iterator open indefinitely. This translates to keeping a read-only lmdb txn open, which forces new writes to grow the database rather than reuse free pages (i.e., the db grows without bound as new writes come in).

A (buggy) fix for this is being attempted in another PR. It fixes the db growth issue, but it currently probably breaks freelist management and segfaults for large min_alloc sizes:

https://github.com/ceph/ceph/pull/16243
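For anyone who has not run into the growth problem before, the effect can be shown with a rough toy model. This is a simplification and not LMDB's actual page bookkeeping, which differs in detail. The assumptions are: every write transaction copies `pages_per_write` pages (copy-on-write), the replaced pages are freed by that transaction, and freed pages can only be reused if no open reader has a snapshot older than the transaction that freed them. The function name and parameters are made up for the illustration.

```python
def simulate_db_growth(num_writes, pages_per_write, pinned_reader_txn=None):
    """Return the file size in pages after num_writes overwrite commits.

    pinned_reader_txn is the snapshot id of a read txn that is never
    closed, or None if there is no long-lived reader.
    """
    file_pages = pages_per_write  # pages written by the initial load ("txn 0")
    reusable = 0                  # free pages a writer is allowed to take
    for txn_id in range(1, num_writes + 1):
        # Copy-on-write: take new pages from the reusable pool, else grow the file.
        from_pool = min(reusable, pages_per_write)
        reusable -= from_pool
        file_pages += pages_per_write - from_pool
        # This txn frees the pages it replaced. If the long-lived reader's
        # snapshot predates this txn, it can still see them, so they stay pinned.
        reader_sees_old = pinned_reader_txn is not None and pinned_reader_txn < txn_id
        if not reader_sees_old:
            reusable += pages_per_write
    return file_pages
```

With 10 writes of 4 pages each, the model settles at 8 pages when there is no long-lived reader, and reaches 44 pages when a reader holds the snapshot from before the first write. In the second case every write adds to the file, which is the unbounded growth described above.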

Write performance is fairly low (as expected) vs rocksdb, though we may be able to improve some of that via a WAL or similar mechanism. Most of the time is spent doing fdatasync via mdb_env_sync. It might be possible to make the bluestore write workload more favorable to lmdb (bigger key/value pairs), but that's potentially a lot of work.
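Since the sync dominates, the general idea behind any fix is to pay for fewer syncs per unit of work. The sketch below shows plain group commit on an append-only file, which is one way to do that. It is not a WAL implementation and not something either PR does. `write_records` and `compare_batch_sizes` are hypothetical helpers, and `os.fsync` is used here instead of fdatasync only for portability.

```python
import os
import tempfile


def write_records(path, records, batch_size):
    """Append records to path, syncing once per batch_size records.

    Returns the number of fsync calls issued.
    """
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        pending = 0
        for rec in records:
            os.write(fd, rec)
            pending += 1
            if pending == batch_size:
                os.fsync(fd)  # one sync covers the whole batch
                syncs += 1
                pending = 0
        if pending:
            os.fsync(fd)      # flush the final partial batch
            syncs += 1
    finally:
        os.close(fd)
    return syncs


def compare_batch_sizes(num_records, batch_sizes, record=b"x" * 64):
    """Return {batch_size: fsync_count} for writing num_records records."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for batch_size in batch_sizes:
            path = os.path.join(tmp, f"log-{batch_size}")
            results[batch_size] = write_records(path, [record] * num_records, batch_size)
    return results
```

For 100 records, a batch size of 1 issues 100 syncs and a batch size of 10 issues 10. The tradeoff is that a record is not durable until its batch has been synced, so anything waiting on durability has to wait for the batch boundary.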

Space-amp so far looks like it might be lower than rocksdb in our current configuration, though, which could be good for large HDD configurations where the DB is on flash and raw write performance is less of an issue.

I think it would be good to try to get some other key/value stores like forestdb or even the aging zetascale into a functional state and do some more comparison testing.

Mark