On Tue, Jul 30, 2013 at 3:54 PM, Alex Elsayed <eternaleye@xxxxxxxxx> wrote:
> I posted this as a comment on the blueprint, but I figured I'd say it here:
>
> The thing I'd worry about here is that LevelDB's performance (along with
> that of various other K/V stores) falls off a cliff for large values.
>
> Symas (who make LMDB, used by OpenLDAP) did some benchmarking that shows
> drastic performance loss with 100KB values on both read and write:
> http://symas.com/mdb/microbench/#sec4
>
> It's not just disk latency, either - an SSD showed the same behavior:
> http://symas.com/mdb/microbench/#sec7
>
> I'd recommend REALLY careful benchmarking with a variety of loads (and value
> sizes).

There are various users of leveldb who have tuned it for workloads
like this; Riak has some stuff (not sure how much), and I believe HyperDex
has some code changes that do a bunch of things, including better support
for large writes.

One thing to keep in mind is that we already have leveldb in the OSD;
it's used for "omap" and for keeping track of a lot of object metadata
and lookaside stuff. I've asked before about using leveldb as a backing
store, and the big trouble is that it assumes it's feasible to copy the
values it stores several times; with 4MB objects it really isn't.
That doesn't mean it can't be appropriate for other kinds of workloads,
though, and there are several interface layers for providing a backing
store that could make this pluggable.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
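
To put rough numbers on the copying concern above, here is a minimal sketch. It assumes each stored value is rewritten a fixed number of times after its initial write (for example, once to a log and again during later compactions). The function name and the rewrite count of 4 are assumptions chosen for illustration, not measured leveldb behavior.

```python
def bytes_written(value_size, rewrites):
    """Total bytes written for one value: the initial write plus `rewrites` copies."""
    return value_size * (1 + rewrites)


# Assumed, illustrative number of times a value is copied after the first write.
ASSUMED_REWRITES = 4

for label, size in [("1KB", 1024), ("100KB", 100 * 1024), ("4MB", 4 * 1024 * 1024)]:
    total = bytes_written(size, ASSUMED_REWRITES)
    print(f"{label:>5} value -> {total / 1024:.0f} KB written in total")
```

Under these assumed numbers a single 4MB object costs 20MB of disk writes, against 5KB for a 1KB value, which is why the same copying that is cheap for small metadata gets expensive for whole objects.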