Re: ceph-mon vs leveldb status

"Holger Hoffstaette" <holger.hoffstaette@xxxxxxxxxxxxxx> · Mon, 24 Jun 2013 20:57:29 +0200

On Mon, 24 Jun 2013 09:56:40 -0700, Sage Weil wrote:

> On Mon, 24 Jun 2013, Holger Hoffstaette wrote:
>> On Wed, 19 Jun 2013 13:09:42 -0700, Sage Weil wrote:
>> 
>> [snip]
>> 
>> > Meanwhile, the next development release will be changing the way all
>> > the pg metadata in the monitor is stored to be much more efficient and
>> > to take advantage of leveldb's capabilities; this will be present in
>> > 0.66 (dumpling - 1).
>> 
>> Have you considered using one of the recently announced LevelDB forks?
>> The HyperDex folks recently published their HyperLevelDB fork (still
>> compatible though) and it has significantly improved behaviour, less
>> performance variance etc.
>> See http://hyperdex.org/performance/leveldb/ or github.
> 
> The two issues are packaging and QA.  Ideally we (or someone) would build
> packages that provide libleveldb so that users can drop in whichever
> leveldb variant they want on their machines.  The other issue here,

Much as I understand that, I think that for something as critical as a
metadata store for ceph this way lies madness, and that you will sooner or
later be forced to fork/bundle/QA it yourself anyway. Maybe that's just me
being old.
Trust me when I say that as a Gentoo user & developer I am painfully
familiar with all the issues around bundling/unbundling/upstreaming etc.,
not least because of the absurd HN discussion last week, which was
about LevelDB as well.

As much as I hope this gets rolled back upstream, I'm pessimistic
simply based on Google's track record of properly managing their open
source projects.

That being said, the HyperLevelDB fork is actively maintained and upstream
patches, as well as test cases, are merged. It also explicitly has a
different soname, precisely to avoid the confusion that could come from an
improved fork with the same name.

> though, is that these are new variants that haven't seen as much usage, so
> we have no idea how stable they are with Ceph workloads.

After looking at the LevelDB bugtracker I don't think things can really
get much worse.. :/

The real reason I posted this was that I've been lurking here and noticed
a lot of postings about timeouts, performance drops etc. The thing is that
HyperLevelDB should give back breathing room for the case where a system
is over ~80% utilization (the part of the hockey curve where things go
north in terms of latency). Improving bandwidth, reducing contention and
thus latency & pause times etc. can have an incredibly stabilizing effect
on a system. Very often a lot of weird and hard to diagnose queueing
effects (convoying, unintentional synchronisation leading to stalls etc.)
can be traced to this. I'm not saying this is the case..just that it can
never hurt to have more predictable performance characteristics in the
metadata store for a distributed filesystem.
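
To put rough numbers on that hockey curve, here is a minimal sketch that
assumes the simplest textbook queueing model (a single-server M/M/1 queue).
The model, the function name mm1_latency and the 1 ms service time are
assumptions for illustration only, not measurements of ceph-mon, LevelDB or
HyperLevelDB.

```python
def mm1_latency(service_time_ms, utilization):
    """Mean time in system (waiting + service) for an M/M/1 queue.

    Uses the standard result W = S / (1 - rho), where S is the mean
    service time and rho is the utilization (fraction of time busy).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical 1 ms mean service time per request.
    for rho in (0.50, 0.80, 0.90, 0.95):
        print(f"{rho:.0%} busy: {mm1_latency(1.0, rho):.1f} ms mean latency")
```

Under this model the mean latency is 5x the service time at 80% utilization
and 10x at 90%. A 20% cut in service time at the same request rate moves a
90% busy system down to 72%, which takes the mean latency from 10 ms to
about 2.9 ms. Real stores are burstier than M/M/1, so treat this as the
shape of the curve only.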

Also, who doesn't enjoy more efficient software? Think of the little ARM
cores.. :)

cheers
Holger
