Re: [EXTERNAL] Re: avoiding false detection of down OSDs

On Tue, 31 Jul 2012, Gregory Farnum wrote:
> On Tue, Jul 31, 2012 at 8:07 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> > On 07/30/2012 06:24 PM, Gregory Farnum wrote:
> >> Hmm. The concern is that if an OSD is stuck on disk swapping then it's
> >> going to be just as stuck for the monitors as the OSDs; they're all
> >> using the same network in the basic case, etc. We want to be able to
> >> make that guess before the OSD is able to answer such questions.
> >> But I'll think on if we could try something else similar.
> >
> >
> > OK - thanks.
> >
> > Also, FWIW I've been running my Ceph servers with no swap,
> > and I've recently doubled the size of my storage cluster.
> > Is it possible to have map processing do a little memory
> > accounting and log it, or to provide some way to learn
> > that map processing is chewing up significant amounts of
> > memory?  Or maybe there's already a way to learn this that
> > I need to learn about?  I sometimes run into something that
> > shares some characteristics with what you describe, but is
> > primarily triggered by high client write load.  I'd like
> > to be able to confirm or deny it's the same basic issue
> > you've described.
> 
> I think that we've done all our diagnosis using profiling tools, but
> there's now a map cache and it probably wouldn't be too difficult to
> have it dump data via perfcounters if you poked around... does anything
> like this exist yet, Sage?

Much of the bad behavior was triggered by #2860, fixes for which just went 
into the stable and master branches yesterday.  It's difficult to fully 
observe the bad behavior, though (lots of time spent in 
generate_past_intervals, reading old maps off disk).  With the fix, we 
pretty much only process maps during handle_osd_map.

Adding perfcounters in the methods that grab a map out of the cache or 
(more importantly) read it off disk will give you better visibility into 
that.  It should be pretty easy to instrument (and I'll gladly take 
patches that implement it... :).  Without knowing more about what 
you're seeing, it's hard to say if it's related, though.  This was 
triggered by long periods of unclean PGs and lots of data migration, not 
high load.
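
To make the idea concrete, here is a minimal, self-contained sketch of the 
kind of counting such instrumentation would do.  It uses plain std::atomic 
counters and a dummy cache rather than Ceph's actual PerfCounters/OSDMap 
cache machinery, so all names here (MapCache, load_from_disk, etc.) are 
illustrative only, not existing code:

#include <atomic>
#include <cstdint>
#include <cstdio>
#include <map>
#include <memory>

// Illustrative stand-ins -- not Ceph's real types.
struct OSDMapStub { unsigned epoch; };
using MapRef = std::shared_ptr<OSDMapStub>;

struct MapCache {
  std::map<unsigned, MapRef> cache;          // epoch -> map
  std::atomic<uint64_t> cache_hits{0};       // served from memory
  std::atomic<uint64_t> disk_loads{0};       // had to be read off disk

  MapRef get_map(unsigned epoch) {
    auto it = cache.find(epoch);
    if (it != cache.end()) {
      ++cache_hits;                          // cheap path
      return it->second;
    }
    ++disk_loads;                            // expensive path worth watching
    MapRef m = load_from_disk(epoch);
    cache[epoch] = m;
    return m;
  }

  MapRef load_from_disk(unsigned epoch) {    // placeholder for the real read
    return std::make_shared<OSDMapStub>(OSDMapStub{epoch});
  }

  void dump() const {
    std::printf("map cache hits: %llu, disk loads: %llu\n",
                (unsigned long long)cache_hits.load(),
                (unsigned long long)disk_loads.load());
  }
};

int main() {
  MapCache mc;
  mc.get_map(5);   // miss -> disk load
  mc.get_map(5);   // hit
  mc.get_map(6);   // miss -> disk load
  mc.dump();       // prints: map cache hits: 1, disk loads: 2
  return 0;
}

In Ceph itself the equivalent counters would presumably be registered 
through the perfcounters interface and show up via the admin socket, so 
you could watch how often a map lookup falls through to a disk read.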

sage


