Re: [EXTERNAL] Re: avoiding false detection of down OSDs

On 07/31/2012 12:40 PM, Sage Weil wrote:
On Tue, 31 Jul 2012, Gregory Farnum wrote:
On Tue, Jul 31, 2012 at 8:07 AM, Jim Schutt<jaschut@xxxxxxxxxx>  wrote:

Also, FWIW I've been running my Ceph servers with no swap,
and I've recently doubled the size of my storage cluster.
Is it possible to have map processing do a little memory
accounting and log it, or to provide some other way to learn
that map processing is chewing up significant amounts of
memory?  Or is there already a way to get at this that I
just don't know about?  I sometimes run into something that
shares some characteristics with what you describe, but is
primarily triggered by high client write load.  I'd like
to be able to confirm or rule out that it's the same basic
issue you've described.

I think we've done all our diagnosis using profiling tools,
but there's now a map cache, and it probably wouldn't be too
difficult to have it dump data via perfcounters if you poked
around... does anything like this exist yet, Sage?

Much of the bad behavior was triggered by #2860, fixes for which just went
into the stable and master branches yesterday.  It's difficult to fully
observe the bad behavior, though (lots of time spent in
generate_past_intervals, reading old maps off disk).  With the fix, we
pretty much only process maps during handle_osd_map.

Adding perfcounters in the methods that grab a map out of the cache or
(more importantly) read it off disk will give you better visibility into
that.  It should be pretty easy to instrument that (and I'll gladly
take patches that implement that... :).  Without knowing more about what
you're seeing, it's hard to say if it's related, though.  This was
triggered by long periods of unclean pgs and lots of data migration, not
high load.
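
That sounds easy enough that I'll take a stab at it.  Just to
make sure I'm reading you right, is the sketch below roughly
what you have in mind?  The counter names and the fetch-path
hooks are invented for illustration; I haven't yet dug into
where the OSD actually pulls maps from the cache or reads them
off disk.

  #include "common/ceph_context.h"
  #include "common/perf_counters.h"

  // Hypothetical perfcounter keys for map-cache behavior; these
  // are illustrative names, not existing Ceph identifiers.
  enum {
    l_osd_mapc_first = 10000,
    l_osd_mapc_hit,        // map served from the in-memory cache
    l_osd_mapc_miss,       // map had to be read off disk
    l_osd_mapc_bytes,      // encoded map bytes read from disk
    l_osd_mapc_last,
  };

  PerfCounters *build_map_cache_counters(CephContext *cct)
  {
    PerfCountersBuilder pcb(cct, "osd_map_cache",
                            l_osd_mapc_first, l_osd_mapc_last);
    pcb.add_u64_counter(l_osd_mapc_hit, "cache_hit");
    pcb.add_u64_counter(l_osd_mapc_miss, "cache_miss");
    pcb.add_u64_counter(l_osd_mapc_bytes, "bytes_from_disk");
    PerfCounters *logger = pcb.create_perf_counters();
    cct->get_perfcounters_collection()->add(logger);
    return logger;
  }

  // Then, wherever a map is fetched (hypothetical hook points):
  //   cache hit:  logger->inc(l_osd_mapc_hit);
  //   disk read:  logger->inc(l_osd_mapc_miss);
  //               logger->inc(l_osd_mapc_bytes, bl.length());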

An issue I've been seeing is unusually high OSD memory use.
It seems to be triggered by Linux clients timing out requests
and resetting OSDs during a heavy write load, but I was hoping
to rule out any memory-use issues caused by map processing.
However, this morning I started testing your server-side
wip-msgr branch together with the kernel-side patches queued
up for 3.6, and so far with that combination I've been unable
to trigger the behavior I was seeing.  So that's great news,
and I think it confirms the issue was unrelated to map
processing.
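
For what it's worth, the memory accounting I had in mind was
something crude like the sketch below, just enough to confirm
or rule out cached maps as the consumer.  The struct and hook
points are invented for illustration; it assumes calls at
whatever spots the map cache adds and trims entries.

  #include <cstdint>
  #include <map>

  // Hypothetical bookkeeping for encoded-map memory held by the
  // cache, so the running total can be logged periodically.
  struct MapCacheAccounting {
    uint64_t total_bytes;                     // all cached map bytes
    std::map<uint32_t, uint64_t> per_epoch;   // epoch -> encoded size

    MapCacheAccounting() : total_bytes(0) {}

    // Called when the cache takes ownership of an encoded map.
    void note_add(uint32_t epoch, uint64_t len) {
      per_epoch[epoch] = len;
      total_bytes += len;
    }

    // Called when the cache trims an old epoch.
    void note_trim(uint32_t epoch) {
      std::map<uint32_t, uint64_t>::iterator it = per_epoch.find(epoch);
      if (it != per_epoch.end()) {
        total_bytes -= it->second;
        per_epoch.erase(it);
      }
    }
  };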

I've also sometimes had issues with my cluster becoming
unstable when failing an OSD while the cluster is under a
heavy write load, but I hadn't been able to characterize the
conditions under which it couldn't recover.  I expect that
situation is now improved as well, and will retest.

Thanks -- Jim


sage

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



