Re: [EXTERNAL] Re: avoiding false detection of down OSDs

Gregory Farnum <greg@xxxxxxxxxxx> · Tue, 31 Jul 2012 11:14:55 -0700



On Tue, Jul 31, 2012 at 8:07 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> On 07/30/2012 06:24 PM, Gregory Farnum wrote:
>> Hmm. The concern is that if an OSD is stuck on disk swapping then it's
>> going to be just as stuck for the monitors as the OSDs — they're all
>> using the same network in the basic case, etc. We want to be able to
>> make that guess before the OSD is able to answer such questions.
>> But I'll think on if we could try something else similar.
>
>
> OK - thanks.
>
> Also, FWIW I've been running my Ceph servers with no swap,
> and I've recently doubled the size of my storage cluster.
> Is it possible to have map processing do a little memory
> accounting and log it, or to provide some way to learn
> that map processing is chewing up significant amounts of
> memory?  Or maybe there's already a way to learn this that
> I need to learn about?  I sometimes run into something that
> shares some characteristics with what you describe, but is
> primarily triggered by high client write load.  I'd like
> to be able to confirm or deny it's the same basic issue
> you've described.

I think we've done all our diagnosis using profiling tools, but
there's now a map cache, and it probably wouldn't be too difficult to
have it dump data via perfcounters if you poked around. Does anything
like this exist yet, Sage?
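
Until the map cache reports its own numbers, a rough external check is
to watch the resident memory of an OSD process while maps are being
processed. The sketch below assumes a Linux host with /proc mounted.
The function names are made up for illustration and are not part of
Ceph. It only shows total resident memory, so it cannot separate map
processing from other allocations such as client write buffers.

```python
import re
import time
from pathlib import Path


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value (in kB, as reported by the kernel) from the
    text of /proc/<pid>/status, or None if the field is absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


def sample_rss(pid, count, interval_s=1.0):
    """Collect `count` RSS samples for `pid`, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        samples.append(read_rss_kib(pid))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples
```

Calling sample_rss() with the pid of a ceph-osd process during a burst
of map updates, and again under heavy client writes alone, would give
a first hint as to whether the two cases grow memory in the same way.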
-Greg