Re: emperor -> firefly 0.80.7 upgrade problem

Gregory Farnum <greg@xxxxxxxxxxx> · Mon, 3 Nov 2014 11:50:05 -0800

On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
> On Monday, November 03, 2014 13:22:47 you wrote:
>> Okay, assuming this is semi-predictable, can you start up one of the
>> OSDs that is going to fail with "debug osd = 20", "debug filestore =
>> 20", and "debug ms = 1" in the config file and then put the OSD log
>> somewhere accessible after it's crashed?
>
> Alas, I have not yet noticed a pattern.  Only thing I think is true is that
> they go down when I first make CRUSH changes.  Then after restarting, they run
> without going down again.
> All the OSDs are running at the moment.

Oh, interesting. What CRUSH changes exactly are you making that are
spawning errors?

> What I've been doing is marking OUT the OSDs on which a request is blocked,
> letting the PGs recover (draining the OSD of PGs completely), then removing
> and re-adding the OSD.
>
> So far OSDs treated this way no longer have blocked requests.
>
> Also, that seems to slowly decrease the number of incomplete and
> down+incomplete PGs.
>
>>
>> Can you also verify that all of your monitors are running firefly, and
>> then issue the command "ceph scrub" and report the output?
>
> Sure, should I wait until the current rebalancing is finished?

I don't think it should matter, although I confess I'm not sure how
much monitor load the scrubbing adds. (It's a monitor check; it doesn't
hit the OSDs at all.)
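
For reference, the debug settings requested earlier in the thread would
go into ceph.conf roughly as below. This is only a sketch: the OSD ID 12
in the section name is a made-up example, so substitute the ID of an OSD
you expect to fail, or put the three lines under [osd] to apply them to
every OSD on that host. The OSD has to be restarted to pick up changes
made in the file.

```ini
# Hypothetical example: osd.12 stands in for whichever OSD is being debugged.
[osd.12]
    debug osd = 20
    debug filestore = 20
    debug ms = 1
```

Logging at these levels is verbose, so it is probably worth removing the
lines again once the crash log has been captured.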