Re: emperor -> firefly 0.80.7 upgrade problem

Chad Seys <cwseys@xxxxxxxxxxxxxxxx> · Mon, 3 Nov 2014 13:41:38 -0600

On Monday, November 03, 2014 13:22:47 you wrote:
> Okay, assuming this is semi-predictable, can you start up one of the
> OSDs that is going to fail with "debug osd = 20", "debug filestore =
> 20", and "debug ms = 1" in the config file and then put the OSD log
> somewhere accessible after it's crashed?

Alas, I have not yet noticed a pattern.  The only thing I think is true is
that they go down when I first make CRUSH changes.  After being restarted,
they run without going down again.
All the OSDs are running at the moment.

What I've been doing is marking OUT the OSDs on which a request is blocked,
letting the PGs recover (draining the OSD of PGs completely), then removing
and re-adding the OSD.
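
For anyone following along, here is a rough sketch of the drain-and-remove
part of that procedure, written as a Python helper that only builds the
command lines and does not run them.  The function name is made up for
illustration, and the removal steps are my assumption based on the usual
manual OSD removal sequence; the exact commands used on this cluster are not
spelled out above.  Stopping the daemon and re-adding the OSD depend on the
init system and deployment tool, so they are left as comments.

```python
from typing import List


def drain_and_remove_commands(osd_id: int) -> List[List[str]]:
    """Build (but do not run) ceph CLI calls to drain and remove one OSD.

    Hypothetical helper: the ordering follows the usual manual removal
    sequence and is an assumption, not a record of what was actually run.
    """
    name = "osd.{}".format(osd_id)
    return [
        # Mark the OSD out so its PGs are remapped to other OSDs.
        ["ceph", "osd", "out", str(osd_id)],
        # Wait here until recovery has finished and the OSD holds no PGs,
        # then stop the OSD daemon (init-system specific, not shown).
        ["ceph", "osd", "crush", "remove", name],  # drop it from the CRUSH map
        ["ceph", "auth", "del", name],             # delete its auth key
        ["ceph", "osd", "rm", str(osd_id)],        # remove it from the OSD map
    ]


if __name__ == "__main__":
    for cmd in drain_and_remove_commands(12):
        print(" ".join(cmd))
```

Building the commands as lists keeps the sketch safe to run anywhere; each
list could be passed to subprocess.check_call once the waiting step between
"out" and the removal has been handled by hand.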

So far OSDs treated this way no longer have blocked requests.

Also, it seems as though that slowly decreases the number of incomplete and
down+incomplete PGs.
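
To watch that number over time, something like the following could be used
to tally the problem PGs.  It is only a sketch: the function name is
hypothetical, and it assumes the PG state strings (for example
"down+incomplete") have already been pulled out of whatever PG listing is
at hand; how to extract them is not shown here.

```python
from collections import Counter
from typing import Dict, Iterable


def count_incomplete_pgs(states: Iterable[str]) -> Dict[str, int]:
    """Count PGs per state string, keeping only states that include 'incomplete'.

    `states` is assumed to be one state string per PG, with individual
    flags joined by '+', e.g. 'active+clean' or 'down+incomplete'.
    """
    counts = Counter(states)
    return {
        state: n
        for state, n in counts.items()
        if "incomplete" in state.split("+")
    }


if __name__ == "__main__":
    sample = ["active+clean", "incomplete", "down+incomplete", "incomplete"]
    print(count_incomplete_pgs(sample))
```

Running this before and after each OSD is drained and re-added would show
whether the counts are really going down.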

> 
> Can you also verify that all of your monitors are running firefly, and
> then issue the command "ceph scrub" and report the output?

Sure, should I wait until the current rebalancing is finished?

Thanks,
Chad.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com