On Monday, November 03, 2014 13:22:47 you wrote:
> Okay, assuming this is semi-predictable, can you start up one of the
> OSDs that is going to fail with "debug osd = 20", "debug filestore =
> 20", and "debug ms = 1" in the config file and then put the OSD log
> somewhere accessible after it's crashed?

Alas, I have not yet noticed a pattern. The only thing I think is true
is that they go down when I first make CRUSH changes. Then, after
restarting, they run without going down again. All the OSDs are running
at the moment.

What I've been doing is marking out the OSDs on which a request is
blocked, letting the PGs recover (draining the OSD of PGs completely),
then removing and re-adding the OSD. So far, OSDs treated this way no
longer have blocked requests. It also seems as though that slowly
decreases the number of incomplete and down+incomplete PGs.

>
> Can you also verify that all of your monitors are running firefly, and
> then issue the command "ceph scrub" and report the output?

Sure, should I wait until the current rebalancing is finished?

Thanks,
Chad.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
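
For reference, the mark-out, drain, and remove cycle described in the message can be sketched roughly as below. This is a dry-run helper that only builds and prints commands; it does not run anything. The function name and the example OSD id are hypothetical, the message does not say which exact commands were used on this cluster, and the re-add step is left out because it depends on how the OSDs were deployed.

```python
from typing import List

# Sketch only: builds the usual manual OSD removal commands as strings.
# drain_commands and the OSD id used below are illustrative assumptions.


def drain_commands(osd_id: int) -> List[str]:
    """Return the ceph CLI commands to mark an OSD out and then remove it."""
    name = f"osd.{osd_id}"
    return [
        # Stop mapping PGs to this OSD so that recovery moves them elsewhere.
        f"ceph osd out {osd_id}",
        # Check cluster status; repeat until the OSD has been drained of PGs.
        "ceph -s",
        # Stop the OSD daemon before the next steps (init-system specific,
        # not shown here), then drop the OSD from the CRUSH map.
        f"ceph osd crush remove {name}",
        # Delete the OSD's authentication key.
        f"ceph auth del {name}",
        # Remove the OSD from the OSD map.
        f"ceph osd rm {osd_id}",
    ]


if __name__ == "__main__":
    for cmd in drain_commands(12):
        print(cmd)
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; each line would still need to be checked against the cluster's actual state before being issued.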