On Mon, Nov 3, 2014 at 12:28 PM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
> On Monday, November 03, 2014 13:50:05 you wrote:
>> On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
>> > On Monday, November 03, 2014 13:22:47 you wrote:
>> >> Okay, assuming this is semi-predictable, can you start up one of the
>> >> OSDs that is going to fail with "debug osd = 20", "debug filestore = 20",
>> >> and "debug ms = 1" in the config file and then put the OSD log
>> >> somewhere accessible after it's crashed?
>> >
>> > Alas, I have not yet noticed a pattern. The only thing I think is true is
>> > that they go down when I first make CRUSH changes. Then after
>> > restarting, they run without going down again.
>> > All the OSDs are running at the moment.
>>
>> Oh, interesting. What CRUSH changes exactly are you making that are
>> spawning errors?
>
> Maybe I miswrote: I've been marking OUT OSDs with blocked requests. Then if
> an OSD becomes too_full I use 'ceph osd reweight' to squeeze blocks off of
> the too_full OSD. (Maybe that is not technically a CRUSH map change?)

No, it is a change; I just want to make sure I understand the scenario. So
you're reducing CRUSH weights on full OSDs, and then *other* OSDs are
crashing on these bad state machine events?

>> I don't think it should matter, although I confess I'm not sure how
>> much monitor load the scrubbing adds. (It's a monitor check; doesn't
>> hit the OSDs at all.)
>
> $ ceph scrub
> No output.

Oh, yeah, I think that output goes to the central log at a later time. (It
will show up in ceph -w if you're watching, or can be accessed from the
monitor nodes; in their data directory I think?)
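
For reference, the debug settings Greg asks for above are ordinary ceph.conf
options. A minimal sketch, assuming they are set for one suspect OSD on its
host and that default log paths are in use (the OSD id 12 and the restart
command are placeholders, not taken from this thread; adjust for your init
system):

    [osd.12]
        debug osd = 20
        debug filestore = 20
        debug ms = 1

    # restart just that daemon so logging is verbose from startup
    service ceph restart osd.12

With defaults, the verbose log should land in /var/log/ceph/ceph-osd.12.log,
which is the file to post once the daemon has crashed.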
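
As an aside on the reweight step Chad describes: 'ceph osd reweight' sets the
temporary override weight (a value between 0 and 1), which is distinct from
'ceph osd crush reweight' but still changes where CRUSH places data, hence
Greg counting it as a change. A sketch, with osd id 7 and the 0.8 value as
made-up examples:

    # lower the override weight so some PGs move off the too-full OSD
    ceph osd reweight 7 0.8

    # then watch the resulting data movement and the weight columns
    ceph -s
    ceph osd tree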
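
And on catching the 'ceph scrub' output: one way to see it, assuming default
log locations on a monitor node (the monitor id 'a' in the path is a
placeholder):

    # watch the cluster log live in one terminal...
    ceph -w
    # ...while triggering the scrub in another
    ceph scrub

    # or look for the scrub results afterwards in the logs on a monitor node
    grep -i scrub /var/log/ceph/ceph.log /var/log/ceph/ceph-mon.a.log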