On Thu, 3 Mar 2011, Jim Schutt wrote:
> On Thu, 2011-03-03 at 11:04 -0700, Sage Weil wrote:
> > On Thu, 3 Mar 2011, Jim Schutt wrote:
> > >
> > > On Wed, 2011-03-02 at 22:03 -0700, Sage Weil wrote:
> > > > > I'm not sure how to track down what's happening here...
> > > >
> > > > Hmm. I'm not able to reproduce this here (tho I only have ~15 nodes
> > > > available at the moment). Seeing the last bit of the logs on the crashed
> > > > nodes will help.
> > > >
> >
> > Can you confirm that the chdir is working now? Maybe put an assert(0) in
> > tick() so we can verify core dumps are working in general?
>
> Great idea, and chdir is definitely working; got 96 core
> files as expected.

Can you put an assert(0) at the top of OSD::shutdown() so we can verify
that the OSD isn't trying to shut itself down cleanly? (There are a few
cases where it might do that.) The logs you had make it look a bit like
that could be the case, or that it is crashing in an unpleasant way in
the messenger pipe teardown.

Thanks!
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html