On Thu, 31 Mar 2011, Jim Schutt wrote:
> > I was actually suggesting we try to make it core dump inside the "delete
> > this" and watching for a stall in progress and then sending SIGABRT to dump
> > core in the act. That way we verify it really is in the allocator (and
> > maybe even see where). That's a bit harder to set up, though!
>
> Right, I couldn't think of how to automate that stall detection
> during the stall, rather than after. At least, I couldn't
> think of how to do it without incurring possibly excessive
> overhead, say by starting a timer on every "delete this".

Yeah. I wonder if dumping core on a cosd right when it gets marked down
would do the trick? That should catch it ~20 seconds or whatever in the
stall. By watching for the "osdfoo marked down" messages from ceph -w?

> > Dumping right after may still yield some useful info, but I'm less
> > hopeful...
>
> I thought I might try turning off all debugging, except a notice
> that the "delete this" took too long. This is easy to do, and
> would tell us if allocator activity in support of debugging is
> affecting operations. It doesn't lead to any ideas for
> improving the situation, though :/
>
> Also, since I built tcmalloc from source, I thought I might
> try to figure out what operation is taking too long there.
> I'm hoping Ceph logging redirection is set up so that stdout
> or stderr from tcmalloc would show up in my log files?

Not with the default logging stuff. However, you can run the daemons with
'-d' and they will stay in the foreground and log to stderr. Or -f will
send the ceph logs to their usual locations, but the daemon won't fork and
you can redirect stdout/stderr (with any tcmalloc stuff) wherever you
like.

sage
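
A rough, untested sketch of the "watch ceph -w and SIGABRT the cosd when it gets marked down" idea above, in Python. Several things here are assumptions and not taken from this thread: the exact wording of the "marked down" line (the regex just follows the "osdfoo marked down" phrasing), the pid file location and naming, and the helper names, which are all hypothetical. It also assumes the watcher runs on the same host as the cosd and that core dumps are enabled for the daemon.

```python
import os
import re
import signal
import subprocess

# ASSUMPTION: the real `ceph -w` line format is not shown in this thread.
# This pattern only follows the "osdfoo marked down" phrasing used above.
MARKED_DOWN = re.compile(r"\bosd\.?(\d+)\b.*\bmarked down\b")


def parse_marked_down(line):
    """Return the OSD id named in a 'marked down' line, or None."""
    m = MARKED_DOWN.search(line)
    return int(m.group(1)) if m else None


def pid_for_osd(osd_id, pid_dir="/var/run/ceph"):
    """Read the daemon pid from a pid file.

    ASSUMPTION: the directory and the 'osd.<id>.pid' naming are
    hypothetical; adjust to wherever your cosd writes its pid file.
    """
    path = os.path.join(pid_dir, "osd.%d.pid" % osd_id)
    with open(path) as f:
        return int(f.read().strip())


def watch_and_abort(local_osds):
    """Follow `ceph -w`; SIGABRT each local cosd once when it is marked down."""
    pending = set(local_osds)
    proc = subprocess.Popen(
        ["ceph", "-w"], stdout=subprocess.PIPE, universal_newlines=True
    )
    for line in proc.stdout:
        osd_id = parse_marked_down(line)
        if osd_id in pending:
            # SIGABRT makes the process dump core (if core dumps are enabled),
            # hopefully while it is still stuck in the stall.
            os.kill(pid_for_osd(osd_id), signal.SIGABRT)
            pending.discard(osd_id)
        if not pending:
            break
    proc.terminate()
```

Something like watch_and_abort({3, 7}) on the host running osd3 and osd7 would be the intended use. Each daemon is signalled only once, so a single mark-down event yields a single core.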