Jim Schutt wrote:
Sage Weil wrote:
On Thu, 31 Mar 2011, Jim Schutt wrote:
I was actually suggesting we try to make it core dump inside the
"delete this" and watching for a stall in progress and then sending
SIGABRT to dump core in the act. That way we verify it really is in the
allocator (and maybe even see where). That's a bit harder to set up,
though!
Right, I couldn't think of how to automate that stall detection
during the stall, rather than after. At least, I couldn't
think of how to do it without incurring possibly excessive
overhead, say by starting a timer on every "delete this".
Yeah. I wonder if dumping core on a cosd right when it gets marked
down would do the trick? That should catch it ~20 seconds or so
into the stall. We could trigger it by watching for the "osdfoo marked
down" messages from ceph -w?
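A rough sketch of that watcher, assuming the "ceph -w" output is piped
in as a stream and that the relevant line contains both the OSD name and
the words "marked down" (the exact log wording is an assumption, and
line_reports_down() and abort_when_marked_down() are hypothetical names,
not existing Ceph code):

```cpp
#include <signal.h>
#include <sys/types.h>
#include <istream>
#include <string>

// True if the line names the OSD and contains "marked down".
// The exact wording of the log line is an assumption. This is a plain
// substring match, so "osd1" would also match a line about "osd10".
bool line_reports_down(const std::string &line, const std::string &osd)
{
  return line.find(osd) != std::string::npos &&
         line.find("marked down") != std::string::npos;
}

// Read log lines (for example, piped from "ceph -w") until one reports
// the OSD as marked down, then send SIGABRT to the given cosd pid so it
// dumps core while still stalled. Returns true if the signal was sent.
bool abort_when_marked_down(std::istream &in, const std::string &osd,
                            pid_t cosd_pid)
{
  std::string line;
  while (std::getline(in, line)) {
    if (line_reports_down(line, osd))
      return kill(cosd_pid, SIGABRT) == 0;
  }
  return false;  // input ended without seeing the OSD marked down
}
```

Calling abort_when_marked_down(std::cin, "osd3", pid) from a small main()
fed by "ceph -w" would be one way to wire it up; core dumps have to be
enabled for the cosd process (ulimit -c) for the SIGABRT to be useful.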
What about making Cond::Wait() use pthread_cond_timedwait()
with a suitable timeout value, say 10 seconds, and asserting
on timeout? Do you think there would be many legitimate 10
second delays in OSD processing?
Or, I could make a Cond::WaitIntervalOrAbort(), and
use it just on the pipe lock, since that's the source
of the trouble. Sound useful?
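
Roughly what I have in mind, as a minimal sketch only. TimedCond here is
a hypothetical stand-alone class rather than the real Cond, it takes a
raw pthread_mutex_t instead of our Mutex, and the 10 second value would
be chosen by the caller:

```cpp
#include <pthread.h>
#include <time.h>
#include <errno.h>
#include <stdlib.h>
#include <cassert>

// Hypothetical stand-in for a condition-variable wrapper.
class TimedCond {
  pthread_cond_t c;

public:
  TimedCond() {
    int r = pthread_cond_init(&c, NULL);
    assert(r == 0);
    (void)r;
  }
  ~TimedCond() { pthread_cond_destroy(&c); }

  // Wait with mutex m held. Returns 0 if woken (possibly spuriously),
  // or ETIMEDOUT if 'seconds' elapsed first.
  int WaitInterval(pthread_mutex_t &m, int seconds) {
    struct timespec ts;
    // A condvar with default attributes measures its absolute timeout
    // against CLOCK_REALTIME.
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += seconds;
    return pthread_cond_timedwait(&c, &m, &ts);
  }

  // Same as WaitInterval(), but abort() on timeout so the process
  // raises SIGABRT and dumps core.
  void WaitIntervalOrAbort(pthread_mutex_t &m, int seconds) {
    if (WaitInterval(m, seconds) == ETIMEDOUT)
      abort();
  }

  void Signal() { pthread_cond_signal(&c); }
};

// Typical use, with the lock held and 'ready' as the wait predicate:
//   while (!ready)
//     cond.WaitIntervalOrAbort(lock, 10);
```

One caveat: pthread_cond_timedwait() reacquires the mutex before it
returns, even on timeout, so if the stalled thread is the one holding
the pipe lock, the abort would only fire once that lock is released.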
-- Jim