On Thu, 2011-03-03 at 13:03 -0700, Jim Schutt wrote:
> > If none of that works, it's possible that someone is calling exit()
> > somewhere. You can attach a gdb to the process and put a breakpoint on
> > exit() to see if this is going on. There's a lot of "your foo is not
> > bar enough, I hate your config, exit(1)" type code that gets executed
> > while the daemon is starting up. It sounds like you should be past
> > that point, though.
>
> I've finally gotten a little info, using a variant of
> your gdb idea: I waited until many of the OSD instances
> had died, then I attached gdb to several that were left,
> and waited.
>
> Two of them died the same way, like this:
>
> Program received signal SIGPIPE, Broken pipe.
> [Switching to Thread 0x7fd7888c8940 (LWP 28693)]
> 0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007fd7a9b82f2b in sendmsg () from /lib64/libpthread.so.0
> #1  0x0000000000672e0b in SimpleMessenger::Pipe::do_sendmsg (
>     this=0x7fd799b67c20, sd=13, msg=0x7fd7888c7f20, len=251237, more=false)
>     at msg/SimpleMessenger.cc:1994
> #2  0x00000000006739d3 in SimpleMessenger::Pipe::write_message (
>     this=0x7fd799b67c20, m=0x7fd79b2dcb70) at msg/SimpleMessenger.cc:2217
> #3  0x000000000067e74a in SimpleMessenger::Pipe::writer (this=0x7fd799b67c20)
>     at msg/SimpleMessenger.cc:1734
> #4  0x000000000066fa2b in SimpleMessenger::Pipe::Writer::entry (
>     this=0x7fd799b67e70) at msg/SimpleMessenger.h:204
> #5  0x000000000068282e in Thread::_entry_func (arg=0x7fd799b67e70)
>     at ./common/Thread.h:41
> #6  0x00007fd7a9b7b73d in start_thread (arg=<value optimized out>)
>     at pthread_create.c:301
> #7  0x00007fd7a8a91f6d in clone () from /lib64/libc.so.6
> (gdb)

Has something maybe changed in signal handling recently? Maybe
SIGPIPE used to be blocked, and sendmsg() would return -EPIPE,
but now it's not blocked and not handled?

This bit in linux-2.6.git/net/core/stream.c is what made me wonder,
but maybe it's a red herring:

int sk_stream_error(struct sock *sk, int flags, int err)
{
	if (err == -EPIPE)
		err = sock_error(sk) ? : -EPIPE;
	if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
		send_sig(SIGPIPE, current, 0);
	return err;
}

-- Jim
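
For reference, here is a minimal userspace sketch of the two usual ways to get EPIPE back from a send on a dead connection instead of being killed by SIGPIPE: passing MSG_NOSIGNAL per call, or setting SIGPIPE to SIG_IGN process-wide. It is not taken from the Ceph tree; the helper names send_to_closed_peer() and ignore_sigpipe() are hypothetical, and it uses a local socketpair with send() rather than a TCP connection with sendmsg(), on the assumption that the signal behaviour is the same for this purpose.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a connected stream socket pair, close one
 * end, then write to the other. Returns the errno from send(), or 0 if
 * the send unexpectedly succeeded.
 */
int send_to_closed_peer(int flags)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    close(sv[1]);                       /* the peer goes away */

    ssize_t n = send(sv[0], "x", 1, flags);
    int err = (n < 0) ? errno : 0;

    close(sv[0]);
    return err;
}

/*
 * Hypothetical helper: ignore SIGPIPE for the whole process, so that a
 * write to a broken connection fails with EPIPE even without
 * MSG_NOSIGNAL. Returns 0 on success, -1 on failure.
 */
int ignore_sigpipe(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}
```

Calling send_to_closed_peer(MSG_NOSIGNAL) should return EPIPE with no signal raised. Calling send_to_closed_peer(0) before ignore_sigpipe() would terminate the process with SIGPIPE under the default disposition, which matches the behaviour seen in the backtrace if nothing is blocking or handling the signal.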