Hi Vu,

Actually, you and I ran into the same problem. In my async backend PR
(https://github.com/ceph/ceph/pull/10264), the commit
(https://github.com/ceph/ceph/pull/10264/commits/7055bafbcbf06425e71808cb2b089d1d04706728)
defines an interface for prefork/postfork handling. It is much like
global_init_prefork_start/global_init_postfork_start, but it is a generic
interface. See Kefu's comment on why we need this:

=============
note to myself, w.r.t. the before/after daemonize hook

1. it's a natural way to do bind/rebind in the event thread
2. we do bind before daemon(2) now
3. the child process after daemon(2) is a single threaded process, and
   all event threads are terminated, so no thread is taking care of
   bind/rebind after daemon(2), that's why we need to re-spawn the
   threads after daemon(2).
=============

So let's solve this similar problem the same way.

On Fri, Aug 12, 2016 at 9:22 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 11 Aug 2016, Vu Pham wrote:
>> Hello all,
>>
>> The background:
>> We have tested scaling with xio messenger and faced multiple *unknown*
>> problems (hard to trace and reproduce). We recently find out that the
>> daemonize/fork support isn't full in ibverbs. It assumes that the parent
>> process will do the RDMA operations. Any child process try to do rdma
>> operations will experience various unexpected problems.
>>
>> ceph-osd/ceph-mon/ceph-mds daemonize (fork) after creating messengers.
>> Xio messenger will initialize accelio library and register RDMA memory
>> in the 1st call to XioMessenger constructor.
>> This situation is very problematic where child process do rdma
>> operations as described above
>>
>> http://www.rdmamojo.com/2012/05/24/ibv_fork_init
>> http://www.spinics.net/lists/linux-rdma/msg03364.html
>> I create this PR which forces to daemonize/fork before creating
>> messenger
>>
>> https://github.com/ceph/ceph/pull/10600
>>
>> I have tested this patch by bringing up a cluster with 4 nodes, 8
>> osds/node, two monitors and run I/Os (4K - 4M block size) from 4 fio
>> clients.
>>
>> Is there any known problem to daemonize/fork before creating messenger?
>> Could you help to review and provide feedback?
>
> This more or less works. The main issue is that we don't catch errors as
> early as we did and a daemon may appear to start and then immediately
> exit without printing an error.
>
> There is a Preforker class in common that is meant to address this (it's
> used by ceph-fuse and ceph-mon already). It does the fork early, when
> prefork() is called, and keeps stdout/stderr open for an interim period
> until you call preforker.exit() or .daemonize(). Any exit code gets
> passed back to the parent over a socket. I'm guessing that the mon is
> already working fine since you're just moving the prefork.daemonize() line
> around (the actual fork happened way back at
>
> https://github.com/vuhuong/ceph-upstream/blob/5cd5546fb7ddb9ef69380476b0a80038ba74a405/src/ceph_mon.cc#L500
>
> ) and you just need to make ceph_osd.cc and ceph_mds.cc use Preforker in a
> similar way.
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
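
For reference, the fork-early pattern Sage describes above (fork first, do the fork-sensitive initialization in the child, pass the result back to the parent over a socket) can be sketched roughly as below. This is only a minimal illustration using plain POSIX calls, not Ceph's Preforker code: the helper name fork_then_init and its callback are hypothetical, and a real daemon would keep running in the child after reporting success instead of exiting.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <functional>

// Hypothetical helper (not part of Ceph): fork first, run `init` in the
// child, and report the child's init status to the parent over a socketpair.
// Returns the reported status in the parent; the child never returns.
int fork_then_init(const std::function<int()>& init) {
  int fd[2];
  if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fd) < 0)
    return -errno;

  pid_t pid = ::fork();
  if (pid < 0) {
    int err = errno;
    ::close(fd[0]);
    ::close(fd[1]);
    return -err;
  }

  if (pid == 0) {
    // Child: anything that must not cross a fork (RDMA memory
    // registration, messenger event threads) would be created here.
    ::close(fd[0]);
    int status = init();
    ssize_t w = ::write(fd[1], &status, sizeof(status));
    (void)w;
    ::close(fd[1]);
    // A real daemon would enter its main loop when status == 0;
    // this sketch simply exits.
    ::_exit(status == 0 ? 0 : 1);
  }

  // Parent: wait for the child's report so startup errors are still
  // visible to whoever launched the daemon.
  ::close(fd[1]);
  int status = 0;
  ssize_t n;
  do {
    n = ::read(fd[0], &status, sizeof(status));
  } while (n < 0 && errno == EINTR);
  ::close(fd[0]);
  if (n != static_cast<ssize_t>(sizeof(status)))
    status = -EIO;  // child died before reporting anything

  // Reap the child; only appropriate here because the sketch child exits.
  while (::waitpid(pid, nullptr, 0) < 0 && errno == EINTR) {
  }
  return status;
}
```

Calling fork_then_init with a callback that returns a negative errno-style value makes the parent see that same value, which is the property that lets the launcher still report a failed start even though initialization now happens after the fork.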