Re: Fork and RDMA operations

On Thu, 11 Aug 2016, Vu Pham wrote:
> Hello all,
> 
> The background:
> We have tested scaling with the xio messenger and hit multiple *unknown*
> problems (hard to trace and reproduce). We recently found out that
> daemonize/fork support in ibverbs is incomplete: it assumes that the
> parent process will do the RDMA operations, and any child process that
> tries to do RDMA operations will experience various unexpected problems.
> 
> ceph-osd/ceph-mon/ceph-mds daemonize (fork) after creating their messengers.
> The xio messenger initializes the Accelio library and registers RDMA memory
> in the first call to the XioMessenger constructor.
> This is very problematic, because it is then the child process that does
> the RDMA operations, as described above.
> 
> http://www.rdmamojo.com/2012/05/24/ibv_fork_init
> http://www.spinics.net/lists/linux-rdma/msg03364.html
>
> I created this PR, which forces the daemonize/fork to happen before the
> messenger is created:
> 
> https://github.com/ceph/ceph/pull/10600
> 
> I have tested this patch by bringing up a cluster with 4 nodes, 8
> OSDs/node, and two monitors, and running I/O (4K-4M block sizes) from 4
> fio clients.
> 
> Is there any known problem with daemonizing/forking before creating the
> messenger? Could you help review it and provide feedback?

This more or less works.  The main issue is that we no longer catch errors
as early as we did, so a daemon may appear to start and then immediately
exit without printing an error.

There is a Preforker class in common that is meant to address this (it's 
used by ceph-fuse and ceph-mon already).  It does the fork early, when 
prefork() is called, and keeps stdout/stderr open for an interim period 
until you call preforker.exit() or .daemonize().  Any exit code gets 
passed back to the parent over a socket.  I'm guessing that the mon is 
already working fine since you're just moving the prefork.daemonize() line 
around (the actual fork happened way back at

	https://github.com/vuhuong/ceph-upstream/blob/5cd5546fb7ddb9ef69380476b0a80038ba74a405/src/ceph_mon.cc#L500

) and you just need to make ceph_osd.cc and ceph_mds.cc use Preforker in a 
similar way.

sage