Re: Fork and RDMA operations

On 08/24/2016 05:32 PM, Sage Weil wrote:
On Wed, 24 Aug 2016, Casey Bodley wrote:
On 08/12/2016 12:27 PM, Sage Weil wrote:
It seems like it would be simpler to push the fork before any important
operations.  (And BTW with systemd and upstart we don't fork anyway; it's
just there for sysvinit.)  The preforker thing is there to make it easy to
fork early, but keep the parent waiting around so that you can do more
initialization, print errors, and terminate with an error code if something
(post-fork) goes wrong.  In theory, there's no reason why we couldn't make
this almost the very first thing the daemon does so that *all* work is
done in the child...

sage

I've recently run into issues related to fork as well (see my "memory leaks
related to CephContext and global_init_daemonize()" email). Trying to manage
resources across a fork is difficult and error-prone, so changing how we
daemonize could eliminate a whole class of these bugs. And as Haomai points
out, we're going to great lengths to make things work in the current model.

I'm a big fan of Sage's theory that all work could be done in the child
process, and I'm willing to take on the project if we can reach a consensus on
the design.

Whether or not we're interested in long-term support for SysV, the ability to
daemonize is useful for our development workflow (vstart.sh in particular). To
fill this role, some basic requirements for the parent process are:
* don't exit until initialization is finished
* return an error code if initialization failed

As Sage pointed out, this is exactly what Preforker is doing for ceph-mon. So
we can start by changing the other daemons to use that instead of
global_init_daemonize().
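
The two requirements above (the parent waits for initialization and returns
its error code) can be sketched roughly as follows. This is only an
illustration of the general fork-early pattern, not Ceph's actual Preforker
class: the function name daemonize_early and its init/run callbacks are
hypothetical, and details such as setsid(), stdio redirection, and EINTR
handling are left out.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <functional>

// Hypothetical helper, not part of Ceph. Forks before any real work is done.
// The child runs init(), reports its status to the parent over a pipe, and
// on success continues into run(). The parent blocks until it hears from the
// child and returns the child's init status as its own exit code.
int daemonize_early(const std::function<int()>& init,
                    const std::function<void()>& run) {
  assert(init && run);

  int fds[2];
  if (pipe(fds) < 0) return 1;

  pid_t pid = fork();
  if (pid < 0) return 1;

  if (pid == 0) {
    // Child: all initialization (threads, sockets, logging) happens here.
    close(fds[0]);
    int status = init();
    unsigned char code = static_cast<unsigned char>(status);
    ssize_t n = write(fds[1], &code, 1);
    close(fds[1]);
    if (n != 1 || status != 0) _exit(status != 0 ? status : 1);
    run();  // the daemon's main loop
    _exit(0);
  }

  // Parent: does nothing except wait for the child's verdict.
  close(fds[1]);
  unsigned char code = 1;
  ssize_t n = read(fds[0], &code, 1);
  close(fds[0]);
  if (n != 1 || code != 0) {
    // Init failed, or the child died before reporting: reap it.
    int wstatus = 0;
    waitpid(pid, &wstatus, 0);
    return n == 1 ? code : 1;
  }
  return 0;  // init succeeded; the child keeps running in the background
}
```

A daemon's main() would call this as nearly its first statement and return
the result, so that a failed initialization is visible to the caller (an init
script or vstart.sh) as a non-zero exit code.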

The next step is to prevent global_init()/common_init()/CephContext from doing
any work in the parent process (esp. spawning threads for Log,
CephContextServiceThread, and AdminSocket). Decoupling the config parsing from
CephContext initialization seems like a natural way to accomplish that. So the
parent would create and initialize the md_config_t object, then after fork,
the child would pass that as an argument when creating the CephContext.
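
As a rough sketch of that split, assuming simplified stand-ins rather than
the real classes: Config and parse_config below play the role of
md_config_t and its parsing, and Context plays the role of CephContext. All
three names are hypothetical. The point is only that the config object is
plain data that is safe to build before fork(), while the context owns a
thread and is therefore constructed only in the child.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for md_config_t: plain data, no threads, no logging,
// so it can be created in the parent and survives fork() without trouble.
struct Config {
  std::map<std::string, std::string> values;
};

// Parses "--key=value" arguments; anything else is ignored in this sketch.
Config parse_config(const std::vector<std::string>& args) {
  Config conf;
  for (const auto& arg : args) {
    if (arg.rfind("--", 0) != 0) continue;  // does not start with "--"
    auto eq = arg.find('=');
    if (eq == std::string::npos) continue;
    conf.values[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
  }
  return conf;
}

// Hypothetical stand-in for CephContext: it starts a service thread in its
// constructor, so it must only be created after fork(), in the child.
class Context {
 public:
  explicit Context(Config conf)
      : conf_(std::move(conf)),
        service_([] { /* background work would run here */ }) {}
  ~Context() { service_.join(); }

  const Config& conf() const { return conf_; }

 private:
  Config conf_;          // declared first, so initialized before the thread
  std::thread service_;
};
```

In this arrangement the parent would call parse_config() and then fork, and
the child would construct Context from the already-parsed Config, so no
thread ever exists in the parent.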

How does that sound for a start?

Sounds good to me!  The last step (decoupling) sounds like it's not
strictly necessary but is probably worthwhile.  It will get deep into a
bunch of crufty code that isn't much fun to deal with, so I suspect you'll
either run away screaming or come up with something pretty satisfying that
removes a bunch of ugly code.

sage

Thanks Sage. I'll have to investigate some more to see how messy it will be, but I'm more worried about trying to construct a 'partial' CephContext for parsing, and exposing that to the application when most of its capabilities (logging, especially) are missing. The parent process won't be able to call any code with dout()s, so I think it's safest to avoid constructing a CephContext in the first place.