Could boost.process be an alternative here? On Fri, Aug 26, 2016 at 1:16 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote: > > > On Thu, Aug 25, 2016 at 10:43 PM, Casey Bodley <cbodley@xxxxxxxxxx> wrote: >> >> >> On 08/24/2016 05:32 PM, Sage Weil wrote: >>> >>> On Wed, 24 Aug 2016, Casey Bodley wrote: >>>> >>>> On 08/12/2016 12:27 PM, Sage Weil wrote: >>>>> >>>>> It seems like it would be simpler to push the fork before any important >>>>> operations. (And BTW with systemd and upstart we don't fork anyway; >>>>> it's >>>>> just there for sysvinit.) The preforker thing is there to make it easy >>>>> to >>>>> fork early, but keep the parent waiting around so that you can do more >>>>> intialization, print errors, and terminate with an error code if >>>>> something >>>>> (post-fork) goes wrong. In theory, there's no reason why we couldn't >>>>> make >>>>> this almost the very first thing the daemon does so that *all* work is >>>>> done in the child... >>>>> >>>>> sage >>>>> >>>> I've recently run into issues related to fork as well (see my "memory >>>> leaks >>>> related to CephContext and global_init_daemonize()" email). Trying to >>>> manage >>>> resources across a fork is difficult and error-prone, so changing how we >>>> daemonize could eliminate a whole class of these bugs. And as Haomai >>>> points >>>> out, we're going to great lengths to make things work in the current >>>> model. >>>> >>>> I'm a big fan of Sage's theory that all work could be done in the child >>>> process, and I'm willing to take on the project if we can reach a >>>> consensus on >>>> the design. >>>> >>>> Whether or not we're interested in long-term support for SysV, the >>>> ability to >>>> daemonize is useful our for development workflow (vstart.sh in >>>> particular). To >>>> fill this role, some basic requirements for the parent process are: >>>> * don't exit until initialization is finished >>>> * return an error code if initialization failed >>>> >>>> As Sage pointed out, this is exactly what Preforker is doing for >>>> ceph-mon. So >>>> we can start by changing the other daemons to use that instead of >>>> global_init_daemonize(). >>>> >>>> The next step is to prevent global_init()/common_init()/CephContext from >>>> doing >>>> any work in the parent process (esp. spawning threads for Log, >>>> CephContextServiceThread, and AdminSocket). Decoupling the config parsing >>>> from >>>> CephContext initialization seems like a natural way to accomplish that. >>>> So the >>>> parent would create and initialize the md_config_t object, then after >>>> fork, >>>> the child would pass that as an argument when creating the CephContext. >>>> >>>> How does that sound for a start? >>> >>> Sounds good to me! The last step (decoupling) sounds like it's not >>> strictly necessary but is probably worthwhile. It will get deep into a >>> bunch of crufty code that isn't much fun to deal with, so I suspect you'll >>> either run away screaming or come up with something pretty satisfying that >>> removes a bunch of ugly code. >>> >>> sage >> >> >> Thanks Sage. I'll have to investigate some more to see how messy it will be, >> but I'm more worried about trying to construct a 'partial' CephContext for >> parsing, and exposing that to the application when most of its capabilities >> (logging, especially) are missing. The parent process won't be able to call >> any code with dout()s, so I think it's safest to avoid constructing a >> CephContext in the first place. > > I'd like to add some thoughts here if I may? > > Recently Kefu and I were discussing uses of fork in the ceph codebase at length > due to the issue seen in http://tracker.ceph.com/issues/16556 Over recent years > supporting multiple products, including glibc, I have seen many problems arising > from what I consider to be misuse of fork. > > There is a saying, "The only safe function call to make after fork() is exec()" > and this is reflected in the fork man page. > > "After a fork(2) in a multithreaded program, the child can safely call only > async-signal-safe functions (see signal(7)) until such time as it calls > execve(2)." > > The list of async-signal-safe functions is not very extensive and does not > include things like malloc and free and I have seen on multiple occasions > deadlocks where the child is trying to acquire a malloc arena lock for example > that was locked when fork was called and hanging forever. Basically, if a lock > is locked when you fork the child will get a copy (COW) of the lock in the > locked state with no possibility of the lock being unlocked since the child is > single threaded. if the child then tries to acquire this lock it will hang > forever. The same can happen with other resources. Note that even if your code > is single threaded some libraries may use threads in the background and this can > also catch you out. Sometimes these bugs can be hideously difficult to pin down > and happen extremely rarely and only on certain machines, etc. due to timing. > > These types of problems with fork are well documented. > > https://bugs.chromium.org/p/chromium/issues/detail?id=35374#c51 > http://www.evanjones.ca/fork-is-dangerous.html > http://thorstenball.com/blog/2014/10/13/why-threads-cant-fork/ > https://github.com/gperftools/gperftools/issues/178 (tcmalloc and its use of > locks) > > Google shows a preference for clone in this document, posix_spawn is another > alternative. > > https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#death-tests-and-threads > > There are partial solutions like pthread_atfork but it is not a comprehensive > solution, nor is it without its own problems. > > Anyway, I just wanted to get this information out there for discussion so we can > make an educated decision going forward and so that everyone is aware of the > potential issues with uses of fork in ceph. > > -- > Cheers, > Brad -- Cheers, Brad -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html