On Fri, 21 Apr 2017, Fabian Grünbichler wrote:
> On Fri, Apr 21, 2017 at 01:16:23PM +0000, Sage Weil wrote:
> > On Fri, 21 Apr 2017, Fabian Grünbichler wrote:
> > > On Thu, Apr 20, 2017 at 08:11:38PM +0200, Nathan Cutler wrote:
> > > > Hi Willem:
> > > >
> > > > It sounds like you are trying to use the sysvinit scripts? These have been
> > > > unmaintained (presumably with lots of weeds growing up) since infernalis.
> > > > Until now I have been assuming that all init systems other than systemd
> > > > (sysvinit, upstart, etc.) are deprecated in Ceph.
> > >
> > > Slightly OT, but AFAICT http://tracker.ceph.com/issues/18305 still
> > > applies to the official ceph.com Kraken Debian packages, i.e., ceph-base
> > > installs and activates the init.d script, which then races against the
> > > (udev-activated) ceph-osd systemd units. If anything except systemd is
> > > indeed deprecated, I wonder why the Debian packages (still) ship AND
> > > activate both systemd units and Sys V init scripts?
> > >
> > > (Note that the proposed fix probably does not apply as is anymore,
> > > because ceph-disk and the systemd units have been changed in the
> > > meantime).
> >
> > I think the issue with Debian (generally) is that it "supports" multiple
> > init systems (sysvinit and systemd both), even though systemd is the one
> > installed default. Which means we ship the sysvinit script and systemd
> > unit files.
> >
> > (There may very well be a bug in how we "activate" them, though!)
> >
>
> that's why I reported the original issue - in general you never want to
> have both a multi-daemon init script (like the "old" ceph one) and the
> replacing split up systemd units active at the same time.
>
> since systemd will generate a unit for every init script for which a
> unit of the same name does not already exist, you either need to mask
> the auto-generated unit (i.e., symlink it to /dev/null) or write a
> replacement unit that has the identical name (so the "ceph" init script
> becomes the "ceph.service" unit). if you don't do that and your units
> are named differently than your init script, both will be active (this
> is not a Debianism, it is how the LSB generator in systemd is supposed
> to work to ease the transition from Sys V init to systemd..).
>
> what exacerbates the issue in this case is that the systemd units + udev
> actually don't completely replace the old init scripts, because some of
> the udev events might have been processed before the system was fully
> booted, and osds might not be properly activated on boot as a result.
>
> hence my proposal to add a ceph.service that simply calls "ceph-disk
> activate-all", which is AFAICT the only part of the init script that is
> not covered by the current systemd units / udev rules.

This sounds reasonable to me. It could also do nothing... IIRC the
ceph-disk activate-all was a workaround for racy/buggy udev interactions
that prevented all osds from starting on large boxes with lots of disks
(see below). Given that we don't have such a workaround on systemd
anymore, I'm not sure if it's still necessary or not. (I guess it can't
hurt, though!)

> the whole ceph service startup is pretty messy in general IMHO,
> especially for OSDs where (IIRC?) udev rules are calling python scripts
> which are starting systemd units which are in turn calling python
> scripts with different parameters that end up starting systemd units
> which actually start daemons (somewhere in there the mounting happens as
> well..).

Agreed. The goal with using udev like this was to make it all hotplug,
but I'm not sure if any operators actually take advantage of this.
If they don't, we could consider going back to something a bit less
weird...

sage
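
For reference, the naming rule Fabian describes in the quoted text can be
sketched in Python. This is only a simplified model of the behaviour as
described in this thread, not the actual logic of systemd's SysV
generator; the function name and the example unit names are illustrative
assumptions.

```python
def sysv_scripts_getting_generated_units(init_scripts, existing_units):
    """Return the init scripts that would get an auto-generated unit.

    Simplified model (assumption): an init script "foo" is skipped only
    if a unit named "foo.service" already exists, whether it is a real
    replacement unit or a mask (symlink to /dev/null).
    """
    return sorted(
        name for name in init_scripts
        if f"{name}.service" not in existing_units
    )


# Hypothetical example: the packaged units are named differently from
# the "ceph" init script, so the script still gets a generated unit and
# both the script and the split-up units end up active.
init_scripts = {"ceph", "ssh"}
existing_units = {"ceph-osd@.service", "ceph-mon@.service", "ssh.service"}
print(sysv_scripts_getting_generated_units(init_scripts, existing_units))
```

In this model, adding "ceph.service" to existing_units makes the result
empty, which corresponds to both options mentioned in the thread: masking
the generated unit, or shipping a replacement unit with the identical name.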