Dear ceph users,

I've been experimenting with setting up a new node with ceph-volume and bluestore. Most of the setup works fine, but I'm running into a strange interaction between ceph-volume and systemd when starting OSDs.

After preparing/activating the OSD, a systemd unit instance is created with a symlink in /etc/systemd/system/multi-user.target.wants:

    ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service -> /usr/lib/systemd/system/ceph-volume@.service

I've moved this dependency to ceph-osd.target.wants, since I'd like to be able to start/stop all OSDs on the same node with one command (let me know if there is a better way). Stopping works without this, since ceph-osd@.service is marked as part of ceph-osd.target, but starting does not, since these new ceph-volume units aren't grouped together in a separate target.

However, when I run 'systemctl start ceph-osd.target' multiple times, the systemctl command hangs, even though the OSD starts up fine. Interestingly, 'systemctl start ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service' does not hang.

Troubleshooting further, I see that the ceph-volume@.service unit calls 'ceph-volume lvm trigger 121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb', which in turn calls 'Activate', running a series of commands that ends with two systemctl calls:

    Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/H900D00/H900D00 --path /var/lib/ceph/osd/ceph-121
    Running command: ln -snf /dev/H900D00/H900D00 /var/lib/ceph/osd/ceph-121/block
    Running command: chown -R ceph:ceph /dev/dm-0
    Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-121
    Running command: systemctl enable ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb
    Running command: systemctl start ceph-osd@121
    --> ceph-volume lvm activate successful for osd ID: 121

The problem seems to be the 'systemctl enable' command, which essentially tries to enable the unit that is currently being executed (in the case where I run 'systemctl start ceph-osd.target').
Somehow systemd (in CentOS) isn't very happy with that. If I edit the Python scripts to check that the unit is not already enabled before enabling it, the hangs stop. For example, replacing in /usr/lib/python2.7/site-packages/ceph_volume/systemd/systemd.py

    def enable(unit):

with

    def enable(unit):

fixes the issue.

Has anyone run into this, or does anyone have ideas on how to proceed?

Andras
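P.S. To illustrate the kind of "check before enabling" guard described above, a check along these lines could look like the following minimal sketch. This is not the actual ceph_volume code: the function names is_enabled and enable_if_needed and the run parameter are made up for illustration, and the only thing assumed about systemd is that 'systemctl is-enabled' exits with status 0 for an enabled unit.

```python
import subprocess


def is_enabled(unit, run=subprocess.call):
    # `systemctl is-enabled --quiet UNIT` exits 0 when the unit is enabled
    # (and for a few related states such as "static"), non-zero otherwise.
    # `run` is a hypothetical hook so the check can be exercised without
    # a live systemd.
    return run(['systemctl', 'is-enabled', '--quiet', unit]) == 0


def enable_if_needed(unit, run=subprocess.call):
    # Skip `systemctl enable` when the unit is already enabled, so a unit
    # that is currently being started is not asked to enable itself again.
    if is_enabled(unit, run):
        return False
    run(['systemctl', 'enable', unit])
    return True
```

Called as enable_if_needed('ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb'), this would issue 'systemctl enable' only on the first activation and do nothing on later ones.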