On Wed, May 16, 2018 at 4:50 PM, Andras Pataki <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Dear ceph users,
>
> I've been experimenting with setting up a new node with ceph-volume and
> bluestore. Most of the setup works right, but I'm running into a strange
> interaction between ceph-volume and systemd when starting OSDs.
>
> After preparing/activating the OSD, a systemd unit instance is created with
> a symlink in /etc/systemd/system/multi-user.target.wants:
>
>     ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service ->
>         /usr/lib/systemd/system/ceph-volume@.service
>
> I've moved this dependency to ceph-osd.target.wants, since I'd like to be
> able to start/stop all OSDs on the same node with one command (let me know
> if there is a better way). Stopping works without this, since
> ceph-osd@.service is marked as part of ceph-osd.target, but starting does
> not, because these new ceph-volume units aren't grouped under a target of
> their own.
>
> However, when I run 'systemctl start ceph-osd.target' multiple times, the
> systemctl command hangs, even though the OSD starts up fine. Interestingly,
> 'systemctl start ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service'
> does not hang.
>
> Troubleshooting further, I see that the ceph-volume@.service unit calls
> 'ceph-volume lvm trigger 121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb', which in
> turn calls 'Activate', running the following commands:
>
> Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/H900D00/H900D00 --path /var/lib/ceph/osd/ceph-121
> Running command: ln -snf /dev/H900D00/H900D00 /var/lib/ceph/osd/ceph-121/block
> Running command: chown -R ceph:ceph /dev/dm-0
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-121
> Running command: systemctl enable ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb
> Running command: systemctl start ceph-osd@121
> --> ceph-volume lvm activate successful for osd ID: 121
>
> The problem seems to be the 'systemctl enable' command, which essentially
> tries to enable the unit that is currently being executed (in the case of
> running 'systemctl start ceph-osd.target'). Somehow systemd (on CentOS) isn't
> very happy with that. If I edit the Python code to check that the unit
> is not already enabled before enabling it, the hangs stop.
>
> For example, in /usr/lib/python2.7/site-packages/ceph_volume/systemd/systemd.py,
> replacing
>
>     def enable(unit):
>         process.run(['systemctl', 'enable', unit])
>
> with
>
>     def enable(unit):
>         stdout, stderr, retcode = process.call(['systemctl', 'is-enabled', unit], show_command=True)
>         if retcode != 0:
>             process.run(['systemctl', 'enable', unit])
>
> fixes the issue.
>
> Has anyone run into this, or has any ideas on how to proceed?

This looks like an oversight on our end. We don't run into this because we
haven't tried to start/stop all OSDs at once in our tests. Can you create a
ticket so that we can fix this? Your changes look correct to me.

http://tracker.ceph.com/projects/ceph-volume/issues/new

> Andras

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com