ceph-volume and systemd troubles

Dear ceph users,

I've been experimenting with setting up a new node with ceph-volume and bluestore.  Most of the setup works, but I'm running into a strange interaction between ceph-volume and systemd when starting OSDs.

After preparing/activating the OSD, a systemd unit instance is created, with a symlink in /etc/systemd/system/multi-user.target.wants:
    ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service -> /usr/lib/systemd/system/ceph-volume@.service

I've moved this dependency to ceph-osd.target.wants, since I'd like to be able to start/stop all OSDs on the same node with one command (let me know if there is a better way).  Stopping works without this change, since ceph-osd@.service is marked as part of ceph-osd.target, but starting does not, because the new ceph-volume units are not grouped under a target of their own.

However, when I run 'systemctl start ceph-osd.target' multiple times, the systemctl command hangs, even though the OSD starts up fine.  Interestingly, 'systemctl start ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service' does not hang.
Troubleshooting further, I see that the ceph-volume@.service unit calls 'ceph-volume lvm trigger 121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb', which in turn calls 'Activate', which runs the following commands, including two systemctl calls:

Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/H900D00/H900D00 --path /var/lib/ceph/osd/ceph-121
Running command: ln -snf /dev/H900D00/H900D00 /var/lib/ceph/osd/ceph-121/block
Running command: chown -R ceph:ceph /dev/dm-0
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-121
Running command: systemctl enable ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb
Running command: systemctl start ceph-osd@121
--> ceph-volume lvm activate successful for osd ID: 121

The problem seems to be the 'systemctl enable' command, which, when triggered via 'systemctl start ceph-osd.target', tries to enable the very unit that is currently being executed.  Somehow systemd (in CentOS) isn't very happy with that.  If I edit the python scripts to check whether the unit is already enabled before enabling it, the hangs stop.
For example, in /usr/lib/python2.7/site-packages/ceph_volume/systemd/systemd.py, replacing

def enable(unit):
    process.run(['systemctl', 'enable', unit])

with

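def enable(unit):
    # 'systemctl is-enabled' exits 0 when the unit is already enabled
    stdout, stderr, retcode = process.call(['systemctl', 'is-enabled', unit], show_command=True)
    # only call 'systemctl enable' when the unit is not enabled yet
    if retcode != 0:
        process.run(['systemctl', 'enable', unit])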

fixes the issue.
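
For anyone who wants to try the same check outside the ceph-volume tree, here is a minimal standalone sketch of the idea using only the standard library.  The function name enable_once and its 'run' parameter are my own inventions for illustration and are not part of ceph-volume; 'run' exists only so the logic can be exercised without touching systemd.  Treat it as a sketch of the workaround, not as a proposed patch.

```python
import subprocess


def enable_once(unit, run=subprocess.call):
    """Enable `unit` only if systemd does not already report it as enabled.

    `run` (hypothetical, for illustration) takes an argv list and returns
    the exit code. Returns True if 'systemctl enable' was issued.
    """
    # 'systemctl is-enabled' exits 0 when the unit is already enabled;
    # --quiet suppresses the state it would otherwise print.
    if run(['systemctl', 'is-enabled', '--quiet', unit]) == 0:
        return False
    # Unit is not enabled yet, so enabling it here is safe.
    run(['systemctl', 'enable', unit])
    return True
```

Calling enable_once('ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb') on a node where the unit is already enabled should then issue only the 'is-enabled' query and skip the 'enable' call that appears to cause the hang.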

Has anyone run into this, or has any ideas on how to proceed?

Andras

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
