Done: tracker #24152
Thanks,
Andras
On 05/16/2018 04:58 PM, Alfredo Deza wrote:
On Wed, May 16, 2018 at 4:50 PM, Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
Dear ceph users,
I've been experimenting with setting up a new node with ceph-volume and
bluestore. Most of the setup works fine, but I'm running into a strange
interaction between ceph-volume and systemd when starting OSDs.
After the OSD is prepared/activated, a systemd unit instance is created,
with a symlink in /etc/systemd/system/multi-user.target.wants:
ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service ->
/usr/lib/systemd/system/ceph-volume@.service
I've moved this dependency to ceph-osd.target.wants, since I'd like to be
able to start/stop all OSDs on the same node with one command (let me know
if there is a better way); the move amounts to roughly the commands
sketched below. Stopping already works without this, since
ceph-osd@.service is marked as part of ceph-osd.target, but starting does
not, since these new ceph-volume units aren't grouped under that target.
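For reference, a sketch of the relocation (the instance name is the one
from above, and the unit path assumes the stock location on my CentOS
node; adjust to taste):

cd /etc/systemd/system
# Drop the symlink that 'systemctl enable' created under multi-user.target:
rm multi-user.target.wants/ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service
# Re-create it under ceph-osd.target, so that 'systemctl start
# ceph-osd.target' pulls the ceph-volume instance in as well:
mkdir -p ceph-osd.target.wants
ln -s /usr/lib/systemd/system/ceph-volume@.service \
    ceph-osd.target.wants/ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service
systemctl daemon-reload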
However, when I run 'systemctl start ceph-osd.target' multiple times, the
systemctl command hangs, even though the OSD starts up fine. Interestingly,
'systemctl start
ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service' does not
hang.
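In shell terms, the reproduction is simply:

systemctl start ceph-osd.target   # OSD starts up fine either way...
systemctl start ceph-osd.target   # ...but repeated starts leave systemctl hanging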
Troubleshooting further, I see that the ceph-volume@.service unit calls
'ceph-volume lvm trigger 121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb', which in
turn calls 'Activate', running a few commands:
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/H900D00/H900D00 --path /var/lib/ceph/osd/ceph-121
Running command: ln -snf /dev/H900D00/H900D00 /var/lib/ceph/osd/ceph-121/block
Running command: chown -R ceph:ceph /dev/dm-0
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-121
Running command: systemctl enable ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb
Running command: systemctl start ceph-osd@121
--> ceph-volume lvm activate successful for osd ID: 121
The problem seems to be the 'systemctl enable' command, which essentially
tries to enable the very unit that is currently executing (in the case
where the start came via 'systemctl start ceph-osd.target'). Somehow
systemd (in CentOS) isn't very happy with that. If I edit the Python code
to check that the unit is not already enabled before enabling it, the
hangs stop.
For example, in
/usr/lib/python2.7/site-packages/ceph_volume/systemd/systemd.py, replacing

def enable(unit):
    process.run(['systemctl', 'enable', unit])

with

def enable(unit):
    # Only enable the unit if it isn't enabled already; 'systemctl
    # is-enabled' exits with 0 when it is.
    stdout, stderr, retcode = process.call(
        ['systemctl', 'is-enabled', unit], show_command=True)
    if retcode != 0:
        process.run(['systemctl', 'enable', unit])

fixes the issue.
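At the shell level, the guard is just the exit status of is-enabled
(a sketch, using the instance name from above):

# Prints the unit state; exits 0 if the unit is already enabled, non-zero
# otherwise. Only in the latter case does the patched enable() go on to
# run 'systemctl enable'.
systemctl is-enabled ceph-volume@lvm-121-7a9aceb3-ac01-4c2e-97f7-94954004e2fb.service
echo $?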
Has anyone run into this, or has any ideas on how to proceed?
This looks like an oversight on our end. We don't run into this
because we haven't tried to start/stop all OSDs at once in our tests.
Can you create a ticket so that we can fix this? Your changes look
correct to me.
http://tracker.ceph.com/projects/ceph-volume/issues/new
Andras
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com