Hi Folks, I have found similar reports of this problem in the past but can’t seem to find any solution to it.
We have a Ceph filesystem running Mimic version 13.2.5. The OSDs run on AWS EC2 instances with CentOS 7. Each OSD disk is an AWS NVMe device.
The problem: sometimes, when rebooting an OSD instance, the OSD volume fails to mount and the OSD cannot start.
ceph-volume.log repeats the following:

[2019-08-28 09:10:42,061][ceph_volume.main][INFO ] Running command: ceph-volume lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,063][ceph_volume.process][INFO ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-08-28 09:10:42,074][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 339, in main
    self.activate(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 249, in activate
    raise RuntimeError('could not find osd.%s with fsid %s' % (osd_id, osd_fsid))
RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578

ceph-volume-systemd.log repeats:

[2019-08-28 09:10:41,877][systemd][INFO ] raw systemd input received: lvm-0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,877][systemd][INFO ] parsed sub-command: lvm, extra data: 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,926][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,077][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,084][systemd][WARNING] command returned non-zero exit status: 1
[2019-08-28 09:10:42,084][systemd][WARNING] failed activating OSD, retries left: 30

To recover, I destroy the OSD, zap the disk and create it again.
# ceph osd destroy 0 --yes-i-really-mean-it
# ceph-volume lvm zap /dev/nvme1n1 --destroy
# ceph-volume lvm create --osd-id 0 --data /dev/nvme1n1
# systemctl start ceph-osd@0

Is there something I need to do so that the OSD can boot without these problems?
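
In case it helps narrow this down, here is a minimal Python sketch of the check that seems to be failing. It parses the same `lvs` output that ceph-volume requests in the log above and looks for a logical volume tagged for the OSD. It assumes the lookup matches on the `ceph.osd_id` and `ceph.osd_fsid` LVM tags. The function names `parse_lvs` and `find_osd_lv` are hypothetical helpers, not ceph-volume's own code.

```python
# Field order matches the -o list in the lvs command from ceph-volume.log.
LVS_FIELDS = ("lv_tags", "lv_path", "lv_name", "vg_name", "lv_uuid", "lv_size")


def parse_lvs(output, separator=";"):
    """Turn `lvs --noheadings --separator=";"` text into a list of dicts.

    Each dict holds the raw fields plus a "tags" dict built from lv_tags.
    Lines with an unexpected number of fields are skipped.
    """
    volumes = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(separator)
        if len(parts) != len(LVS_FIELDS):
            continue
        lv = dict(zip(LVS_FIELDS, parts))
        tags = {}
        # lv_tags is a comma-separated list of key=value entries.
        for tag in lv["lv_tags"].split(","):
            if "=" in tag:
                key, value = tag.split("=", 1)
                tags[key] = value
        lv["tags"] = tags
        volumes.append(lv)
    return volumes


def find_osd_lv(volumes, osd_id, osd_fsid):
    """Return the first LV tagged with this OSD id and fsid, else None.

    Assumption: the tag names are ceph.osd_id and ceph.osd_fsid.
    """
    for lv in volumes:
        tags = lv["tags"]
        if (tags.get("ceph.osd_id") == str(osd_id)
                and tags.get("ceph.osd_fsid") == osd_fsid):
            return lv
    return None
```

Running the `lvs` command from the log by hand on a host that failed to activate, and passing its output to `parse_lvs` and then `find_osd_lv(volumes, 0, "fcaffe93-4c03-403c-9702-7f1ec694a578")`, would show whether the LV (or its tags) is missing at that moment. A `None` result would point at the volume not being visible to LVM yet rather than at the OSD itself.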
Thank you!
Tom
Attachments: ceph-volume.log, ceph-volume-systemd.log