OSD Down After Reboot

Hi Folks,

 

I have found similar reports of this problem in the past but can’t seem to find any solution to it.

We have a Ceph filesystem running Mimic version 13.2.5.

The OSDs run on AWS EC2 instances with CentOS 7. The OSD disk is an AWS NVMe device.

 

The problem: sometimes when an OSD instance is rebooted, the OSD volume fails to mount and the OSD cannot start.

 

ceph-volume.log repeats the following:

[2019-08-28 09:10:42,061][ceph_volume.main][INFO  ] Running command: ceph-volume  lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,063][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-08-28 09:10:42,074][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 339, in main
    self.activate(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 249, in activate
    raise RuntimeError('could not find osd.%s with fsid %s' % (osd_id, osd_fsid))
RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578
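
The RuntimeError above is raised when activation cannot match a logical volume to the requested OSD. As a minimal sketch for narrowing this down after a failed boot, the helper below scans lvs output for the tags that ceph-volume normally sets on an OSD's LV. The function name has_osd_lv is hypothetical, and the tag names ceph.osd_id and ceph.osd_fsid are an assumption about how the LV was tagged when it was created; this is not ceph-volume's own lookup code.

```shell
#!/bin/sh
# Hypothetical helper, not part of ceph-volume.
# Reads lvs output (lv_tags in the first column) on stdin and succeeds
# if some line carries both the given OSD id and the given OSD fsid.
# Assumes tags of the form ceph.osd_id=<id> and ceph.osd_fsid=<fsid>.
has_osd_lv() {
    osd_id=$1
    osd_fsid=$2
    # First keep lines mentioning the fsid, then require an exact id match
    # (the id must be followed by a tag separator, a column separator, or EOL).
    grep -F "ceph.osd_fsid=${osd_fsid}" \
        | grep -Eq "ceph\.osd_id=${osd_id}(,|;|\$)"
}
```

It can be fed from the same lvs command that appears in the log, for example:
/usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size | has_osd_lv 0 fcaffe93-4c03-403c-9702-7f1ec694a578 && echo found || echo missing
A result of "missing" would suggest the LV or its tags were not visible to LVM at that moment, though it does not say why.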

 

ceph-volume-systemd.log repeats the following:

[2019-08-28 09:10:41,877][systemd][INFO  ] raw systemd input received: lvm-0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,877][systemd][INFO  ] parsed sub-command: lvm, extra data: 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,926][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,077][ceph_volume.process][INFO  ] stderr -->  RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,084][systemd][WARNING] command returned non-zero exit status: 1
[2019-08-28 09:10:42,084][systemd][WARNING] failed activating OSD, retries left: 30
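
For anyone matching these log lines to a specific OSD, the instance name lvm-0-fcaffe93-4c03-403c-9702-7f1ec694a578 appears to split into a sub-command, an OSD id, and an OSD fsid, as the "parsed sub-command" line and the later error message suggest. The sketch below reproduces that split in plain shell; the function names are hypothetical and this only illustrates the pattern visible in the log, it is not ceph-volume's parser.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the split seen in the log above.
# Input: an instance name of the form <subcommand>-<osd id>-<osd fsid>.

# Text before the first dash, e.g. "lvm".
parse_subcommand() { printf '%s\n' "${1%%-*}"; }

# Everything after the first dash, e.g. "0-<fsid>" (the "extra data").
parse_extra() { printf '%s\n' "${1#*-}"; }

# First field of the extra data, e.g. "0".
parse_osd_id() { extra=${1#*-}; printf '%s\n' "${extra%%-*}"; }

# Remainder of the extra data, e.g. the fsid, which itself contains dashes.
parse_osd_fsid() { extra=${1#*-}; printf '%s\n' "${extra#*-}"; }
```

Because the fsid contains dashes, only the first two dashes are treated as separators; everything after them is kept as the fsid.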

 

To recover, I destroy the OSD, zap the disk, and create it again:

# ceph osd destroy 0 --yes-i-really-mean-it

# ceph-volume lvm zap /dev/nvme1n1 --destroy

# ceph-volume lvm create --osd-id 0 --data /dev/nvme1n1

# systemctl start ceph-osd@0

 

Is there something I need to do so that the OSD can boot without these problems?

 

Thank you!

Tom

Attachment: ceph-volume.log
Attachment: ceph-volume-systemd.log

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
