Re: Ceph OSD node possibly trying to start OSDs that were purged

I did some digging around and it is exactly as you said: leftover systemd unit files were still trying to start the purged OSDs at boot. We removed them and everything now works properly. Thank you for the help.
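In case anyone else hits this, here is roughly what the cleanup looked like. This is a sketch rather than an exact transcript: the unit name below reuses the id/fsid from the error in my earlier message, and each purged OSD had its own unit.

# List the enabled ceph-volume activation units; entries for purged OSDs are the stale ones.
ls /etc/systemd/system/multi-user.target.wants/ | grep ceph-volume

# Disable each stale unit so systemd stops triggering it at boot.
systemctl disable ceph-volume@lvm-213-22800a80-2445-41a3-8643-69b4b84d598a.service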


Jean-Philippe Méthot
Openstack system administrator
PlanetHoster inc.




On Oct 29, 2019, at 3:34 PM, Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:

On Oct 29, 2019, at 11:23 AM, Jean-Philippe Méthot <jp.methot@xxxxxxxxxxxxxxxxx> wrote:
A few months back, one of our OSD node motherboards died. At the time, we simply waited for recovery and purged the OSDs that were on the dead node. We just replaced that node and added the drives back as new OSDs. At the Ceph administration level everything looks fine: there are no duplicate OSDs when I run the map commands or ask Ceph to list the OSDs on the node. However, on the OSD node itself, in /var/log/ceph/ceph-volume, I see that every time the server boots, ceph-volume searches for OSD fsids that no longer exist. Here's the error:

[2019-10-29 13:12:02,864][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 339, in main
    self.activate(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 249, in activate
    raise RuntimeError('could not find osd.%s with fsid %s' % (osd_id, osd_fsid))
RuntimeError: could not find osd.213 with fsid 22800a80-2445-41a3-8643-69b4b84d598a

Of course, this fsid isn't listed anywhere in Ceph. Where does ceph-volume get it from? Even when looking at the code, it's not particularly obvious. This is Ceph Mimic running on CentOS 7 with BlueStore.

That's not the cluster fsid, but the OSD fsid.  Try running this command on your OSD node to get more details:

ceph-volume inventory --format json-pretty

My guess is there are some systemd files lying around for the old OSDs, or you were using 'ceph-volume simple' in the past (check for /etc/ceph/osd/).
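The fsid in your error is embedded in the unit name itself: at boot, systemd starts ceph-volume@lvm-<osd_id>-<osd_fsid>, and 'ceph-volume lvm trigger' parses the id and fsid back out of that instance name. Something like this should show either possibility (paths assume a stock systemd layout):

# Enabled ceph-volume unit instances; purged OSDs show up here as stale entries.
systemctl list-unit-files | grep ceph-volume

# Metadata left behind by 'ceph-volume simple scan', if it was ever used.
ls /etc/ceph/osd/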

Bryan


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
