Hi Anthony,

Thank you for the reply. Here is the output from the monitor node. The
monitor nodes (which also run the managers) and the OSD nodes were rebooted
sequentially after the upgrade to Nautilus, so I wonder why they are still
showing luminous now. Is there any way I can fix this?

or1sz2 [root@mon1 ~]# ceph features
{
    "mon": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 3
        }
    ],
    "osd": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 111
        }
    ],
    "client": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 322
        }
    ],
    "mgr": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 3
        }
    ]
}

or1sz2 [root@mon1 ~]# dpkg -l | grep -i ceph
ii  ceph                  14.2.22-1xenial  amd64  distributed storage and file system
ii  ceph-base             14.2.22-1xenial  amd64  common ceph daemon libraries and management tools
ii  ceph-common           14.2.22-1xenial  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-deploy           2.0.1            all    Ceph-deploy is an easy to use configuration tool
ii  ceph-mgr              14.2.22-1xenial  amd64  manager for the ceph distributed storage system
ii  ceph-mon              14.2.22-1xenial  amd64  monitor server for the ceph storage system
ii  ceph-osd              14.2.22-1xenial  amd64  OSD server for the ceph storage system
rc  libcephfs1            10.2.11-1trusty  amd64  Ceph distributed file system client library
ii  libcephfs2            14.2.22-1xenial  amd64  Ceph distributed file system client library
ii  python-ceph-argparse  14.2.22-1xenial  all    Python 2 utility libraries for Ceph CLI
ii  python-cephfs         14.2.22-1xenial  amd64  Python 2 libraries for the Ceph libcephfs library
ii  python-rados          14.2.22-1xenial  amd64  Python 2 libraries for the Ceph librados library
ii  python-rbd            14.2.22-1xenial  amd64  Python 2 libraries for the Ceph librbd library
ii  python-rgw            14.2.22-1xenial  amd64  Python 2 libraries for the Ceph librgw library
or1sz2 [root@or1dra1300 ~]#

Thanks,
Pardhiv

On Tue, Feb 18, 2025 at 10:55 AM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> This is one of the pitfalls of package-based installs. This dynamic with
> Nova and other virtualization systems has been well known for at least a
> dozen years.
>
> I would not expect a Luminous client (i.e. librbd / librados) to have an
> issue, though; it should be able to handle pg-upmap. If you have a
> reference indicating the need to update to the Nautilus client, please
> send it along.
>
> I wonder if you have clients that are actually older than Luminous; those
> could cause problems.
>
> Cf. https://tracker.ceph.com/issues/13301
>
> Run `ceph features`, which should give you client info. An unfortunate
> wrinkle is that in the case of pg-upmap, some clients may report "jewel"
> but their feature bitmaps actually indicate compatibility with pg-upmap.
> If you see clients that are pre-Luminous, focus restarts and migrations
> on those.
>
> OpenStack components themselves sometimes have dependencies on Ceph
> versions, so I would look at those and at libvirt itself as well.
>
> On Feb 18, 2025, at 1:48 PM, Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
>
> Hi,
>
> We recently upgraded our Ceph cluster from Luminous to Nautilus and
> upgraded the Ceph clients on OpenStack (using rbd). All went well, but
> after a few days we randomly saw instances getting stuck when queried by
> libvirt_qemu_exporter, which hangs libvirt on the OpenStack compute
> nodes. We had to kill those instance processes, after which libvirt
> recovered, but the issue keeps happening on the compute nodes with other
> instances.
>
> Upon doing some research, I found that we need to migrate the instances
> to the latest (Nautilus) Ceph client, as they are still using the old
> (Luminous) client from when they were spun up. The only way to get them
> onto the Nautilus client is to live-migrate or reboot them. We have
> thousands of instances, and doing either without impacting customers
> takes a long time. Is there any other way to solve this issue without
> migrating or rebooting the instances?
>
> Error on the compute hosts (host and instance id renamed):
>
> Feb 18 00:08:00 cmp03 libvirtd[5362]: 2025-02-18 00:08:00.510+0000: 5627:
> warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query,
> none) for domain instance-009141b8; current job is (query, none) owned by
> (5628 remoteDispatchDomainBlockStats, 0 <null>) for (322330s, 0s)
>
> Thanks,
> Pardhiv

--
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
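
A rough way to act on the suggestion above to focus on specific clients is
to dump a monitor's open sessions, which list each connected client along
with the feature release it negotiated. This is only a sketch, assuming the
monitor daemon on mon1 is named mon.mon1 and that its admin socket is in
the default location; the session dump format also differs between
releases, so the greps are just a coarse count and filter.

# Run on the monitor host: dump the sessions currently open against mon.mon1
sudo ceph daemon mon.mon1 sessions > /tmp/mon-sessions.txt

# Rough counts of sessions by the feature release they negotiated
grep -c luminous /tmp/mon-sessions.txt
grep -c nautilus /tmp/mon-sessions.txt

# Inspect the luminous entries for client addresses
grep luminous /tmp/mon-sessions.txt

# Minimum client release the cluster currently enforces
# (pg-upmap requires at least luminous here)
ceph osd get-require-min-compat-client

The addresses in any pre-Nautilus entries should map back to specific
compute nodes or qemu processes, which would be the ones worth prioritising
for live migration or restarts.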