This is one of the pitfalls of package-based installs. This dynamic with Nova and other virtualization systems has been well known for at least a dozen years.

I would not expect a Luminous client (i.e. librbd / librados) to have an issue, though; it should be able to handle pg-upmap. If you have a reference indicating the need to update to the Nautilus client, please send it along.

I wonder whether you have clients that are actually older than Luminous, which could cause problems. Cf. https://tracker.ceph.com/issues/13301

Run `ceph features`, which should give you client info. An unfortunate wrinkle is that in the case of pg-upmap, some clients may report “jewel” even though their feature bitmaps actually indicate compatibility with pg-upmap. If you see clients that are pre-Luminous, focus restarts and migrations on those.

OpenStack components themselves sometimes have dependencies on Ceph versions, so I would look at those and at libvirt itself as well.

> On Feb 18, 2025, at 1:48 PM, Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
>
> Hi,
>
> We recently upgraded our Ceph from Luminous to Nautilus and upgraded the
> ceph clients on OpenStack (using rbd). All went well and after a few days,
> we randomly saw instances getting stuck with libvirt_qemu_exporter, which
> is getting the libvirt stuck on OpenStack compute nodes. We had to kill
> those instances' processes, and then libvirt is returning. But the issue is
> happening again on the compute nodes with other instances. Upon doing some
> research, I found that we need to migrate the instances to use the latest
> (Nautilus) ceph client, as they still use the old (Luminous) client when
> spun up. The only way to get them to have the Nautilus client is to live
> migrate or reboot. We have thousands of instances, and doing any of those
> takes a long time without impacting the customer. Is there any other fix to
> solve this issue without migrating or rebooting the instances?
>
> Error on compute hosts: (renamed host and instance id)
>
> Feb 18 00:08:00 cmp03 libvirtd[5362]: 2025-02-18 00:08:00.510+0000: 5627:
> warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query,
> none) for domain instance-009141b8; current job is (query, none) owned by
> (5628 remoteDispatchDomainBlockStats, 0 <null>) for (322330s, 0s)
>
> Thanks,
> Pardhiv
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
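
To make the `ceph features` check easier across a large fleet, here is a minimal sketch that filters the command's JSON output for client groups reporting a pre-Luminous release. It rests on assumptions: that each entry under "client" carries "release", "num", and "features" keys, and that "client" is either a list of groups or a dict of groups depending on the Ceph version. The function name `pre_luminous_clients` and the sample data (including the placeholder feature values) are made up for illustration. It only looks at the reported release name and does not decode the feature bitmap, so a group flagged as "jewel" may still be pg-upmap capable, as noted in the reply.

```python
import json

# Release names older than Luminous (Ceph releases are named alphabetically).
PRE_LUMINOUS = {
    "argonaut", "bobtail", "cuttlefish", "dumpling", "emperor", "firefly",
    "giant", "hammer", "infernalis", "jewel", "kraken",
}


def pre_luminous_clients(features_json):
    """Return the client groups whose reported release predates Luminous.

    features_json is the text printed by `ceph features`. The layout of the
    "client" section is an assumption; both a list and a dict are accepted.
    """
    data = json.loads(features_json)
    groups = data.get("client", [])
    if isinstance(groups, dict):
        groups = list(groups.values())
    return [g for g in groups if g.get("release") in PRE_LUMINOUS]


# Hypothetical sample; the feature values are placeholders, not real bitmaps.
SAMPLE = """
{
  "client": [
    {"features": "0x1", "release": "jewel", "num": 12},
    {"features": "0x2", "release": "luminous", "num": 340}
  ]
}
"""

for group in pre_luminous_clients(SAMPLE):
    print(f"{group['num']} client(s) report {group['release']} "
          f"(features {group['features']})")
```

In practice the input would come from saving the output of `ceph features` to a file and reading it in. Any group it flags is only a candidate for the restarts and migrations mentioned above, not proof of an incompatible client.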