Broke libvirt on compute node due to Ceph Luminous to Nautilus Upgrade

Hi,

We recently upgraded our Ceph cluster from Luminous to Nautilus and upgraded
the Ceph clients on OpenStack (using RBD). All went well at first, but after a
few days we started seeing instances randomly get stuck with
libvirt_qemu_exporter, which in turn hangs libvirt on the OpenStack compute
nodes. We have to kill the affected instance's process before libvirt responds
again, but the issue then recurs on the compute nodes with other instances.
From some research, I found that the instances need to be moved onto the
latest (Nautilus) Ceph client, because running instances keep using the old
(Luminous) client they were started with. The only way to get them onto the
Nautilus client is to live migrate or reboot them. We have thousands of
instances, and doing either for all of them without impacting customers takes
a long time. Is there any other fix for this issue that does not require
migrating or rebooting the instances?
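
This does not avoid the migration, but it may help prioritise it. A minimal sketch, assuming the package upgrade replaced the librbd shared object on disk, so that QEMU processes started before the upgrade still map the old file and show it as "(deleted)" in /proc/<pid>/maps. The function names and the "qemu" process-name prefix are my own assumptions for illustration, not anything provided by Ceph or libvirt:

```python
import re
from pathlib import Path

# Matches a /proc/<pid>/maps entry for a librbd shared object whose backing
# file has been unlinked or replaced since the process mapped it.
LIBRBD_DELETED_RE = re.compile(r"librbd\.so\S*\s+\(deleted\)")


def maps_show_deleted_librbd(maps_text):
    """Return True if any mapping line refers to a deleted librbd file."""
    return any(LIBRBD_DELETED_RE.search(line) for line in maps_text.splitlines())


def stale_qemu_pids(proc_root="/proc"):
    """List PIDs of qemu* processes that still map a deleted librbd.

    Assumption: instance processes have a command name starting with "qemu".
    Reading another user's maps file normally requires root.
    """
    stale = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
            if not comm.startswith("qemu"):
                continue
            if maps_show_deleted_librbd((entry / "maps").read_text()):
                stale.append(int(entry.name))
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
    return sorted(stale)


if __name__ == "__main__":
    for pid in stale_qemu_pids():
        print(pid)
```

Run as root on a compute node, this would print the PIDs of instances that are candidates for live migration first. Whether the old mapping really shows up as deleted depends on how the distribution's package manager replaced the library, so treat an empty result with caution.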

Error on the compute hosts (host name and instance ID changed):

Feb 18 00:08:00 cmp03 libvirtd[5362]: 2025-02-18 00:08:00.510+0000: 5627:
warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query,
none) for domain instance-009141b8; current job is (query, none) owned by
(5628 remoteDispatchDomainBlockStats, 0 <null>) for (322330s, 0s)
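
To find which domains are blocked and for how long, a warning like the one above can be parsed out of the libvirtd log. This is a rough sketch that assumes the message layout is exactly as quoted here; the regular expression and the function name are mine, and other libvirt versions may word the message differently:

```python
import re

# Pattern for the "Cannot start job" warning as it appears in the log above.
JOB_RE = re.compile(
    r"Cannot start job \((?P<want_job>[^,]+), (?P<want_async>[^)]+)\) "
    r"for domain (?P<domain>\S+); "
    r"current job is \((?P<cur_job>[^,]+), (?P<cur_async>[^)]+)\) "
    r"owned by \((?P<owner_tid>\d+) (?P<owner_api>\S+), .*?\) "
    r"for \((?P<held_s>\d+)s, (?P<async_s>\d+)s\)"
)


def parse_stuck_job(log_text):
    """Extract domain, owning API call and hold time from one warning.

    Returns a dict, or None if the text does not match. Whitespace is
    normalised first because mail clients wrap the line.
    """
    flat = " ".join(log_text.split())
    match = JOB_RE.search(flat)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("owner_tid", "held_s", "async_s"):
        info[key] = int(info[key])
    return info


SAMPLE = """Feb 18 00:08:00 cmp03 libvirtd[5362]: 2025-02-18 00:08:00.510+0000: 5627:
warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query,
none) for domain instance-009141b8; current job is (query, none) owned by
(5628 remoteDispatchDomainBlockStats, 0 <null>) for (322330s, 0s)"""

if __name__ == "__main__":
    info = parse_stuck_job(SAMPLE)
    print(info["domain"], info["owner_api"], round(info["held_s"] / 86400, 1), "days")
```

For the line above, the job has been held by remoteDispatchDomainBlockStats for 322330 seconds, roughly 3.7 days, so the block-stats query on that domain has apparently never returned.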

Thanks,
Pardhiv