Re: Broke libvirt on compute node due to Ceph Luminous to Nautilus Upgrade

The “release” shown here isn’t what one might quite reasonably think it is: it does not report the installed package version. In this context, think of it as a “minimum feature set” — a daemon or client is labeled with the oldest release whose feature bits match what it advertises, which is why fully upgraded Nautilus daemons can still show “luminous”.
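
To confirm what is actually installed and running, `ceph versions` (available since Luminous) reports the running daemons’ version strings rather than negotiated feature releases; on a fully upgraded cluster it should show 14.2.22 across the board. A minimal sketch, assuming it is run on a mon host with an admin keyring (mon.a is a placeholder for your mon’s ID):

    # Running daemon versions, grouped by daemon type
    ceph versions
    # Per-daemon detail if one looks out of place
    ceph tell mon.a version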



> On Feb 18, 2025, at 2:01 PM, Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
> 
> Hi Anthony,
> 
> Thank you for the reply. Here is the output from the monitor node. The monitor nodes (which also host the managers) and the OSD nodes were rebooted sequentially after the upgrade to Nautilus, so I wonder why they still show luminous. Is there any way I can fix this?
> 
> or1sz2 [root@mon1 ~]# ceph features
> {
>     "mon": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 3
>         }
>     ],
>     "osd": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 111
>         }
>     ],
>     "client": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 322
>         }
>     ],
>     "mgr": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 3
>         }
>     ]
> }
> or1sz2 [root@mon1 ~]# dpkg -l | grep -i ceph
> ii  ceph                                  14.2.22-1xenial                            amd64        distributed storage and file system
> ii  ceph-base                             14.2.22-1xenial                            amd64        common ceph daemon libraries and management tools
> ii  ceph-common                           14.2.22-1xenial                            amd64        common utilities to mount and interact with a ceph storage cluster
> ii  ceph-deploy                           2.0.1                                      all          Ceph-deploy is an easy to use configuration tool
> ii  ceph-mgr                              14.2.22-1xenial                            amd64        manager for the ceph distributed storage system
> ii  ceph-mon                              14.2.22-1xenial                            amd64        monitor server for the ceph storage system
> ii  ceph-osd                              14.2.22-1xenial                            amd64        OSD server for the ceph storage system
> rc  libcephfs1                            10.2.11-1trusty                            amd64        Ceph distributed file system client library
> ii  libcephfs2                            14.2.22-1xenial                            amd64        Ceph distributed file system client library
> ii  python-ceph-argparse                  14.2.22-1xenial                            all          Python 2 utility libraries for Ceph CLI
> ii  python-cephfs                         14.2.22-1xenial                            amd64        Python 2 libraries for the Ceph libcephfs library
> ii  python-rados                          14.2.22-1xenial                            amd64        Python 2 libraries for the Ceph librados library
> ii  python-rbd                            14.2.22-1xenial                            amd64        Python 2 libraries for the Ceph librbd library
> ii  python-rgw                            14.2.22-1xenial                            amd64        Python 2 libraries for the Ceph librgw library
> or1sz2 [root@or1dra1300 ~]#
> 
> Thanks,
> Pardhiv
> 
> 
> 
> 
> On Tue, Feb 18, 2025 at 10:55 AM Anthony D'Atri <anthony.datri@xxxxxxxxx <mailto:anthony.datri@xxxxxxxxx>> wrote:
>> This is one of the pitfalls of package-based installs.  This dynamic with Nova and other virtualization systems has been well-known for at least a dozen years.
>> 
>> I would not expect a Luminous client (i.e. librbd / librados) to have an issue, though — it should be able to handle pg-upmap.  If you have a reference indicating the need to update to the Nautilus client, please send it along.
>> 
>> I wonder if you have clients that are actually older than Luminous; those could cause problems.
>> 
>> Cf. https://tracker.ceph.com/issues/13301
>> 
>> Run `ceph features`, which should give you client info.  An unfortunate wrinkle is that, in the case of pg-upmap, some clients may report “jewel” even though their feature bitmaps actually indicate pg-upmap compatibility.  If you see clients that are pre-Luminous, focus restarts and migrations on those.
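>> 
>> If you want to dig further, a rough sketch, assuming you run it on a mon host (mon.a is a placeholder for your mon’s ID):
>> 
>>     # Per-session feature bits for every client/daemon connected to this mon
>>     ceph daemon mon.a sessions
>>     # If not already set, this refuses (without --yes-i-really-mean-it)
>>     # when any connected client lacks Luminous-level features, so it
>>     # doubles as a compatibility check for pg-upmap
>>     ceph osd set-require-min-compat-client luminous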
>> 
>> OpenStack components themselves sometimes have dependencies on Ceph versions, so I would look at those and at libvirt itself as well.
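>> 
>> One way to tell whether a running guest still has the pre-upgrade librbd mapped, assuming qemu-system-x86_64 is the QEMU process name on your compute nodes:
>> 
>>     # After a package upgrade, a "(deleted)" marker next to librbd means
>>     # the VM is still running the old library and needs a live migration
>>     # or restart to pick up the new one
>>     for pid in $(pidof qemu-system-x86_64); do
>>         echo "pid $pid:"
>>         grep librbd /proc/$pid/maps | head -1
>>     done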
>> 
>>> On Feb 18, 2025, at 1:48 PM, Pardhiv Karri <meher4india@xxxxxxxxx <mailto:meher4india@xxxxxxxxx>> wrote:
>>> 
>>> Hi,
>>> 
>>> We recently upgraded our Ceph cluster from Luminous to Nautilus and
>>> upgraded the Ceph clients on OpenStack (using RBD). All went well, but
>>> after a few days we randomly saw instances getting stuck in
>>> libvirt_qemu_exporter, which in turn hangs libvirt on the OpenStack
>>> compute nodes. We had to kill those instances' processes, after which
>>> libvirt recovered, but the issue keeps recurring on the compute nodes
>>> with other instances. Upon doing some research, I found that we need to
>>> migrate the instances so they use the latest (Nautilus) Ceph client, as
>>> they still use the old (Luminous) client they were spun up with. The
>>> only way to get them onto the Nautilus client is to live migrate or
>>> reboot them. We have thousands of instances, and doing either takes a
>>> long time without impacting customers. Is there any other fix for this
>>> issue that avoids migrating or rebooting the instances?
>>> 
>>> Error on the compute hosts (host and instance ID renamed):
>>> 
>>> Feb 18 00:08:00 cmp03 libvirtd[5362]: 2025-02-18 00:08:00.510+0000: 5627:
>>> warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query,
>>> none) for domain instance-009141b8; current job is (query, none) owned by
>>> (5628 remoteDispatchDomainBlockStats, 0 <null>) for (322330s, 0s)
>>> 
>>> Thanks,
>>> Pardhiv
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx <mailto:ceph-users@xxxxxxx>
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx <mailto:ceph-users-leave@xxxxxxx>
>> 
> 
> 
> 
> --
> Pardhiv Karri
> "Rise and Rise again until LAMBS become LIONS" 
> 
> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



