Hi Anthony,

Regarding the need to upgrade Ceph: we are upgrading our current OpenStack from Queens (yes, very old) to Antelope, and the OpenStack vendor required us to upgrade Ceph from Luminous to Nautilus for their migration code to work, as the framework they use to migrate/upgrade only works with Nautilus and above.

--Pardhiv

On Tue, Feb 18, 2025 at 11:01 AM Pardhiv Karri <meher4india@xxxxxxxxx> wrote:

> Hi Anthony,
>
> Thank you for the reply. Here is the output from the monitor node. The
> monitor (which also runs the manager) and the OSD nodes were rebooted
> sequentially after the upgrade to Nautilus, so I wonder why they are
> still showing luminous. Is there any way I can fix this?
>
> or1sz2 [root@mon1 ~]# ceph features
> {
>     "mon": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 3
>         }
>     ],
>     "osd": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 111
>         }
>     ],
>     "client": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 322
>         }
>     ],
>     "mgr": [
>         {
>             "features": "0x3ffddff8ffecffff",
>             "release": "luminous",
>             "num": 3
>         }
>     ]
> }
> or1sz2 [root@mon1 ~]# dpkg -l | grep -i ceph
> ii  ceph                  14.2.22-1xenial  amd64  distributed storage and file system
> ii  ceph-base             14.2.22-1xenial  amd64  common ceph daemon libraries and management tools
> ii  ceph-common           14.2.22-1xenial  amd64  common utilities to mount and interact with a ceph storage cluster
> ii  ceph-deploy           2.0.1            all    Ceph-deploy is an easy to use configuration tool
> ii  ceph-mgr              14.2.22-1xenial  amd64  manager for the ceph distributed storage system
> ii  ceph-mon              14.2.22-1xenial  amd64  monitor server for the ceph storage system
> ii  ceph-osd              14.2.22-1xenial  amd64  OSD server for the ceph storage system
> rc  libcephfs1            10.2.11-1trusty  amd64  Ceph distributed file system client library
> ii  libcephfs2            14.2.22-1xenial  amd64  Ceph distributed file system client library
> ii  python-ceph-argparse  14.2.22-1xenial  all    Python 2 utility libraries for Ceph CLI
> ii  python-cephfs         14.2.22-1xenial  amd64  Python 2 libraries for the Ceph libcephfs library
> ii  python-rados          14.2.22-1xenial  amd64  Python 2 libraries for the Ceph librados library
> ii  python-rbd            14.2.22-1xenial  amd64  Python 2 libraries for the Ceph librbd library
> ii  python-rgw            14.2.22-1xenial  amd64  Python 2 libraries for the Ceph librgw library
> or1sz2 [root@or1dra1300 ~]#
>
> Thanks,
> Pardhiv
>
> On Tue, Feb 18, 2025 at 10:55 AM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
>> This is one of the pitfalls of package-based installs. This dynamic with
>> Nova and other virtualization systems has been well known for at least a
>> dozen years.
>>
>> I would not expect a Luminous client (i.e. librbd / librados) to have an
>> issue, though; it should be able to handle pg-upmap. If you have a
>> reference indicating the need to update to the Nautilus client, please
>> send it along.
>>
>> I wonder if you have clients that are actually older than Luminous;
>> those could cause problems.
>>
>> Cf. https://tracker.ceph.com/issues/13301
>>
>> Run `ceph features`, which should give you client info. An unfortunate
>> wrinkle is that in the case of pg-upmap, some clients may report "jewel"
>> but their feature bitmaps actually indicate compatibility with pg-upmap.
>> If you see clients that are pre-Luminous, focus restarts and migrations
>> on those.
>>
>> OpenStack components themselves sometimes have dependencies on Ceph
>> versions, so I would look at those and at libvirt itself as well.
>>
>> On Feb 18, 2025, at 1:48 PM, Pardhiv Karri <meher4india@xxxxxxxxx> wrote:
>>
>> Hi,
>>
>> We recently upgraded our Ceph from Luminous to Nautilus and upgraded the
>> Ceph clients on OpenStack (using rbd). All went well, but after a few
>> days we randomly saw instances getting stuck with libvirt_qemu_exporter,
>> which hangs libvirt on the OpenStack compute nodes.
>> We had to kill those instance processes, after which libvirt recovers,
>> but the issue keeps recurring on the compute nodes with other instances.
>> Upon doing some research, I found that we need to migrate the instances
>> to the latest (Nautilus) Ceph client, as they still use the old
>> (Luminous) client they were spun up with. The only way to get them onto
>> the Nautilus client is to live-migrate or reboot them. We have thousands
>> of instances, and doing either takes a long time without impacting
>> customers. Is there any other fix for this issue that does not require
>> migrating or rebooting the instances?
>>
>> Error on compute hosts (host and instance id renamed):
>>
>> Feb 18 00:08:00 cmp03 libvirtd[5362]: 2025-02-18 00:08:00.510+0000: 5627:
>> warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query,
>> none) for domain instance-009141b8; current job is (query, none) owned by
>> (5628 remoteDispatchDomainBlockStats, 0 <null>) for (322330s, 0s)
>>
>> Thanks,
>> Pardhiv
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> *Pardhiv Karri*
> "Rise and Rise again until LAMBS become LIONS"
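[Editor's note] Anthony's point about "jewel" clients whose bitmaps nonetheless support pg-upmap can be checked directly from the `features` hex value that `ceph features` prints. A minimal sketch, assuming bit 21 is CEPH_FEATURE_SERVER_LUMINOUS, which Ceph's ceph_features.h defines as overlapping with OSDMAP_PG_UPMAP; the helper name is ours, not a Ceph API:

```python
# Decode a "features" bitmask from `ceph features` output and check
# whether it includes the Luminous/pg-upmap capability.
# Assumption: feature bit 21 is CEPH_FEATURE_SERVER_LUMINOUS, which
# shares its bit with OSDMAP_PG_UPMAP in Ceph's ceph_features.h.
SERVER_LUMINOUS_BIT = 21

def supports_pg_upmap(features_hex: str) -> bool:
    """Return True if the feature bitmap has the Luminous/pg-upmap bit set."""
    return bool((int(features_hex, 16) >> SERVER_LUMINOUS_BIT) & 1)

# The bitmap reported by every daemon and client group in the thread above:
print(supports_pg_upmap("0x3ffddff8ffecffff"))  # → True
```

Note that the bitmap in the thread has this bit set for all groups, including the 322 clients, which is consistent with Anthony's expectation that the Luminous-era clients themselves should handle pg-upmap.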