Re: RBD Journaling seemingly getting stuck for some VMs after upgrade to Octopus

Ilya Dryomov <idryomov@xxxxxxxxx> · Mon, 12 Aug 2024 11:09:51 +0200

On Mon, Aug 12, 2024 at 10:20 AM Oliver Freyermuth
<freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>
> Dear Cephalopodians,
>
> we've successfully operated a "good old" Mimic cluster with primary RBD images, replicated via journaling to a "backup cluster" with Octopus, for the past years (i.e. one-way replication).
> We've now finally gotten around to upgrading the cluster with the primary images to Octopus (and plan to upgrade further in the near future).
>
> After the upgrade, all MON+MGR+OSD+rbd-mirror daemons are running 15.2.17.
>
> We run three rbd-mirror daemons which all share the following client with auth in the "backup" cluster, to which they write:
>
>    client.rbd_mirror_backup
>          caps: [mon] profile rbd-mirror
>          caps: [osd] profile rbd
>
> and the following shared client with auth in the "primary cluster" from which they are reading:
>
>    client.rbd_mirror
>          caps: [mon] profile rbd
>          caps: [osd] profile rbd
>
> i.e. the same auth as described in the docs[0].
>
> Checking on the primary cluster, we get:
>
> # rbd mirror pool status
>    health: UNKNOWN
>    daemon health: UNKNOWN
>    image health: OK
>    images: 288 total
>        288 replaying
>
> For some reason, some values are "unknown" here. But mirroring seems to work, as checking on the backup cluster reveals, see for example:
>
>    # rbd mirror image status zabbix-test.example.com-disk2
>      zabbix-test.example.com-disk2:
>      global_id:   1bdcb981-c1c5-4172-9583-be6a6cd996ec
>      state:       up+replaying
>      description: replaying, {"bytes_per_second":8540.27,"entries_behind_primary":0,"entries_per_second":1.8,"non_primary_position":{"entry_tid":869176,"object_number":504,"tag_tid":1},"primary_position":{"entry_tid":11143,"object_number":7,"tag_tid":1}}
>      service:     rbd_mirror_backup on rbd-mirror002.example.com
>      last_update: 2024-08-12 09:53:17
>
> However, in some seemingly random cases we see that journals are never advanced on the primary cluster. Staying with the example above, on the primary cluster I find the following:
>
>    # rbd journal status --image zabbix-test.physik.uni-bonn.de-disk2
>    minimum_set: 1
>    active_set: 126
>      registered clients:
>            [id=, commit_position=[positions=[[object_number=7, tag_tid=1, entry_tid=11143], [object_number=6, tag_tid=1, entry_tid=11142], [object_number=5, tag_tid=1, entry_tid=11141], [object_number=4, tag_tid=1, entry_tid=11140]]], state=connected]
>            [id=52b80bb0-a090-4f7d-9950-c8691ed8fee9, commit_position=[positions=[[object_number=505, tag_tid=1, entry_tid=869181], [object_number=504, tag_tid=1, entry_tid=869180], [object_number=507, tag_tid=1, entry_tid=869179], [object_number=506, tag_tid=1, entry_tid=869178]]], state=connected]
>
> As you can see, minimum_set was not advanced. In addition, "mirror image status" shows the strange output that non_primary_position is much further advanced than primary_position. This seems to happen "at random" for only a few volumes...
> There are no other active clients apart from the actual VM (libvirt+qemu).

Hi Oliver,

Were the VM clients (i.e. librbd on the hypervisor nodes) upgraded as well?

>
> As a quick fix to purge the journals that keep piling up, the only "solution" we've found is to temporarily disable and then re-enable journaling for affected VM disks, which can be identified by:
>   for A in $(rbd ls); do echo -n "$A: "; rbd --format=json journal status --image "$A" | jq '.active_set - .minimum_set'; done
>
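
For what it's worth, the same check can be sketched in Python so that only the lagging images are listed, worst first. This is a rough sketch, not a tested tool: the field names active_set and minimum_set come from the jq expression above, while the function names and the threshold of 10 are assumptions made up for illustration.

```python
import json
import subprocess


def journal_lag(status):
    """Difference between active_set and minimum_set for one journal status."""
    return int(status["active_set"]) - int(status["minimum_set"])


def lagging_images(statuses, threshold=10):
    """Names of images whose lag exceeds the threshold, largest lag first.

    The default threshold is an arbitrary, hypothetical value.
    """
    lags = {name: journal_lag(st) for name, st in statuses.items()}
    over = [name for name, lag in lags.items() if lag > threshold]
    return sorted(over, key=lambda name: -lags[name])


def fetch_status(image):
    """Run the same rbd command as the one-liner above and parse its JSON."""
    out = subprocess.run(
        ["rbd", "--format=json", "journal", "status", "--image", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def main():
    """Print every image in the default pool whose journal lag is over the threshold."""
    names = subprocess.run(
        ["rbd", "ls"], check=True, capture_output=True, text=True
    ).stdout.split()
    statuses = {}
    for name in names:
        try:
            statuses[name] = fetch_status(name)
        except subprocess.CalledProcessError:
            # Skip images for which the journal status command fails.
            continue
    for name in lagging_images(statuses):
        print(f"{name}: {journal_lag(statuses[name])}")
```

With the journal status quoted above (minimum_set 1, active_set 126), journal_lag() gives 125. Calling main() on a node with access to the primary cluster prints the affected images.
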
>
> Any idea what is going wrong here?
> This did not happen with the primary cluster running Mimic and the backup cluster running Octopus before, and also did not happen when both were running Mimic.

You might be hitting https://tracker.ceph.com/issues/57396.

Thanks,

                Ilya