We actually have a bunch of bug fixes for snapshot-based mirroring pending for the next Octopus release. I think this stuck snapshot case has been fixed, but I'll try to verify on the pacific branch to be sure.

On Thu, Jan 21, 2021 at 9:11 AM Adam Boyhan <adamb@xxxxxxxxxx> wrote:
>
> Decided to request a resync to see the results. I have a very aggressive snapshot mirror schedule of 5 minutes, and replication just keeps starting on the latest snapshot before the previous one finishes. Pretty sure this would just loop over and over if I didn't remove the schedule.
>
> root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0
> SNAPID  NAME                                                                                        SIZE   PROTECTED  TIMESTAMP                 NAMESPACE
>  10082  .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907  2 TiB             Thu Jan 21 07:10:09 2021  mirror (primary peer_uuids:[])
>  10621  TestSnapper1                                                                                2 TiB             Thu Jan 21 08:15:22 2021  user
>  10883  .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.7242f4d1-5203-4273-8b6d-ff4e1411216d  2 TiB             Thu Jan 21 08:50:08 2021  mirror (primary peer_uuids:[])
>  10923  .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.d0c3c2e7-880b-4e62-90cc-fd501e9a87c9  2 TiB             Thu Jan 21 08:55:11 2021  mirror (primary peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770])
>  10963  .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.655f7c17-2f85-42e5-9ffe-777a8a48dda3  2 TiB             Thu Jan 21 09:00:09 2021  mirror (primary peer_uuids:[])
>  10993  .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.268b960c-51e9-4a60-99b4-c5e7c303fdd8  2 TiB             Thu Jan 21 09:05:25 2021  mirror (primary peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770])
>
> I have removed the 5 minute schedule for now, but I don't think this should be expected behavior?
>
>
> From: "adamb" <adamb@xxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Thursday, January 21, 2021 7:40:01 AM
> Subject: RBD-Mirror Mirror Snapshot stuck
>
> I have an rbd-mirror snapshot on one image that failed to replicate, and now it's not getting cleaned up.
>
> The cause of this was my fault based on my steps. Just trying to understand how to clean up/handle the situation.
>
> Here is how I got into this situation:
>
> - Created a manual rbd snapshot on the image
> - On the remote cluster I cloned the snapshot
> - While the clone existed on the secondary cluster, I made the mistake of deleting the snapshot on the primary
> - The subsequent mirror snapshot failed
> - I then removed the clone
> - The next mirror snapshot was successful, but I was left with this mirror snapshot on the primary that I can't seem to get rid of
>
> root@Ccscephtest1:/var/log/ceph# rbd snap ls --all CephTestPool1/vm-100-disk-0
> SNAPID  NAME                                                                                        SIZE   PROTECTED  TIMESTAMP                 NAMESPACE
>  10082  .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907  2 TiB             Thu Jan 21 07:10:09 2021  mirror (primary peer_uuids:[])
>  10243  .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.483e55aa-2f64-4bb0-ac0f-7b5aac59830e  2 TiB             Thu Jan 21 07:30:08 2021  mirror (primary peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770])
>
> I have tried deleting the snap with "rbd snap rm" like normal user-created snaps, but no luck. Any way to force the deletion?
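
For anyone following along, here is a minimal sketch of the commands involved in the two messages above, assuming the pool/image names and the stuck snapshot name from the output (CephTestPool1/vm-100-disk-0) and the 5-minute interval mentioned; treat the sequence as illustrative rather than a verified recovery procedure:

# List the per-image snapshot schedules and drop the aggressive 5-minute one
rbd mirror snapshot schedule ls --pool CephTestPool1 --image vm-100-disk-0
rbd mirror snapshot schedule remove --pool CephTestPool1 --image vm-100-disk-0 5m

# Watch replication progress before re-adding a schedule
rbd mirror image status CephTestPool1/vm-100-disk-0

# Deleting the stuck snapshot like a user snapshot (as tried above);
# snapshots in the mirror namespace are managed by rbd-mirror, and this
# is the step that reportedly fails
rbd snap rm CephTestPool1/vm-100-disk-0@.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.483e55aa-2f64-4bb0-ac0f-7b5aac59830e

# Request a full resync, run against the non-primary image on the secondary cluster
rbd mirror image resync CephTestPool1/vm-100-disk-0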
--
Jason

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx