How to stop an RBD migration and recover

Gilles Mocellin <gilles.mocellin@xxxxxxxxxxxxxx> · Wed, 23 Jun 2021 21:55:18 +0200

Hello,

As a follow-up to the thread "RBD migration between 2 EC pools : very slow".

I'm running Octopus 15.2.13.

RBD migration seems really fragile.

I started a migration to change the data pool (from EC 3+2 to EC 8+2):
- rbd migration prepare
- rbd migration execute
=> 4% after 6h, and no progress 12h later
- rbd migration abort
=> never returns
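For reference, the sequence above can be sketched as a small shell script. The pool and image names (POOL, IMAGE, NEW_DATA_POOL) are hypothetical placeholders, not the ones from my cluster, and the run helper only prints each command unless DRY_RUN=0 is set, so it is safe to read through before using it.

```shell
#!/bin/sh
# Sketch of the migration steps described above.
# All names below are hypothetical placeholders.
set -eu

POOL="${POOL:-rbd}"                         # hypothetical pool holding the image
IMAGE="${IMAGE:-vm-disk}"                   # hypothetical image name
NEW_DATA_POOL="${NEW_DATA_POOL:-ec82-data}" # hypothetical EC 8+2 data pool

# Print the command (dry run, the default) or actually execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the target image with its data in the new pool;
# the source image ends up in the trash while the migration is in progress.
run rbd migration prepare --data-pool "$NEW_DATA_POOL" "$POOL/$IMAGE"

# Copy the data from the source to the target (the step that stalled at 4%).
run rbd migration execute "$POOL/$IMAGE"

# Roll back, as attempted here; "rbd migration commit" would finalize instead.
run rbd migration abort "$POOL/$IMAGE"
```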

After that, the state of the migration on the destination image is "unknown".
rbd info on both the source image (in the trash) and the destination image still 
shows migrating in the features.

Debugging another abort command shows that it tries to acquire/release a lock, 
but one is already held, left over from either the cancelled execute or the 
first abort.

rbd lock delete does not work either; I get a strange message about a read-only 
filesystem...
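To see what is holding the image, a read-only check along these lines can help. It is only a sketch: the pool and image names are hypothetical, and the run helper again prints the commands unless DRY_RUN=0 is set. rbd status reports watchers and the migration state, and rbd lock ls lists the locks on the image with their lock ID and locker.

```shell
#!/bin/sh
# Read-only inspection of a stuck image; names are hypothetical placeholders.
set -eu

POOL="${POOL:-rbd}"        # hypothetical pool
IMAGE="${IMAGE:-vm-disk}"  # hypothetical image name

# Print the command (dry run, the default) or actually execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Watchers on the image and the current migration state.
run rbd status "$POOL/$IMAGE"

# Locks on the image (lock ID and locker), e.g. one left by the aborted execute.
run rbd lock ls "$POOL/$IMAGE"
```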

So, I'm stuck.
Production on the source image has been stopped ever since, because I use krbd 
and I must finish the migration (either commit or abort) before I can map and 
mount the image.

What I'm only now starting to understand is that abort, like commit, seems to be 
meant for use only once the migration execution has finished. Am I wrong?

If so, how can we stop an ongoing migration, and how do we recover the source 
image?

Do I need to:
- delete the destination image
- restore the source image from the trash

And what about the migrating feature, which prevents krbd from mapping the image?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx