hi arthur, some time has passed ...
i would like to know if there is any news about your setup.
do you have replication actively running?
we are currently using snapshot-based mirroring and recently did a move of both
clusters.
after that, we had some damaged filesystems in the kvm vms.
did you ever see such problems in your tests?
i think there are not many people who are using ceph replication,
so for me it's hard to find the right way.
can a snapshot-based ceph replication be crash consistent? i think not.
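just as a sketch of what i would try to test it by hand (assuming the qemu
guest agent is running in the vm; vm id 100, pool "rbd" and image
"vm-100-disk-0" are only placeholders for our setup):

  # freeze the guest filesystems so no writes are in flight
  qm guest cmd 100 fsfreeze-freeze
  # force a mirror snapshot of the quiesced image
  rbd mirror image snapshot rbd/vm-100-disk-0
  # thaw the guest again
  qm guest cmd 100 fsfreeze-thaw

without the freeze, i would expect at most a crash-consistent state on the
remote side.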
thanks for any help here ...
ronny
On 2022-05-23 20:09, ronny.lippold wrote:
hi arthur,
just for your information: we had some horrible days ...
last week, we shut some virtual machines down.
most of them did not come back. timeout on the qmp socket ... and no kvm
console.
so, we switched to our rbd-mirror cluster and ... yes, it was working,
phew.
some days later, we tried to install a devel proxmox package, which
was supposed to help.
it did not ... what helped was to rbd move (rename) the image and then move
it back, as shown below.
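roughly, the workaround looked like this (pool and image names are only
placeholders for our setup):

  # rename the image away and back again ("rbd mv" is an alias for "rbd rename")
  rbd rename rbd/vm-100-disk-0 rbd/vm-100-disk-0-tmp
  rbd rename rbd/vm-100-disk-0-tmp rbd/vm-100-disk-0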
today, i found the answer.
i cleaned up the pool config and we removed the journaling feature
from the images.
after that, everything was booting fine.
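in case someone else needs it, the cleanup was basically this per image (the
image name is again only a placeholder):

  # drop the journaling feature left over from journal-based mirroring
  rbd feature disable rbd/vm-100-disk-0 journaling
  # check the remaining features afterwards
  rbd info rbd/vm-100-disk-0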
maybe the performance issue with snapshots came from a proxmox bug
... we will see
(https://forum.proxmox.com/threads/possible-bug-after-upgrading-to-7-2-vm-freeze-if-backing-up-large-disks.109272/)
have a great time ...
ronny
On 2022-05-12 15:29, Arthur Outhenin-Chalandre wrote:
On 5/12/22 14:31, ronny.lippold wrote:
many thanks, we will check the slides ... they look great
ok, you mean that the growing came because the replication is too
slow?
strange ... i thought our cluster was not so big ... but ok.
so, we cannot use the journal ...
maybe someone else has the same results?
If you want a bit more details on this you can check my slides here:
https://codimd.web.cern.ch/p/-qWD2Y0S9#/.
Hmmm, I think there are some plans to have a way to spread the
snapshots
over the provided interval in Reef (and not take every snapshot at
once),
but that's unfortunately not here today... The timing thing is a bit
weird, but I am not an expert on RBD snapshot implications in
general...
Maybe you can try to reproduce it by taking a snapshot by hand with `rbd
mirror image snapshot` on some of your images; maybe it's something
related to really big images? Or that there were a lot of writes since
the last snapshot?
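Something along these lines should be enough to trigger one manually and
see how long it takes (pool/image names are placeholders):

  # create a mirror snapshot for a single image by hand
  rbd mirror image snapshot rbd/vm-100-disk-0
  # check the mirroring state and the last synced snapshot
  rbd mirror image status rbd/vm-100-disk-0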
yes right, i was also thinking of this ...
i would like to find something to debug the problem.
problems after 50 days ... i do not understand this
which way are you actually going? do you have replication running?
We are going towards mirror snapshots, but we haven't advertised it
internally so far and we won't enable it on every image; it would
only
be for new volumes if people explicitly want that feature. So we are
probably not going to hit the performance issues that you are suffering
from for quite some time, and the scope of it should be limited...
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx