Hi Ronny,

Not sure what could have caused your outage with journaling, to be honest :/. Best of luck with the Ceph/Proxmox bug!

On 5/23/22 20:09, ronny.lippold wrote:
> hi arthur,
>
> just for information: we had some horrible days ...
>
> last week, we shut some virtual machines down.
> most of them did not come back: timeout on the qmp socket ... and no kvm console.
>
> so we switched to our rbd-mirror cluster and ... yes, it was working, phew.
>
> some days later, we tried to install a devel proxmox package, which should help.
> it did not ... what was helpful was to rbd-move the image and then move it back (like a rename).
>
> today, i found the answer.
>
> i cleaned up the pool config and we removed the journaling feature from the images.
> after that, everything was booting fine.
>
> maybe the performance issue with snapshots came from a proxmox bug ... we will see
> (https://forum.proxmox.com/threads/possible-bug-after-upgrading-to-7-2-vm-freeze-if-backing-up-large-disks.109272/)
>
> have a great time ...
>
> ronny
>
> On 2022-05-12 15:29, Arthur Outhenin-Chalandre wrote:
>> On 5/12/22 14:31, ronny.lippold wrote:
>>> many thanks, we will check the slides ... they are looking great
>>>
>>>>>
>>>>> ok, you mean that the growing came because replication is too slow?
>>>>> strange ... i thought our cluster is not that big ... but ok.
>>>>> so, we cannot use the journal ...
>>>>> maybe someone else has the same result?
>>>>
>>>> If you want a bit more detail on this, you can check my slides here:
>>>> https://codimd.web.cern.ch/p/-qWD2Y0S9#/.
>>>>
>>>> Hmmm, I think there are plans in Reef for a way to spread the
>>>> snapshots over the provided interval (and not take every snapshot at
>>>> once), but that's unfortunately not here today... The timing thing is
>>>> a bit weird, but I am not an expert on the implications of RBD
>>>> snapshots in general... Maybe you can try to reproduce by taking a
>>>> snapshot by hand with `rbd mirror image snapshot` on some of your
>>>> images; maybe it's something related to really big images, or there
>>>> were a lot of writes since the last snapshot?
>>>>
>>>
>>> yes, right, i was also thinking of this ...
>>> i would like to find something to debug the problem.
>>> problems after 50 days ... i do not understand this.
>>>
>>> which way are you actually going? do you have replication?
>>
>> We are going towards mirror snapshots, but we haven't advertised it
>> internally so far and we won't enable it on every image; it would only
>> be for new volumes whose owners explicitly want that feature. So we are
>> probably not going to hit the performance issues that you are suffering
>> from for quite some time, and their scope should be limited...

--
Arthur Outhenin-Chalandre
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
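
For reference, a minimal sketch of the rbd commands discussed in this thread, assuming a hypothetical pool "vm-pool" and image "vm-100-disk-0" (the actual pool and image names are not given in the thread):

    # check which features (e.g. journaling) are currently enabled on an image
    rbd info vm-pool/vm-100-disk-0

    # remove the journaling feature from an image, as ronny describes doing
    rbd feature disable vm-pool/vm-100-disk-0 journaling

    # take a mirror snapshot by hand, as Arthur suggests for testing
    rbd mirror image snapshot vm-pool/vm-100-disk-0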