Hi Ronny,

Not sure what could have caused your outage with journaling, to be honest :/. Best of luck with the Ceph/Proxmox bug!

On 5/23/22 20:09, ronny.lippold wrote:
> hi arthur,
>
> just for information: we had some horrible days ...
>
> last week, we shut some virtual machines down.
> most of them did not come back: timeout on the qmp socket ... and no kvm console.
>
> so we switched to our rbd-mirror cluster and ... yes, it was working, phew.
>
> some days later, we tried to install a devel proxmox package, which should help.
> it did not ... what was helpful was to rbd-move the image and then move it back (like a rename).
>
> today, i found the answer.
>
> i cleaned up the pool config and we removed the journaling feature from the images.
> after that, everything was booting fine.
>
> maybe the performance issue with snapshots came from a proxmox bug ... we will see
> (https://forum.proxmox.com/threads/possible-bug-after-upgrading-to-7-2-vm-freeze-if-backing-up-large-disks.109272/)
>
> have a great time ...
>
> ronny
>
> On 2022-05-12 15:29, Arthur Outhenin-Chalandre wrote:
>> On 5/12/22 14:31, ronny.lippold wrote:
>>> many thanks, we will check the slides ... they are looking great
>>>
>>>>>
>>>>> ok, you mean that the growing came because replication is too slow?
>>>>> strange ... i thought our cluster is not that big ... but ok.
>>>>> so, we cannot use the journal ...
>>>>> maybe someone else has the same result?
>>>>
>>>> If you want a bit more detail on this, you can check my slides here:
>>>> https://codimd.web.cern.ch/p/-qWD2Y0S9#/.
>>>>
>>>> Hmmm, I think there are plans in Reef for a way to spread the
>>>> snapshots over the provided interval (and not take every snapshot at
>>>> once), but that's unfortunately not here today... The timing thing is
>>>> a bit weird, but I am not an expert on the implications of RBD
>>>> snapshots in general... Maybe you can try to reproduce by taking a
>>>> snapshot by hand with `rbd mirror image snapshot` on some of your
>>>> images; maybe it's something related to really big images, or there
>>>> were a lot of writes since the last snapshot?
>>>>
>>>
>>> yes, right, i was also thinking of this ...
>>> i would like to find something to debug the problem.
>>> problems after 50 days ... i do not understand this.
>>>
>>> which way are you actually going? do you have replication?
>>
>> We are going towards mirror snapshots, but we haven't advertised it
>> internally so far and we won't enable it on every image; it would only
>> be for new volumes whose owners explicitly want that feature. So we are
>> probably not going to hit the performance issues that you are suffering
>> from for quite some time, and their scope should be limited...

--
Arthur Outhenin-Chalandre
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
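
For reference, a minimal sketch of the rbd commands discussed in this thread, assuming a hypothetical pool "vm-pool" and image "vm-100-disk-0" (the actual pool and image names are not given in the thread):

    # check which features (e.g. journaling) are currently enabled on an image
    rbd info vm-pool/vm-100-disk-0

    # remove the journaling feature from an image, as ronny describes doing
    rbd feature disable vm-pool/vm-100-disk-0 journaling

    # take a mirror snapshot by hand, as Arthur suggests for testing
    rbd mirror image snapshot vm-pool/vm-100-disk-0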