Re: RBD snapshot atomicity guarantees?

Am 18.12.18 um 11:48 schrieb Hector Martin:
> On 18/12/2018 18:28, Oliver Freyermuth wrote:
>> We have yet to observe these hangs, we are running this with ~5 VMs with ~10 disks for about half a year now with daily snapshots. But all of these VMs have very "low" I/O,
>> since we put anything I/O intensive on bare metal (but with automated provisioning of course).
>>
>> So I'll chime in on your question, especially since there might be VMs on our cluster in the future where the inner OS may not be running an agent.
>> Since we did not observe this yet, I'll also add: What's your "scale", is it hundreds of VMs / disks? Hourly snapshots? I/O intensive VMs?
> 
> 5 hosts, 15 VMs, daily snapshots. I/O is variable (customer workloads); usually not that high, but it can easily peak at 100% when certain things happen. We don't have great I/O performance (RBD over 1gbps links to HDD OSDs).
> 
> I'm poring through monitoring graphs now and I think the issue this time around was just too much dirty data in the page cache of a guest. The VM that failed spent 3 minutes flushing out writes to disk before its I/O was quiesced, at around 100 IOPS throughput (the actual data throughput was low, though, so small writes). That exceeded our timeout and then things went south from there.
> 
> I wasn't sure if fsfreeze did a full sync to disk, but given the I/O behavior I'm seeing that seems to be the case. Unfortunately coming up with an upper bound for the freeze time seems tricky now. I'm increasing our timeout to 15 minutes, we'll see if the problem recurs.
> 
> Given this, it makes even more sense to just avoid the freeze if at all reasonable. There's no real way to guarantee that a fsfreeze will complete in a "reasonable" amount of time as far as I can tell.

Potentially, if granted arbitrary command execution by the guest agent, you could check the amount of dirty page cache data first (there might be a better interface than parsing meminfo...):
  grep -i dirty /proc/meminfo
  Dirty:             19476 kB
From that you could estimate how long the fsfreeze may take, ideally combining it with the VM's allowed IOPS. 
Of course, if you have control over your VMs, you may also play with the vm.dirty_ratio and vm.dirty_background_ratio. 
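To sketch that idea: a rough pre-freeze check could look like the following (the IOPS budget and the 4 KiB average write size are illustrative guesses here, not measured values — you'd plug in your own QoS numbers):

```shell
# Read the current amount of dirty page cache data from /proc/meminfo.
dirty_kb=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)

iops_budget=100      # assumed per-VM write IOPS limit (guess)
avg_write_kb=4       # assumed average write size in KiB (guess)

# Very rough worst-case flush time: dirty data / (IOPS * write size).
est_seconds=$(( dirty_kb / (iops_budget * avg_write_kb) ))

echo "Dirty: ${dirty_kb} kB -> flush may take roughly ${est_seconds}s at ${iops_budget} IOPS"
```

One could then skip or delay the freeze (or raise the timeout) whenever the estimate exceeds some threshold.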

Interestingly, tuned on CentOS 7 ships a "virtual-guest" profile that sets:
vm.dirty_ratio = 30
(the default is 20 %), i.e. they optimize for performance by allowing more dirty buffers and delaying writeback even further. 
They take the opposite approach in their "virtual-host" profile:
vm.dirty_background_ratio = 5
(the default is 10 %). 
I believe these choices are good for performance, but they may increase the time it takes to freeze the VMs, especially if IOPS are limited and a lot of dirty data has accumulated. 
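If you do control the guests and want to bound freeze time instead, lowering both thresholds might look like this (a sketch; the file path and the exact values are just examples, not recommendations):

```
# /etc/sysctl.d/99-snapshot-freeze.conf (example path and values)
vm.dirty_background_ratio = 2    # start background writeback earlier
vm.dirty_ratio = 5               # cap dirty pages at 5 % of RAM
```

This trades some write-coalescing performance for a smaller worst-case amount of dirty data to flush at freeze time.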

Since we also have 1 Gbps links and HDD OSDs, and plan to add more and more VMs and hosts, we may also observe this one day... 
So I'm curious:
How did you implement the timeout in your case? Are you using qemu-agent-command to issue guest-fsfreeze-freeze with --async and --timeout instead of domfsfreeze? 
We are using domfsfreeze for now, which (probably) waits indefinitely; at least no timeout is documented in its manpage. 
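For reference, I'd imagine such a freeze with an explicit virsh-side timeout would look roughly like this (untested sketch; the domain name and the 60 s timeout are placeholders):

```shell
# Ask the guest agent to freeze all filesystems, but have virsh
# give up waiting after 60 seconds instead of blocking forever.
virsh qemu-agent-command mydomain --timeout 60 \
    '{"execute": "guest-fsfreeze-freeze"}'

# ...take the RBD snapshot here...

virsh qemu-agent-command mydomain --timeout 60 \
    '{"execute": "guest-fsfreeze-thaw"}'
```

As far as I understand, the timeout only stops virsh from waiting; the agent inside the guest may still complete the freeze later, so the caller would need to handle that case (e.g. issue a thaw anyway).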

Cheers,
	Oliver


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
