Re: Very HIGH Disk I/O latency on instances


 





On Wed, Jun 28, 2017 at 9:17 AM Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
On 06/28/17 16:52, Keynes_Lee@xxxxxxxxxxx wrote:

We are using HP Helion 2.1.5 (OpenStack + Ceph).

The OpenStack version is Kilo and the Ceph version is Firefly.

 

The way we back up VMs is to create a snapshot with Ceph commands (rbd snapshot) and then download it (rbd export).
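Roughly, the commands for that workflow look like this (pool, image, and snapshot names below are just placeholders):

    # take a point-in-time snapshot of the image
    rbd snap create volumes/vm-disk-01@backup-20170628
    # export the snapshot to a file on the backup server
    rbd export volumes/vm-disk-01@backup-20170628 /backup/vm-disk-01-20170628.img
    # drop the snapshot once the export has finished
    rbd snap rm volumes/vm-disk-01@backup-20170628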

 

We see very high disk read/write latency while creating and deleting snapshots; it can exceed 10000 ms.

 

Even outside of backup jobs, we often see latency of more than 4000 ms.

 

Users are starting to complain.

Could you please advise us on how to start troubleshooting this?

 

For creating snaps and keeping them, this was marked wontfix: http://tracker.ceph.com/issues/10823

For deleting, see the recent "Snapshot removed, cluster thrashed" thread for some config to try.
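I don't remember the exact values suggested in that thread, but the usual knob is the snap trim throttle; something along these lines as a starting point (the sleep value is only an example):

    # slow down snapshot trimming so it competes less with client I/O
    # (injected at runtime; put the same option under [osd] in ceph.conf to persist it)
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'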

Given that he says he's seeing 4-second I/Os even without snapshots involved, I think Keynes must be seeing something else in his cluster.
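A reasonable first pass would be to check whether the slowness is cluster-wide or confined to a few OSDs/disks, e.g.:

    ceph -s              # overall health, any blocked/slow request warnings
    ceph health detail   # which OSDs are reporting slow requests
    ceph osd perf        # per-OSD commit/apply latency
    iostat -x 1          # on the OSD hosts, to spot saturated or failing disks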
 

And I find this to be a very severe problem. And you haven't even seen the worst of it... make more snapshots and it gets slower and slower to do many things (resize, clone, snap revert, etc.), though a fully flattened image seen by a client usually seems as fast as normal.

Let's pool some money together as a reward for making snapshots work properly/modern, like on ZFS and btrfs, where they don't have to copy so much... they "redirect on write" rather than literally "copy on write". (What would be a good way to pool money like that?) If others are interested, I surely am, though I would have to ask the boss about the money. Even if it's only for BlueStore, and therefore only for future releases, that's OK with me. And if it keeps the copy on the same OSD/filesystem as the original, that is acceptable too.


https://storageswiss.com/2016/04/01/snapshot-101-copy-on-write-vs-redirect-on-write/
Consider a copy-on-write system, which copies any blocks before they are overwritten with new information (i.e. it copies on writes). In other words, if a block in a protected entity is to be modified, the system will copy that block to a separate snapshot area before it is overwritten with the new information. This approach requires three I/O operations for each write: one read and two writes. [...] This decision process for each block also comes with some computational overhead.

A redirect-on-write system uses pointers to represent all protected entities. If a block needs modification, the storage system merely redirects the pointer for that block to another block and writes the data there. [...] There is zero computational overhead of reading a snapshot in a redirect-on-write system.

The redirect-on-write system uses 1/3 the number of I/O operations when modifying a protected block, and it uses no extra computational overhead reading a snapshot. Copy-on-write systems can therefore have a big impact on the performance of the protected entity. The more snapshots are created and the longer they are stored, the greater the impact to performance on the protected entity.

I wouldn't consider that a very realistic depiction of the tradeoffs involved in different snapshotting strategies[1], but BlueStore uses "redirect-on-write" under the formulation presented in those quotes. RBD clones of protected images will remain copy-on-write forever, I imagine.
-Greg

[1]: There's no reason to expect a copy-on-write system will first copy the original data and then overwrite it with the new data when it can simply inject the new data along the way. *Some* systems will copy the "old" block into a new location and then overwrite in the existing location (it helps prevent fragmentation), but many don't. And a "redirect-on-write" system needs to persist all those block metadata pointers, which may be much cheaper or much, much more expensive than just duplicating the blocks.
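(For reference, the protect/clone/flatten workflow being discussed looks roughly like this; pool and image names are only illustrative:)

    rbd snap create rbd/parent-image@base       # snapshot the parent
    rbd snap protect rbd/parent-image@base      # protect it so it can be cloned
    rbd clone rbd/parent-image@base rbd/child   # child starts as a copy-on-write clone
    rbd flatten rbd/child                       # copy all data so the child no longer depends on the parent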
 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


