Re: slow cluster performance during snapshot restore

On Thu, Jun 29, 2017 at 7:44 AM Stanislav Kopp <staskopp@xxxxxxxxx> wrote:
Hi,

we're testing a Ceph cluster as the storage backend for our
virtualization (Proxmox), using RBD for raw VM images. If I try to
restore a snapshot with "rbd snap rollback", the whole cluster becomes
really slow: "apply_latency" goes from the usual 0-10ms up to
4000-6000ms, I see heavy read/write load on the OSDs and many blocked
processes in "vmstat". After the restore finishes, everything is fine
again.
My question is: is it possible to set some "priority" for the snapshot
restore, like "nice", so it doesn't stress the OSDs so much?

RBD snap rollback is a very expensive operation; it involves going to all the objects in the RBD volume and doing a full copy from the snapshot into the active "head" location.
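To get a sense of how much copying that means, "rbd info" reports the
object count and object size for an image (the pool/image name below is
just an example; output trimmed, and the exact formatting varies by
release):

    $ rbd info rbd/vm-100-disk-1
    rbd image 'vm-100-disk-1':
            size 102400 MB in 25600 objects
            order 22 (4096 kB objects)
            ...

Rolling back that image means on the order of 25,600 full 4 MB object
copies hitting the OSDs as fast as they will go, which lines up with the
latency spike you're seeing.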

I'm not sure if there are built-in tunable commands available (check the manpages? Or Jason, do you know?), but if not, you can use any generic tooling that limits how much network traffic the rbd command can generate.
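As a rough sketch (the image and snapshot names below are made up, and I
haven't verified that trickle's LD_PRELOAD socket interception actually
catches librados traffic, so treat this as something to test first),
something like trickle can cap the client-side bandwidth of the rollback
process:

    # cap the rollback to roughly 50 MB/s in each direction
    # (trickle takes KB/s; -s runs standalone, -u/-d are upload/download limits)
    trickle -s -u 51200 -d 51200 rbd snap rollback rbd/vm-100-disk-1@before-upgrade

The rollback will take proportionally longer, but the OSDs should see a
much gentler stream of writes.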
-Greg
 

BTW, I'm using Ceph 11.2 on Ubuntu 16.04: 4 nodes with 16 OSDs (8 TB
each), plus one Intel 3710 SSD per 4 OSDs for journals.

Best,
Stan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
