Re: Snapshot removed, cluster thrashed...

Snapshots are not a free action.  Creating them is close to free, but deleting objects in Ceph is an n^2 operation.  Being on Hammer, you do not have access to the object map feature on RBDs, which drastically reduces the n^2 problem by keeping track of which objects actually need to be deleted.  For your week-old snapshot, the cluster has to put every object for the snapshot (whether it exists or not) into the snap_trim_q to be deleted.  If you aren't familiar with what n^2 means: if a 1 GB volume/snapshot takes 4 minutes to delete, then a 2 GB volume takes 16 minutes, and a 10 GB volume would take on the order of 400 minutes.

Peter mentioned the setting that was introduced in Hammer; it is the ONLY setting in Hammer that can help when snapshot deletions thrash your cluster.  You NEED to use osd_snap_trim_sleep.  Jewel broke that setting without implementing an adequate work-around, but Jewel is back on track now.  I would recommend starting with an osd_snap_trim_sleep of about 0.05 to see if that alleviates your pressure.  It was a quick, crude fix that has finally been revisited and addressed properly.  What it does is sleep for 0.05 seconds after deleting each snapshot object before moving on to the next one.  In Jewel it was broken because snapshot deletions were moved into the main op thread, so the snap trim sleep simply put a sleep on the main op thread, telling the OSD to do nothing after deleting each snap trim object.
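
For reference, setting it would look something like this (0.05 is just a starting value to tune from for your hardware):

    # in ceph.conf on each OSD host, persists across OSD restarts
    [osd]
    osd snap trim sleep = 0.05

    # or inject it into the running OSDs without a restart
    ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0.05'

If your version complains that the injected value is unchangeable, putting it in ceph.conf and restarting the OSDs one at a time gets you there too.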

Upgrading to Jewel and enabling object_map on all of your RBDs would help with this problem, as would researching the new options in Jewel for fine-tuning snap trim behaviour for your environment and hardware.  I personally still just use a small osd_snap_trim_sleep on my 3-node Proxmox cluster and it works fine: I don't get slow requests when I delete snapshots, though I did before putting in a little snap trim sleep.  I only create snapshots about once a month and cycle out the old ones, but it works well for me.
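
If you do go to Jewel, enabling the feature on an existing image goes roughly like this (the image name is a made-up example; object-map depends on exclusive-lock, and the map for a pre-existing image has to be rebuilt once):

    rbd feature enable rbd/vm-100-disk-1 exclusive-lock
    rbd feature enable rbd/vm-100-disk-1 object-map fast-diff
    rbd object-map rebuild rbd/vm-100-disk-1

Worth trying on a scratch image first, since exclusive-lock changes how multiple clients can write to the same image.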

On Mon, Jun 26, 2017 at 8:07 AM Lindsay Mathieson <lindsay.mathieson@xxxxxxxxx> wrote:
On 26/06/2017 7:36 PM, Marco Gaiarin wrote:
> Last week I used the snapshot feature for the first time. Beforehand,
> I did some tests on a ''spare'' VM: a snapshot of a powered-off
> VM (as expected, it was nearly instantaneous) and of a powered-on one
> (clearly, snapshotting the RAM puts some stress on that VM, but not
> much on the overall system, as expected).
> I also tested deleting the snapshots I'd created, but only a few
> minutes after taking them, and nothing notable happened.



Have you tried restoring a snapshot? I found it unusably slow - as in hours.
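
For what it's worth, a rollback ("rbd snap rollback pool/image@snap") rewrites the whole image, which is why it takes hours.  Assuming format-2 images, cloning from the snapshot is usually far faster than rolling back; the names here are placeholders:

    rbd snap protect rbd/vm-100-disk-1@before-change
    rbd clone rbd/vm-100-disk-1@before-change rbd/vm-100-disk-1-restored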

--
Lindsay Mathieson

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
