On 09/27/2013 09:25 AM, Travis Rhoden wrote:
Hello everyone, I'm running a Cuttlefish cluster that hosts a lot of RBDs. I recently removed a snapshot of a large one (a 12 TB image, via "rbd snap rm"), and I noticed that all of the clients had markedly decreased performance. Looking at iostat on the OSD nodes showed most disks pegged at 100% util. I know there are thread priorities that can be set for client vs. recovery operations, but I'm not sure which category deleting a snapshot falls under, and I couldn't really find anything relevant.

Is there anything I can tweak to lower the priority of such an operation? I don't need it to complete fast -- "rbd snap rm" returns immediately and the actual deletion is done asynchronously. I'd be fine with it taking longer at a lower priority, but as it stands now it brings my cluster to a crawl and is causing issues with several VMs.
There are message priorities for client vs. recovery operations, but unfortunately there's no such setting for snapshot deletion yet. It's called snap trimming internally, but the thread timeout option just makes sure the OSD stops operating if the filesystem or disk beneath it fails by blocking for a very long time.
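For reference, the existing priority options Josh mentions live in the [osd] section of ceph.conf. A sketch of how a snap trim knob might sit alongside them -- note that "osd snap trim priority" is hypothetical here and does not exist yet (the values shown for the real options are the documented defaults, to my knowledge):

```ini
[osd]
    ; real options: relative message priorities for client I/O vs. recovery
    osd client op priority = 63
    osd recovery op priority = 10
    ; hypothetical -- no such option exists yet; see the tracker issue below
    ;osd snap trim priority = 5
```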
I see an "osd snap trim thread timeout" option in the docs -- is the operation occurring here what you would call snap trimming? If so, any chance of adding an "osd snap trim priority" option, just like there is for "osd client op priority" and "osd recovery op priority"?
There's an open issue to fix this: http://tracker.ceph.com/issues/5844
Hope what I am saying makes sense...
Yes, thanks for the report!

Josh