On 2014-12-01 10:03:35 +0000, Dan Van Der Ster said:
Which version of Ceph are you using? This could be related: http://tracker.ceph.com/issues/9487
Firefly. I had seen this ticket earlier (when deleting a whole pool) and hoped the backport of the fix would be available some time soon. I must admit I did not look this up before posting, because I had forgotten about it.
See "ReplicatedPG: don't move on to the next snap immediately"; basically, the OSD is getting into a tight loop "trimming" the snapshot objects. The fix above breaks out of that loop more frequently, and then you can use the osd snap trim sleep option to throttle it further. I’m not sure if the fix above will be sufficient if you have many objects to remove per snapshot.
Just so I get this right: with the fix alone you are not sure it would be "nice" enough, so adjusting the snap trim sleep option in addition might be needed? I assume the loop that will be broken up with 9487 does not take the sleep time into account?
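To make the distinction in this exchange concrete, here is a toy Python model. It is hypothetical: `trim_snapshots`, `yield_per_snap` and the "burst" accounting are illustrative names, not Ceph code or the actual ReplicatedPG logic. It assumes that the 9487 fix breaks out of the trim loop once per snapshot and that the snap trim sleep is applied only at those break points. Under that assumption, the sleep spreads work out between snapshots, but the longest uninterrupted burst still equals the number of objects in the largest snapshot. That matches the concern about snapshots with many objects.

```python
import time
from typing import Dict, List


def trim_snapshots(snaps: Dict[int, List[str]],
                   snap_trim_sleep: float = 0.0,
                   yield_per_snap: bool = True) -> List[int]:
    """Toy model of an OSD snap trimmer (hypothetical, not Ceph code).

    Returns the sizes of the uninterrupted work bursts, i.e. how many
    objects were trimmed between two points where the loop gave up control.
    """
    bursts: List[int] = []
    current = 0
    for snap_id in sorted(snaps):
        for _obj in snaps[snap_id]:
            current += 1  # stand-in for removing one clone object
        if yield_per_snap:
            # Assumed behaviour of the fix: break out after each snapshot.
            bursts.append(current)
            current = 0
            # The throttle only takes effect at the break point, so it
            # cannot shorten the work done inside a single snapshot.
            time.sleep(snap_trim_sleep)
    if current:
        # Without per-snapshot break-outs everything is one tight loop.
        bursts.append(current)
    return bursts


snaps = {1: ["a", "b", "c"], 2: ["d"], 3: ["e", "f"]}
print(trim_snapshots(snaps, yield_per_snap=False))  # one burst of 6 objects
print(trim_snapshots(snaps))                        # bursts of 3, 1 and 2
```

In this model, raising `snap_trim_sleep` only lengthens the pauses between bursts; a single snapshot with thousands of objects would still be trimmed in one uninterrupted burst.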
That commit is only in giant at the moment. The backport to dumpling is in the dumpling branch but not yet in a release, and firefly is still pending.
Holding my breath :) Any thoughts on the other items I had in the original post?
2) Is there a way to get a decent approximation of how much work deleting a specific snapshot will entail (in terms of objects, time, whatever)?
3) Would SSD journals help here? Or any other hardware configuration change for that matter?
Thanks!
Daniel

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com