Re: Snapshot cleanup performance impact on client I/O?

On Fri, Jun 30, 2017 at 1:24 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
> When you delete a snapshot, Ceph places the removed snapshot into a list in
> the OSD map and places the snapshot's objects into a snap_trim_q.  Once
> those two things are done, the rbd command returns and you move on to the
> next snapshot.  The snap_trim_q is an n^2 operation (like all deletes in
> Ceph), which means that if a queue of 100 objects takes 5 minutes to
> complete, then a queue of 200 objects will take 20 minutes.

You keep saying deletes are an n-squared operation but I don't really
have any idea where that's coming from. Could you please elaborate? :)
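
Just to make the comparison concrete (toy numbers only, reusing the
100-object/5-minute figure quoted above): a linear model would predict 10
minutes for 200 objects, while a quadratic one would predict 20.

# Toy illustration of linear vs. quadratic trim-time scaling, using the
# 100-object / 5-minute baseline quoted above.  Purely illustrative; these
# are not measurements of actual snap trimming.
BASELINE_OBJECTS = 100
BASELINE_MINUTES = 5.0

def linear_minutes(n_objects):
    return BASELINE_MINUTES * (n_objects / BASELINE_OBJECTS)

def quadratic_minutes(n_objects):
    return BASELINE_MINUTES * (n_objects / BASELINE_OBJECTS) ** 2

for n in (100, 200, 400):
    print(f"{n} objects: linear ~{linear_minutes(n):.0f} min, "
          f"quadratic ~{quadratic_minutes(n):.0f} min")
# 200 objects: linear ~10 min, quadratic ~20 min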

> (exaggerated time frames to show the math)  The same behavior can be seen
> when deleting an RBD that has 100,000 objects vs. 200,000 objects: the
> larger one takes twice as long (note that the object map mitigates this
> greatly by skipping any object that was never created, so the previous test
> is easiest to reproduce by disabling the object map on the test RBDs).
>
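
As a side note, you can check whether a given image has the object map
enabled (or flip it for a test like the one described above) with the rbd
CLI; here is a rough sketch from Python, where the pool and image names are
just placeholders:

# Rough sketch: check whether an RBD image has the object-map feature and
# enable it if missing.  Pool/image names are placeholders; this assumes
# exclusive-lock is already enabled (the default for new Jewel images),
# since object-map depends on it.  For the test described above you would
# use "rbd feature disable" instead.
import subprocess

POOL, IMAGE = "rbd", "test-image"   # placeholders
spec = f"{POOL}/{IMAGE}"

info = subprocess.run(["rbd", "info", spec],
                      capture_output=True, text=True, check=True).stdout

if "object-map" not in info:
    subprocess.run(["rbd", "feature", "enable", spec, "object-map", "fast-diff"],
                   check=True)
    # Rebuild so the map accounts for objects that already exist
    subprocess.run(["rbd", "object-map", "rebuild", spec], check=True)
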
> So paying attention to snapshot sizes as you clean them up is more important
> than how many snapshots you clean up.  Being on Jewel, you don't really want
> to use osd_snap_trim_sleep, as it literally puts a sleep onto the main op
> threads of the OSD.  In Hammer this setting was much more useful (if rather
> hacky), and in Luminous the entire process was revamped and (hopefully)
> fixed.  Jewel is pretty much not viable for large quantities of snapshots,
> but there are ways to get through them.
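
For what it's worth, one way to "get through them" on Jewel is to pace the
removals from the client side and hold off whenever the cluster is unhappy.
A rough sketch (the pool/image names and the sleep/backoff intervals are
placeholders you'd tune for your own cluster):

# Sketch of a client-side paced snapshot cleanup: remove one snapshot at a
# time, wait between removals, and back off while the cluster is unhealthy.
# The pause only limits how quickly new work lands on the snap_trim_q; the
# trimming itself still happens asynchronously on the OSDs.
import json
import subprocess
import time

POOL, IMAGE = "rbd", "vm-disk-01"    # placeholders
PAUSE_BETWEEN_SNAPS = 60             # seconds; tune for your cluster
BACKOFF_WHILE_UNHEALTHY = 300        # seconds

def cluster_is_healthy():
    # Very coarse check: 'ceph health' prints HEALTH_OK / HEALTH_WARN / HEALTH_ERR
    out = subprocess.run(["ceph", "health"], capture_output=True,
                         text=True, check=True).stdout
    return out.startswith("HEALTH_OK")

def list_snapshots():
    out = subprocess.run(["rbd", "snap", "ls", f"{POOL}/{IMAGE}", "--format", "json"],
                         capture_output=True, text=True, check=True).stdout
    return [s["name"] for s in json.loads(out)]

for snap in list_snapshots():
    while not cluster_is_healthy():
        time.sleep(BACKOFF_WHILE_UNHEALTHY)
    subprocess.run(["rbd", "snap", "rm", f"{POOL}/{IMAGE}@{snap}"], check=True)
    time.sleep(PAUSE_BETWEEN_SNAPS)

None of that fixes the OSD-side behavior, of course; the threads below cover
that part.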
>
> The following thread on the ML is one of the most informative on this
> problem in Jewel.  The second link is the resumption of that thread months
> later, after the fix was scheduled for backporting into 10.2.8.
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015675.html
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-April/017697.html
>
> On Fri, Jun 30, 2017 at 4:02 PM Kenneth Van Alstyne
> <kvanalstyne@xxxxxxxxxxxxxxx> wrote:
>>
>> Hey folks:
>>         I was wondering if the community can provide any advice.  Over
>> time, and due to some external issues, we have managed to accumulate
>> thousands of snapshots of RBD images, which are now in need of cleaning up.
>> I recently attempted to run a “for” loop performing an “rbd snap rm” on
>> each snapshot sequentially, waiting until the rbd command finished before
>> moving on to the next one, of course.  Shortly after starting this, I began
>> seeing thousands of slow ops, and a few of our guest VMs naturally became
>> unresponsive.

In addition to the thread David linked to, I gave a talk about
snapshot trimming and capacity planning which may be helpful:
https://www.youtube.com/watch?v=rY0OWtllkn8
If you've read the whole thread, I'm not sure there's much new data in the
talk, but it is hopefully a little more organized and understandable. :)
-Greg

>>
>> My questions are:
>>         - Is this expected behavior?
>>         - Is the background cleanup asynchronous from the “rbd snap rm”
>> command?
>>                 - If so, are there any OSD parameters I can set to reduce
>> the impact on production?
>>         - Would “rbd snap purge” be any different?  I expect not, since
>> fundamentally rbd is performing the same action that my loop does.
>>
>> Relevant details are as follows, though I’m not sure cluster size *really*
>> has any effect here:
>>         - Ceph: version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>>         - 5 storage nodes, each with:
>>                 - 10x 2TB 7200 RPM SATA Spindles (for a total of 50 OSDs)
>>                 - 2x Samsung MZ7LM240 SSDs (used as journal for the OSDs)
>>                 - 64GB RAM
>>                 - 2x Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz
>>                 - 20GBit LACP Port Channel via Intel X520 Dual Port 10GbE
>> NIC
>>
>> Let me know if I’ve missed something fundamental.
>>
>> Thanks,
>>
>> --
>> Kenneth Van Alstyne
>> Systems Architect
>> Knight Point Systems, LLC
>> Service-Disabled Veteran-Owned Business
>> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
>> c: 228-547-8045 f: 571-266-3106
>> www.knightpoint.com
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



