On Fri, Jun 30, 2017 at 1:24 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
> When you delete a snapshot, Ceph places the removed snapshot into a list
> in the OSD map and places the objects in the snapshot into a snap_trim_q.
> Once those 2 things are done, the RBD command returns and you are moving
> on to the next snapshot. The snap_trim_q is an n^2 operation (like all
> deletes in Ceph), which means that if the queue has 100 objects on it and
> takes 5 minutes to complete, then having 200 objects in the queue will
> take 25 minutes.

You keep saying deletes are an n-squared operation, but I don't really have
any idea where that's coming from. Could you please elaborate? :)

> (exaggerated time frames to show math) This same behavior can be seen
> when deleting an RBD that has 100,000 objects vs 200,000 objects: it
> takes twice as long (note that object map mitigates this greatly by
> ignoring any object that hasn't been created, so the previous test would
> be easiest to duplicate by disabling the object map on the test RBDs).
>
> So paying attention to snapshot sizes as you clean them up is more
> important than how many snapshots you clean up. Being on Jewel, you don't
> really want to use osd_snap_trim_sleep, as it literally puts a sleep into
> the main op threads of the OSD. In Hammer this setting was much more
> useful (albeit super hacky), and in Luminous the entire process was
> revamped and (hopefully) fixed. Jewel is pretty much not viable for large
> quantities of snapshots, but there are ways to get through them.
>
> The following thread on the ML is one of the most informative on this
> problem in Jewel. The second link is the resumption of the thread months
> later, after the fix was scheduled for backporting into 10.2.8.
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015675.html
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-April/017697.html
>
> On Fri, Jun 30, 2017 at 4:02 PM Kenneth Van Alstyne
> <kvanalstyne@xxxxxxxxxxxxxxx> wrote:
>>
>> Hey folks:
>> I was wondering if the community can provide any advice — over time and
>> due to some external issues, we have managed to accumulate thousands of
>> snapshots of RBD images, which are now in need of cleaning up. I have
>> recently attempted to roll through a “for” loop to perform an “rbd snap
>> rm” on each snapshot, sequentially, waiting until the rbd command
>> finishes before moving on to the next one, of course. I noticed that
>> shortly after starting this, I started seeing thousands of slow ops, and
>> a few of our guest VMs became unresponsive, naturally.

In addition to the thread David linked to, I gave a talk about snapshot
trimming and capacity planning which may be helpful:
https://www.youtube.com/watch?v=rY0OWtllkn8
If you read the whole thread, I'm not sure there's any new data in that
talk, but it is hopefully a little more organized/understandable. :)
There's also a rough sketch of a throttled cleanup loop at the bottom of
this mail.
-Greg

>>
>> My questions are:
>> - Is this expected behavior?
>> - Is the background cleanup asynchronous from the “rbd snap rm” command?
>>     - If so, are there any OSD parameters I can set to reduce the
>>       impact on production?
>> - Would “rbd snap purge” be any different? I expect not, since
>>   fundamentally, rbd is performing the same action that I do via the
>>   loop.
>>
>> Relevant details are as follows, though I’m not sure cluster size
>> *really* has any effect here:
>> - Ceph: version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>> - 5 storage nodes, each with:
>>     - 10x 2TB 7200 RPM SATA spindles (for a total of 50 OSDs)
>>     - 2x Samsung MZ7LM240 SSDs (used as journals for the OSDs)
>>     - 64GB RAM
>>     - 2x Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz
>>     - 20Gbit LACP port channel via Intel X520 dual-port 10GbE NIC
>>
>> Let me know if I’ve missed something fundamental.
>>
>> Thanks,
>>
>> --
>> Kenneth Van Alstyne
>> Systems Architect
>> Knight Point Systems, LLC
>> Service-Disabled Veteran-Owned Business
>> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
>> c: 228-547-8045 f: 571-266-3106
>> www.knightpoint.com
>> DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
>> GSA Schedule 70 SDVOSB: GS-35F-0646S
>> GSA MOBIS Schedule: GS-10F-0404Y
>> ISO 20000 / ISO 27001 / CMMI Level 3
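
P.S. Since throttling came up above, here is a rough sketch of the kind of
paced cleanup loop Kenneth described. It is illustration only, not something
from the thread: the pool/image names, the pause lengths, and the parsing of
"rbd snap ls" output are all placeholder assumptions rather than
recommendations.

  #!/bin/bash
  # Rough sketch only: POOL, IMAGE, and the sleep values below are
  # placeholders, not values from this thread. The idea is simply to pace
  # "rbd snap rm" calls and back off while the cluster is unhealthy, so the
  # snap trim work queued by each delete has time to drain.

  POOL="rbd"        # placeholder pool name
  IMAGE="myimage"   # placeholder image name
  PAUSE=60          # arbitrary pause (seconds) between snapshot deletions

  # Snapshot names are the second column of the "rbd snap ls" tabular
  # output (assumes no spaces in snapshot names).
  for SNAP in $(rbd snap ls "${POOL}/${IMAGE}" | awk 'NR > 1 {print $2}'); do
      # Wait for the cluster to report HEALTH_OK before the next deletion.
      until ceph health | grep -q HEALTH_OK; do
          echo "Cluster not healthy; waiting before removing ${SNAP}..."
          sleep 30
      done

      echo "Removing ${POOL}/${IMAGE}@${SNAP}"
      rbd snap rm "${POOL}/${IMAGE}@${SNAP}"

      # Give the OSDs time to chew through the snap trim queue.
      sleep "${PAUSE}"
  done

  # osd_snap_trim_sleep can also throttle trimming, e.g.:
  #   ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'
  # but, as David noted, on Jewel that sleep happens in the main op
  # threads, so it is not a good idea there.

Obviously adjust the pacing to whatever your cluster can absorb; the
HEALTH_OK check is just a crude stand-in for watching slow request
warnings while the trim queues drain.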