On Thu, Sep 21, 2017 at 9:53 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> > The other reason we maintain the full set of deleted snaps is to prevent
>> > client operations from re-creating deleted snapshots — we filter all
>> > client IO which includes snaps against the deleted_snaps set in the PG.
>> > Apparently this is also big enough in RAM to be a real (but much
>> > smaller) problem.
>> >
>> > Unfortunately eliminating that is a lot harder
>>
>> Just checking here, for clarification: what is "that" here? Are you
>> saying that eliminating the full set of deleted snaps is harder than
>> introducing a deleting_snaps member, or that both are harder than
>> potential mitigation strategies that were previously discussed in this
>> thread?
>
>
> Eliminating the full set we store on the OSD node is much harder than
> converting the OSDMap to specify deleting_ rather than deleted_snaps — the
> former at minimum requires changes to the client protocol and we’re not
> actually sure how to do it; the latter can be done internally to the cluster
> and has a well-understood algorithm to implement.

Got it. Thanks for the clarification.

>> > This is why I was so insistent on numbers, formulae or even
>> > rules-of-thumb to predict what works and what does not. Greg's "one
>> > snapshot per RBD per day is probably OK" from a few months ago seemed
>> > promising, but looking at your situation it's probably not that useful
>> > a rule.
>>
>> Is there something that you can suggest here, perhaps taking into
>> account the discussion you had with Patrick last week?
>
>
> I think I’ve already shared everything I have on this. Try to treat
> sequential snaps the same way and don’t create a bunch of holes in the
> interval set.

Right. But that's not something the regular Ceph cluster operator has
much influence over.

Cheers,
Florian
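
To make the interval-set point above concrete, here is a rough Python
sketch. It is not Ceph code (the OSD side tracks deleted snaps in a C++
interval-set structure), and the helper name build_intervals is purely
illustrative; it only models why the cost scales with the number of
disjoint intervals rather than the number of deleted snapshots.

# Toy model of an interval set of deleted snap IDs, not Ceph's interval_set.
def build_intervals(snap_ids):
    """Collapse a collection of snap IDs into sorted [start, end] intervals."""
    intervals = []
    for s in sorted(snap_ids):
        if intervals and s == intervals[-1][1] + 1:
            intervals[-1][1] = s      # extends the previous interval: no new entry
        else:
            intervals.append([s, s])  # a hole before s forces a new entry
    return intervals

# Trimming a contiguous run of snaps stays tiny: one interval, whatever its length.
print(len(build_intervals(range(1, 10001))))     # -> 1

# Deleting every other snap leaves a hole between each pair,
# so the set needs one interval per deleted snap.
print(len(build_intervals(range(1, 10001, 2))))  # -> 5000

Under that model, "treat sequential snaps the same way" keeps deletions
collapsing into a handful of long intervals, while scattered deletion
patterns fragment the set and drive up its memory footprint.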