On Mon, Sep 11, 2017 at 8:27 PM, Mclean, Patrick <Patrick.Mclean@xxxxxxxx> wrote:
>
> On 2017-09-08 06:06 PM, Gregory Farnum wrote:
> > On Fri, Sep 8, 2017 at 5:47 PM, Mclean, Patrick <Patrick.Mclean@xxxxxxxx> wrote:
> >
> >> On a related note, we are very curious why the snapshot id is
> >> incremented when a snapshot is deleted; this creates lots of
> >> phantom entries in the deleted-snapshots set. Interleaved
> >> deletions and creations will cause massive fragmentation in
> >> the interval set. The only reason we can come up with for this
> >> is to track whether anything changed, but I suspect a different
> >> value that doesn't inject entries into the interval set might
> >> be better for this purpose.
> >
> > Yes, it's because having a sequence number tied in with the snapshots
> > is convenient for doing comparisons. Those aren't leaked snapids that
> > will make holes; when we increment the snapid to delete something we
> > also stick it in the removed_snaps set. (I suppose if you alternate
> > deleting a snapshot with adding one, that does increase the size until
> > you delete those snapshots; hrmmm. Another thing to avoid doing, I
> > guess.)
>
> Fair enough, though it seems like these limitations of the
> snapshot system should be documented.

This is why I was so insistent on numbers, formulae, or even
rules-of-thumb to predict what works and what does not. Greg's "one
snapshot per RBD per day is probably OK" from a few months ago seemed
promising, but looking at your situation it's probably not that useful
a rule.

> We most likely would
> have used a completely different strategy if it was documented
> that certain snapshot creation and removal patterns could
> cause the cluster to fall over over time.

I think right now there are probably very few people, if any, who
could *describe* the pattern that causes this. That complicates
matters of documentation.
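For the archives, here is a toy Python model of the mechanism Greg describes. This is not Ceph's actual implementation (the real bookkeeping is an `interval_set<snapid_t>` in C++ inside the pool metadata); the `Pool` and `IntervalSet` names are purely illustrative. It shows why deletions interleaved with creations fragment the removed-snaps set: every deletion bumps the sequence number and inserts that phantom id too, while each still-live snapshot created in between leaves a hole that blocks coalescing -- until those snapshots are themselves deleted.

```python
class IntervalSet:
    """Disjoint, sorted, coalescing set of half-open [start, end) intervals."""
    def __init__(self):
        self.ivals = []

    def insert(self, n):
        # Insert the single id n, coalescing with any overlapping or
        # adjacent intervals.
        s, e = n, n + 1
        out = []
        for a, b in self.ivals:
            if b < s or a > e:           # disjoint and not adjacent: keep
                out.append((a, b))
            else:                         # overlapping or touching: merge
                s, e = min(s, a), max(e, b)
        out.append((s, e))
        out.sort()
        self.ivals = out


class Pool:
    """Hypothetical stand-in for the pool's snapshot bookkeeping."""
    def __init__(self):
        self.seq = 0                      # bumped on create *and* delete
        self.live = set()
        self.removed = IntervalSet()

    def create_snapshot(self):
        self.seq += 1
        self.live.add(self.seq)
        return self.seq

    def delete_snapshot(self, snapid):
        self.live.remove(snapid)
        self.seq += 1                     # deletion consumes a snapid too...
        self.removed.insert(snapid)
        self.removed.insert(self.seq)     # ...and that phantom id is "removed"


pool = Pool()
batch = [pool.create_snapshot() for _ in range(50)]
for old in batch:
    pool.delete_snapshot(old)             # two ids enter the removed set
    pool.create_snapshot()                # live snap leaves a hole between them
fragments_while_interleaved = len(pool.removed.ivals)
print(fragments_while_interleaved)        # 50: one fragment per live hole

for snapid in sorted(pool.live):          # deleting the stragglers lets
    pool.delete_snapshot(snapid)          # everything coalesce again
print(len(pool.removed.ivals))            # back to a single interval
```

So the set stays large only while the interleaved snapshots stay alive, matching the "until you delete those snapshots" caveat above.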
:)

> >>> It might really just be the osdmap update processing -- that would
> >>> make me happy as it's a much easier problem to resolve. But I'm also
> >>> surprised it's *that* expensive, even at the scales you've described.

^^ This is what I mean. It's kind of tough to document things if we're
still in "surprised that this is causing harm" territory.

> >> That would be nice, but unfortunately all the data is pointing
> >> to PGPool::Update(),
> >
> > Yes, that's the OSDMap update processing I referred to. This is good
> > in terms of our ability to remove it without changing client
> > interfaces and things.
>
> That is good to hear, hopefully this stuff can be improved soon
> then.

Greg, can you comment on just how much potential improvement you see
here? Is it more like "oh, we know we're doing this one thing horribly
inefficiently, but we never thought it would be an issue so we shied
away from premature optimization; we can easily reduce 70% CPU
utilization to 1%", or rather like "we might be able to improve this by
perhaps 5%, but 100,000 RBDs is too many if you want to be using
snapshotting at all, for the foreseeable future"?

Thanks again!

Cheers,
Florian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com