On 2017-09-08 06:06 PM, Gregory Farnum wrote:
> On Fri, Sep 8, 2017 at 5:47 PM, Mclean, Patrick <Patrick.Mclean@xxxxxxxx> wrote:
>
>> On a related note, we are very curious why the snapshot id is
>> incremented when a snapshot is deleted; this creates lots of
>> phantom entries in the deleted snapshots set. Interleaved
>> deletions and creations will cause massive fragmentation in
>> the interval set. The only reason we can come up with for this
>> is to track whether anything changed, but I suspect a different
>> value that doesn't inject entries into the interval set might
>> be better for this purpose.
>
> Yes, it's because having a sequence number tied in with the snapshots
> is convenient for doing comparisons. Those aren't leaked snapids that
> will make holes; when we increment the snapid to delete something we
> also stick it in the removed_snaps set. (I suppose if you alternate
> deleting a snapshot with adding one, that does increase the size until
> you delete those snapshots; hrmmm. Another thing to avoid doing, I
> guess.)

Fair enough, though it seems like these limitations of the snapshot
system should be documented. We most likely would have used a
completely different strategy if it had been documented that certain
snapshot creation and removal patterns can eventually cause the
cluster to fall over.

>>> It might really just be the osdmap update processing -- that would
>>> make me happy as it's a much easier problem to resolve. But I'm also
>>> surprised it's *that* expensive, even at the scales you've described.
>
>> That would be nice, but unfortunately all the data is pointing
>> to PGPool::update(),
>
> Yes, that's the OSDMap update processing I referred to. This is good
> in terms of our ability to remove it without changing client
> interfaces and things.

That is good to hear; hopefully this stuff can be improved soon, then.

> -Greg
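To make the fragmentation pattern concrete, here is a minimal, self-contained
sketch. The IntervalSet class below is a toy stand-in for Ceph's
interval_set<snapid_t> (not the real header), and the Pool bookkeeping just
follows Greg's description above: creating a snapshot consumes a fresh snapid,
and deleting one consumes a snapid too, putting both the deleted snap and the
freshly consumed id into removed_snaps.

#include <cstdint>
#include <deque>
#include <iostream>
#include <iterator>
#include <map>
#include <vector>

using snapid_t = uint64_t;

// Toy interval set: maps run start -> run length, merging adjacent runs
// on insert. Assumes each value is inserted at most once, which holds
// for this simulation.
class IntervalSet {
  std::map<snapid_t, snapid_t> m;
public:
  void insert(snapid_t v) {
    auto next = m.lower_bound(v);              // first run starting at >= v
    if (next != m.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == v) {   // v extends the previous run
        prev->second += 1;
        if (next != m.end() && next->first == v + 1) {
          prev->second += next->second;        // two runs became adjacent
          m.erase(next);
        }
        return;
      }
    }
    if (next != m.end() && next->first == v + 1) {  // v prepends the next run
      snapid_t len = next->second + 1;
      m.erase(next);
      m.emplace(v, len);
      return;
    }
    m.emplace(v, 1);                           // isolated: a brand-new interval
  }
  size_t num_intervals() const { return m.size(); }
};

// Snapid bookkeeping as described above.
struct Pool {
  snapid_t seq = 0;
  IntervalSet removed;
  snapid_t create() { return ++seq; }          // new live snapshot id
  void remove(snapid_t s) {
    ++seq;                                     // deletion consumes a snapid too
    removed.insert(s);                         // the deleted snap...
    removed.insert(seq);                       // ...and the consumed id
  }
};

int main() {
  // Pattern 1: create 1000 snapshots, then delete them all. Every
  // removed id is adjacent to another, so everything merges.
  Pool a;
  std::vector<snapid_t> snaps;
  for (int i = 0; i < 1000; i++) snaps.push_back(a.create());
  for (snapid_t s : snaps) a.remove(s);
  std::cout << "batched:     " << a.removed.num_intervals()
            << " interval(s)\n";               // prints 1

  // Pattern 2: rolling snapshots -- repeatedly delete the oldest live
  // snapshot and create a replacement. Each surviving snapshot's id
  // pins a hole between removed ids, so nothing in the tail merges.
  Pool b;
  std::deque<snapid_t> live;
  for (int i = 0; i < 1000; i++) live.push_back(b.create());
  for (int i = 0; i < 1000; i++) {
    b.remove(live.front());
    live.pop_front();
    live.push_back(b.create());
  }
  std::cout << "interleaved: " << b.removed.num_intervals()
            << " interval(s)\n";               // prints 1000
  return 0;
}

Under these assumptions the batched pattern collapses removed_snaps to a
single interval, while the rolling pattern leaves roughly one interval per
live snapshot, and those holes only close once the surviving snapshots are
themselves deleted. That fragmentation is presumably part of why walking the
set in PGPool::update() on every osdmap change gets so expensive at scale.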