> In our use case, we are severely hampered by the size of removed_snaps
> (50k+) in the OSDMap, to the point where ~80% of ALL CPU time is spent
> in PGPool::update and its interval calculation code. We have a cluster
> of around 100k RBDs, with each RBD having up to 25 snapshots and only a
> small portion of our RBDs mapped at a time (~500-1000). For
> size/performance reasons we try to keep the number of snapshots low
> (<25) and need to prune snapshots. Since in our use case RBDs 'age' at
> different rates, snapshot pruning creates holes, to the point where the
> size of the removed_snaps interval set in the OSDMap is 50k-100k in
> many of our Ceph clusters. I think in general around 2 snapshot removal
> operations currently happen per minute, just because of the volume of
> snapshots and users we have.

Right. Greg, this is what I was getting at: 25 snapshots per RBD is
firmly in "one snapshot per day per RBD" territory; this is something
that a cloud operator might do, for example, offering daily snapshots
going back one month. But it still wrecks the cluster simply by having
lots of images, even though only a fraction of them (less than 1%) are
ever in use.

That's rather counter-intuitive: it doesn't hit you until you have lots
of images, and once you're affected by it there's no practical way out,
where "out" is defined as "restoring overall cluster performance to
something acceptable".

> We found PGPool::update and the interval calculation code to be quite
> inefficient. Some small changes made it a lot faster, giving us more
> breathing room. We shared these and most have already been applied:
>
> https://github.com/ceph/ceph/pull/17088
> https://github.com/ceph/ceph/pull/17121
> https://github.com/ceph/ceph/pull/17239
> https://github.com/ceph/ceph/pull/17265
> https://github.com/ceph/ceph/pull/17410 (not yet merged, needs more fixes)
>
> These patches helped our use case, but overall CPU usage in this area
> is still high (>70% or so), making the Ceph cluster slow, causing
> blocked requests, and making many operations (e.g. rbd map) take a
> long time.

I think this makes it very much a practical issue, not a hypothetical or
theoretical one.

> We are trying to work around these issues by changing our snapshot
> strategy. In the short term we are manually defragmenting the interval
> set: we scan for holes and delete the snapshots whose snapids lie
> between holes, so that adjacent holes coalesce. This is not so nice to
> do. In some cases we employ strategies to 'recreate' old snapshots (as
> we need to keep them) at higher snapids. For our use case a 'snapid
> rename' feature would have been quite helpful.
>
> I hope this shines some light on practical Ceph clusters in which
> performance is bottlenecked not by I/O but by snapshot removal.

For others following this thread, or retrieving it from the list archive
some time down the road, I'd rephrase that as "bottlenecked not by I/O
but by CPU utilization associated with snapshot removal". Is that fair
to say, Patrick? Please correct me if I'm misrepresenting.

Greg (or Josh/Jason/Sage/anyone really :) ), can you provide additional
insight as to how these issues can be worked around or mitigated,
besides the PRs that Patrick and his colleagues have already sent?

Cheers,
Florian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
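
[Editorial note for archive readers: to help picture the "scan for holes
and coalesce" workaround Patrick describes above, here is a rough, purely
illustrative sketch. It is not Ceph code; the interval data in it is made
up, and a real script would instead read the pool's removed_snaps from
`ceph osd dump`. It only shows the idea: rank the gaps that separate the
removed ranges, since deleting the few snapshots inside a narrow gap lets
the two removed ranges around it merge, shrinking the interval set.]

    #!/usr/bin/env python3
    # Illustrative sketch only: NOT Ceph code. Model a pool's removed_snaps
    # as a sorted list of half-open [start, end) intervals and rank the
    # gaps (still-live snapid ranges) between them. The sample data below
    # is made up.

    removed_snaps = [(1, 40), (45, 60), (61, 100), (130, 200)]  # [start, end)

    def gaps(intervals):
        """Yield (gap_start, gap_end, width) for each hole between intervals."""
        for (_, end_a), (start_b, _) in zip(intervals, intervals[1:]):
            yield (end_a, start_b, start_b - end_a)

    # Smallest gaps first: removing the single snapshot in a width-1 gap
    # merges two removed ranges at the lowest cost.
    for gap_start, gap_end, width in sorted(gaps(removed_snaps), key=lambda g: g[2]):
        snapids = list(range(gap_start, gap_end))
        print("gap of width %d: deleting snapid(s) %s would coalesce "
              "the removed ranges on either side" % (width, snapids))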