On Tue, Sep 5, 2017 at 1:44 PM, Florian Haas <florian@xxxxxxxxxxx> wrote:
> Hi everyone,
>
> with the Luminous release out the door and the Labor Day weekend over,
> I hope I can kick off a discussion on another issue that has irked me
> a bit for quite a while. There doesn't seem to be a good documented
> answer to this: what are Ceph's real limits when it comes to RBD
> snapshots?
>
> For most people, any RBD image will have perhaps a single-digit number
> of snapshots. For example, in an OpenStack environment we typically
> have one snapshot per Glance image, a few snapshots per Cinder volume,
> and perhaps a few snapshots per ephemeral Nova disk (unless clones are
> configured to flatten immediately). Ceph generally performs well under
> those circumstances.
>
> However, things sometimes start getting problematic when RBD snapshots
> are generated frequently, and in an automated fashion. I've seen Ceph
> operators configure snapshots on a daily or even hourly basis,
> typically when using snapshots as a backup strategy (where they
> promise to allow for very short RTO and RPO). In combination with
> thousands or maybe tens of thousands of RBDs, that's a lot of
> snapshots. And in such scenarios (and only in those), users have been
> bitten by a few nasty bugs in the past: here's an example where the
> OSD snap trim queue went berserk in the event of lots of snapshots
> being deleted:
>
> http://tracker.ceph.com/issues/9487
> https://www.spinics.net/lists/ceph-devel/msg20470.html
>
> It seems to me that there still isn't a good recommendation along the
> lines of "try not to have more than X snapshots per RBD image" or "try
> not to have more than Y snapshots in the cluster overall". Or is the
> "correct" recommendation actually "create as many snapshots as you
> might possibly want, none of that is allowed to create any instability
> nor performance degradation and if it does, that's a bug"?

I think we're closer to "as many snapshots as you want", but there are
some known shortcomings there.

First of all, if you haven't seen my talk from the last OpenStack
summit on snapshots and you want a bunch of details, go watch that. :p
https://www.openstack.org/videos/boston-2017/ceph-snapshots-for-fun-and-profit-1

There are a few dimensions along which snapshots can cause failures:

1) Right now the way we mark snapshots as deleted is suboptimal: when
deleted, they go into an interval_set in the OSDMap. So if you have a
bunch of holes in your deleted snapshots, it is possible to inflate
the osdmap to a size which causes trouble. But I'm not sure if we've
actually seen this be an issue yet; it requires both a large cluster
and a large map, and probably some other failure causing osdmaps to be
generated very rapidly.

2) There may be issues with how rbd records what snapshots it is
associated with? No idea about this; I haven't heard of any.

3) Trimming snapshots requires IO. This is where most (all?) of the
issues I've seen have come from: either it being unscheduled IO that
the rest of the system doesn't account for or throttle (as in the
links you highlighted), or admins overwhelming the IO capacity of
their clusters. At this point I think we've got everything being
properly scheduled, so it shouldn't break your cluster, but you can
still build up large queues of deferred work.

-Greg

>
> Looking forward to your thoughts. Thanks in advance!
>
> Cheers,
> Florian
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
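
As a rough illustration of point 1) in Greg's reply (holes in the set of deleted snapshots inflating the interval_set), here is a toy Python model. It is not Ceph's actual C++ interval_set; the class `ToyIntervalSet` and the helper `intervals_after_deleting` are made-up names for this sketch, and the snapshot IDs are arbitrary. It only shows why contiguous deletions stay cheap to record while scattered deletions need one entry per hole.

```python
import bisect


class ToyIntervalSet:
    """Toy model (an assumption, not Ceph code): sorted, merged [start, end) ranges."""

    def __init__(self):
        self.starts = []  # interval start values, kept sorted
        self.ends = []    # matching exclusive end values

    def insert(self, snap_id):
        # i = number of stored intervals whose start is <= snap_id
        i = bisect.bisect_right(self.starts, snap_id)
        # Already covered by the interval on the left: nothing to do.
        if i > 0 and snap_id < self.ends[i - 1]:
            return
        joins_left = i > 0 and self.ends[i - 1] == snap_id
        joins_right = i < len(self.starts) and self.starts[i] == snap_id + 1
        if joins_left and joins_right:
            # The new ID fills a hole: merge two neighbours into one interval.
            self.ends[i - 1] = self.ends[i]
            del self.starts[i]
            del self.ends[i]
        elif joins_left:
            self.ends[i - 1] = snap_id + 1
        elif joins_right:
            self.starts[i] = snap_id
        else:
            # Isolated ID: costs a brand-new interval entry.
            self.starts.insert(i, snap_id)
            self.ends.insert(i, snap_id + 1)

    def num_intervals(self):
        return len(self.starts)


def intervals_after_deleting(snap_ids):
    """Return how many intervals are needed to record the given deleted IDs."""
    deleted = ToyIntervalSet()
    for snap_id in snap_ids:
        deleted.insert(snap_id)
    return deleted.num_intervals()


# Deleting snapshots 1..1000 in one contiguous run collapses to one interval.
print(intervals_after_deleting(range(1, 1001)))     # 1
# Deleting every other snapshot leaves a hole after each one: 500 intervals.
print(intervals_after_deleting(range(1, 1001, 2)))  # 500
```

In this model the cost of recording deletions scales with the number of holes rather than with the number of deleted snapshots, which is the pattern Greg describes for the OSDMap. Whether that ever becomes a practical problem depends on the other conditions he lists.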