Re: [ceph-users] Adventures with large RGW buckets

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 1 Aug 2019 13:47:48 -0700

On Thu, Aug 1, 2019 at 12:06 PM Eric Ivancich <ivancich@xxxxxxxxxx> wrote:
>
> Hi Paul,
>
> I’ll interleave responses below.
>
> On Jul 31, 2019, at 2:02 PM, Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> What might bucket deletion look like in the future? Would it be possible
> to put each bucket's objects into its own RADOS namespace and implement
> some kind of efficient namespace deletion at the OSD level, similar to how
> pool deletions are handled at a lower level?
>
> I’ll raise that with other RGW developers. I’m unfamiliar with how RADOS namespaces are handled.

I expect RGW could do this, but unfortunately deleting namespaces at
the RADOS level is not practical. People keep asking, and maybe in some
future world it will be cheaper, but a namespace is effectively just
part of the object name (and I don't think it's even the first field
objects are sorted by in the key entries used for metadata tracking!).
So deleting a namespace would be equivalent to deleting a snapshot[1],
but with the extra cost that namespaces can be created arbitrarily on
every write operation, so the solutions that keep snapshot handling from
being ludicrously expensive wouldn't apply. Deleting a namespace from the
OSD side using map updates would require each OSD to iterate through
just about every object it holds and examine each one for deletion.
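To make that cost concrete, here is a toy Python model. It is not Ceph code: the names ObjKey and ToyOSD are hypothetical, and it assumes an OSD keeps its object keys sorted by (placement hash, namespace, name), so the namespace is not the leading sort field. Under that assumption no contiguous key range covers a single namespace, and deleting one degenerates into a scan of every key the OSD holds.

```python
import zlib
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class ObjKey:
    # Assumed sort order: placement hash first, then namespace, then name.
    name_hash: int
    namespace: str
    name: str


@dataclass
class ToyOSD:
    """Hypothetical single-OSD object store whose keys are kept sorted."""
    keys: list = field(default_factory=list)

    def write(self, namespace: str, name: str) -> None:
        # Any write may name any namespace; nothing registers it centrally.
        key = ObjKey(zlib.crc32(name.encode()), namespace, name)
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)

    def delete_namespace(self, namespace: str) -> int:
        """Remove every object in `namespace`; return how many keys were examined."""
        examined = 0
        survivors = []
        # One namespace's objects are scattered across the hash-ordered
        # key space, so every key has to be visited.
        for key in self.keys:
            examined += 1
            if key.namespace != namespace:
                survivors.append(key)
        self.keys = survivors
        return examined


osd = ToyOSD()
for i in range(1000):
    osd.write("bucket-a" if i % 10 == 0 else "other", f"obj{i}")
print(osd.delete_namespace("bucket-a"), len(osd.keys))  # prints: 1000 900
```

Only 100 of the 1000 objects belonged to "bucket-a", yet all 1000 keys had to be examined; on a real cluster that scan would run on every OSD at once.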

Is it cheaper than doing it over the network? Sure. Is it cheap enough
that we're willing to let a single user request generate that kind of
cluster IO on an unconstrained interface? Absolutely not.
-Greg
[1]: Deleting snapshots is only feasible because every OSD maintains a
sorted secondary index from snapid->set<objects>. This is only
possible because snapids are issued by the monitors and clients
cooperate in making sure they can't get reused after being deleted.
Namespaces are generated by clients and there are no constraints on
their use, reuse, or relationship to each other. We could maybe work
around these problems, but that would mean building a fundamentally
different interface from what namespaces currently are.
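The footnote's argument can be sketched with another toy model. ToySnapAllocator and ToySnapOSD are hypothetical names, and this is a simplification, not Ceph's actual snap-trimming code. Because a central allocator (standing in for the monitors) issues snapids that are never reused, each OSD can trust a snapid -> set-of-objects index, and trimming a snapshot touches only the objects listed under that id.

```python
from collections import defaultdict
from itertools import count


class ToySnapAllocator:
    """Stand-in for the monitors: issues snapids and never reuses one."""

    def __init__(self):
        self._ids = count(1)
        self.deleted = set()

    def new_snap(self):
        return next(self._ids)

    def delete_snap(self, snapid):
        # Deleted ids are remembered and never handed out again.
        self.deleted.add(snapid)


class ToySnapOSD:
    """Hypothetical OSD keeping a snapid -> set(object names) secondary index."""

    def __init__(self):
        self.clones = {}                     # (object name, snapid) -> data
        self.snap_index = defaultdict(set)   # snapid -> object names

    def clone(self, name, snapid, data):
        # Record the clone and index it under its snapid.
        self.clones[(name, snapid)] = data
        self.snap_index[snapid].add(name)

    def trim_snap(self, snapid):
        """Drop all clones for `snapid`; return how many objects were touched."""
        names = self.snap_index.pop(snapid, set())
        for name in names:
            del self.clones[(name, snapid)]
        return len(names)


alloc = ToySnapAllocator()
s1, s2 = alloc.new_snap(), alloc.new_snap()
snap_osd = ToySnapOSD()
for i in range(0, 1000, 100):
    snap_osd.clone(f"obj{i}", s1, b"old")
snap_osd.clone("obj5", s2, b"older")
print(snap_osd.trim_snap(s1))  # prints: 10
alloc.delete_snap(s1)
```

Nothing comparable can be maintained for namespaces: clients may write to any namespace string at any time, so there is no authoritative point at which to create or retire such an index entry.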
_______________________________________________
Dev mailing list -- dev@xxxxxxx