Re: Cloud tiering thoughts

On Thu, Oct 18, 2018 at 12:14 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
>
>
> On 10/16/18 9:08 PM, Yehuda Sadeh-Weinraub wrote:
> > Here are my current thoughts about tiering, and also specifically
> > about cloud tiering.
> >
> > 1. Storage-classes
> >
> > Previously a placement target was mapped into a set of rados pools
> > (index, data, extra); now placement targets will also carry storage
> > classes (as S3 defines them). Object placement will be defined by
> > both the placement target and the storage class.
> >
> >   - If no storage class is specified, the STANDARD class is used, or
> > the bucket's default class if one is set.
> >
> >   - The set of supported storage classes will need to be defined as
> > part of the zonegroup configuration.
> >
> >   - Each zone will have a mapping between the defined storage classes
> > and the set of rados pools that back them (see the sketch below).
> >
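> > A hypothetical per-zone placement entry might look like this (field
> > names are illustrative, not a final schema):
> >
> >     # One index pool per placement target; one data pool per
> >     # storage class within it.
> >     zone_placement = {
> >         "default-placement": {
> >             "index_pool": "zone1.rgw.buckets.index",
> >             "storage_classes": {
> >                 "STANDARD": {"data_pool": "zone1.rgw.buckets.data"},
> >                 "COLD": {"data_pool": "zone1.rgw.buckets.cold"},
> >             },
> >         },
> >     }
> >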
> > A bucket has a default placement target. Object heads are always
> > written to the default placement target, even if the object is being
> > put on a different placement target; the object's manifest then
> > points at a tail that lives on that other target.
> >
> >   - A bucket will have a default storage class. The
> > X-Amz-Storage-Class header can be set when creating the bucket, and
> > this will set the bucket's default placement target. Note that this
> > cannot be changed later (as this is where objects' heads reside).
> >
> > We should probably make it so that when the head and tail are placed
> > on different placement targets, the head holds no data, only the
> > object's metadata.
> >
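> > In pseudo-Python, the placement decision for a PUT could look
> > roughly like this (helper names are made up):
> >
> >     def resolve_placement(bucket, requested_class):
> >         # The head always lands on the bucket's default placement
> >         # target; that's where object lookup happens.
> >         head_target = bucket.default_placement
> >         # The tail follows the requested storage class, falling back
> >         # to the bucket's default class.
> >         tail_class = requested_class or bucket.default_storage_class
> >         tail_target = head_target.storage_classes[tail_class]
> >         # When head and tail diverge, keep the head data-free:
> >         # metadata plus manifest only.
> >         head_holds_data = (tail_class == bucket.default_storage_class)
> >         return head_target, tail_target, head_holds_data
> >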
> > The code that implements the above can be found here:
> > https://github.com/yehudasa/ceph/tree/wip-rgw-tiering-3
> >
> > (the multipart upload part was incomplete, but is addressed now)
> >
> > 2. Cloud targets
> >
> > There are many options. We’re not going to implement everything. Here
> > are a few points to consider:
> >
> >   - How is data written to the backend cloud?
> >
> > The question here is whether the generated objects can be read
> > directly by client applications, or whether we are going to mangle
> > the data in some way (for example, striping or encrypting it).
> >
> >   - Indexed by us?
> >
> > The important question here is actually: do we keep a head object for
> > each object that is created on the remote tier?
> >
> > Do we keep a bucket index, or do we rely on the backing cloud for this
> > info? If we index it, how do we make sure we keep synchronized? Do we
> > need to?
> >
> >   - Proxied?
> >
> > When reading (and possibly writing) data, are we going to serve as a
> > proxy, or do we just send redirects?
> > Redirects might be the easiest way to implement tiering; however,
> > they cripple access control, since we don't have complete control
> > over the remote cloud (we will probably hold credentials that
> > represent a single user there).
> >
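> > For example, a read redirect could be served with a presigned URL
> > generated against the remote endpoint; a rough boto3 sketch, where
> > the endpoint and credentials are placeholders:
> >
> >     import boto3
> >
> >     # Client built from the single set of credentials we hold for
> >     # the remote cloud.
> >     remote = boto3.client(
> >         "s3",
> >         endpoint_url="https://cloud.example.com",
> >         aws_access_key_id="access-key",
> >         aws_secret_access_key="secret-key")
> >
> >     def redirect_url(bucket, key):
> >         # Short-lived presigned GET; rgw can answer the client with
> >         # a redirect to this URL instead of proxying the data.
> >         return remote.generate_presigned_url(
> >             "get_object",
> >             Params={"Bucket": bucket, "Key": key},
> >             ExpiresIn=300)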
> >
> >   - Bucket/object name mappings
> >
> > When dealing with cloud services over which we don’t have complete
> > control, we’d need to map bucket and object names to the ones that
> > will be used on the cloud service. This means that multiple rgw
> > buckets could be written to the same destination bucket. The cloud
> > sync code already does the same thing.
> >
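> > E.g. something along these lines (the mapping scheme is just an
> > example):
> >
> >     def map_to_remote(zone_id, bucket, key):
> >         # Collapse all source buckets into one destination bucket,
> >         # keeping keys unique by prefixing the source bucket name.
> >         target_bucket = "rgwx-%s" % zone_id
> >         target_key = "%s/%s" % (bucket, key)
> >         return target_bucket, target_key
> >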
> >   - ACL mappings
> >
> > Object ACLs need to be converted to ACLs on the remote system. The
> > cloud sync code does the same thing.
> >
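> > A sketch of such a mapping, at user-grant granularity (names and
> > policy are illustrative):
> >
> >     # Local rgw identities -> remote identities; grants we can't
> >     # express on the remote end are dropped (or could fail the
> >     # write, depending on policy).
> >     ACL_USER_MAP = {"local-user-a": "remote-user-1"}
> >
> >     def map_acl(grants):
> >         remote_grants = []
> >         for user, perm in grants:
> >             remote_user = ACL_USER_MAP.get(user)
> >             if remote_user is not None:
> >                 remote_grants.append((remote_user, perm))
> >         return remote_grants
> >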
> > 3. Cloud tier implementation
> >
> > A lot of it depends on what we decide to do in (2). I think that as a
> > start we can focus on the following:
> >
> >   * objects on cloud tier should be readable externally
> >
> > This entails a few things. It means that objects aren't striped or
> > encoded in some way, but are kept as whole objects on the backend.
> >
> >   * indexed by rgw; writes proxied; some reads (user object-data
> > reads) can be redirected; and we should be able to read remote
> > objects internally
> >
> > The reasoning behind this is that it keeps the current rgw behavior
> > of having a head object that holds the object's metadata. Without it,
> > most of the rgw object functionality will not work, and I think that
> > as a first step we want to keep the functional behavior close to what
> > it is today.
> >
> > This entails that we also index the objects, although bucket listing
> > can probably be redirected. User object reads don't need to be
> > proxied, as long as presigned redirects work.
> >
> > Implementing this will require:
> >
> >   - Creating a new type of object put processor that will be able to
> > store the data remotely. The head object should still be stored on
> > the bucket's default tier. Note that for this to work we will need to
> > make sure that even if the bucket's default tier is a cloud tier, we
> > still treat it as a local tier for storing the objects' heads (see
> > the sketch after this list).
> >
> >   - Object read iteration should be able to read remote objects.
> >   - Object copy could trigger a remote-side copy (if source and
> > destination are on the same remote tier).
> >   - In general, object copies to and from the remote tier might take
> > a long time, so they should be done via a background worker.
> >
> >   - The manifest should also reflect the required info. In any case,
> > it no longer stores any rados-specific info, so it might not require
> > many (or even any) changes.
> >
> >   - We should refactor the whole data object access api, so that things
> > are done cleanly.
> >
> >   - Stuff like multipart objects will also need to be addressed. Part
> > creation will need to be proxied, and the complete-multipart request
> > will create the needed local head.
> >
> >   - Remote cloud objects could be versioned, in which case we could
> > have a more reliable head to tail mapping.
> >
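> > Roughly, the cloud-tier put processor could do something like the
> > following (all helper names are hypothetical; map_to_remote() is the
> > name-mapping sketch from earlier):
> >
> >     def remote_put(bucket, key, stream):
> >         # Stub: stream the whole object to the remote cloud and
> >         # return its etag.
> >         return "etag"
> >
> >     def local_put_head(bucket, key, meta, manifest):
> >         # Stub: write the data-less head object into rados.
> >         pass
> >
> >     def put_obj(zone_id, bucket, key, stream, meta):
> >         # Tail goes to the remote cloud as a whole, externally
> >         # readable object.
> >         rbucket, rkey = map_to_remote(zone_id, bucket, key)
> >         etag = remote_put(rbucket, rkey, stream)
> >         # Head stays local even when the bucket's default tier is
> >         # a cloud tier: metadata plus a manifest pointing at the
> >         # remote tail, no data.
> >         manifest = {"tier": "cloud", "bucket": rbucket,
> >                     "key": rkey, "etag": etag}
> >         local_put_head(bucket, key, meta, manifest)
> >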
> > Thoughts?
> >
> > Yehuda
>
>
> The design for storage classes sounds wonderful. Have you put much
> thought into the lifecycle transitions between them? I wonder how much
> of the lifecycle expiration stuff we can reuse here for scheduling,
> retries, etc.

I think the lifecycle expiration stuff is pretty close to what we
need. We will need a separate worker that is responsible for the
scheduled transitions.
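
Very roughly, and with all the interfaces made up:

    import time

    class TransientError(Exception):
        pass

    def transition_worker(lc_queue, transition_object):
        # Reuse the lifecycle scheduling/retry machinery, but instead
        # of deleting an expired object, rewrite its tail into the
        # target storage class.
        while True:
            for entry in lc_queue.due_entries():
                try:
                    transition_object(entry.bucket, entry.key,
                                      entry.target_class)
                    lc_queue.complete(entry)
                except TransientError:
                    lc_queue.retry_later(entry)
            time.sleep(60)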

>
> Re: cloud targets, do you see that as a new kind of storage class? The
> main use case I've associated with this is like glacier, when you just
> want to move cold objects out of your cluster. A cloud storage class and
> lifecycle transitions should be enough to support this.
>

Glacier deals with storage that is not readily available, and as such
it's somewhat different from a regular cloud provider. I do think that
we need to provide proxying capability: we'll need to be able to read
and write from the remote end anyway, so the delta is not that great
at that point.

> I'm less familiar with use cases that would involve proxying and
> indexing. There also seems to be some overlap with the cloud sync module
> - do you see this work replacing that module, or are they fundamentally
> different?
>
Cloud sync mirrors existing data; with cloud tier, the remote copy is
the data, and might be the only copy of it. In essence the features
are distinct, but I think that as we move towards hybrid cloud
solutions, and as we add more capabilities (for example bi-directional
sync), we might find that there is some overlap, or that we could
combine the features somehow. I don't think we're there yet.

Yehuda


