Here are my current thoughts about tiering, and also specifically about cloud tiering.

1. Storage classes

Previously a placement target mapped into a set of rados pools (index, data, extra); now placement targets will also carry storage classes (the same notion S3 uses). Object placement will be defined by the combination of placement target and storage class.

- If a storage class is not specified, the standard class is used, or the default class for the bucket.
- The set of supported storage classes will need to be defined as part of the zonegroup configuration.
- Each zone will have a mapping between the existing storage classes and the set of rados pools that backs each of them.
- A bucket has a default placement target. Object heads are always written to the default placement target, even if the object is being put on a different placement target; the object's manifest then points at the tail wherever it was placed.
- A bucket will have a default storage class. The X-Amz-Storage-Class header can be set when creating the bucket, and this will set the default placement target for the bucket. Note that this cannot be changed later (as this is where objects' heads reside).
- We should probably make it so that when head and tail are placed on different placement targets, the head will not contain any data other than the object's metadata. (Both the class-to-pool mapping and this head/tail rule are sketched below, after section 2.)

The code that implements the above can be found here:
https://github.com/yehudasa/ceph/tree/wip-rgw-tiering-3
(the multipart upload stuff was incomplete, but is addressed now)

2. Cloud targets

There are many options, and we're not going to implement everything. Here are a few points to consider:

- How is data written to the backend cloud?
  The question here is whether the generated objects can be read directly by a client application, or whether we are going to mangle the data in some way, for example stripe it, encrypt it, etc.

- Indexed by us?
  The important question here is actually: do we keep a head object for each object that is created on the remote tier? Do we keep a bucket index, or do we rely on the backing cloud for this info? If we index it, how do we make sure we stay synchronized? Do we need to?

- Proxied?
  When reading (and possibly writing) data, are we going to serve as a proxy, or do we just send redirects? Redirects might be the easiest way to implement tiering; however, they cripple access control, since we don't have complete control over the remote cloud (we probably only have credentials that represent a single user).

- Bucket/object name mappings
  When dealing with cloud services over which we don't have complete control, we'd need to map bucket and object names to the names that will be used on the cloud service. This means that multiple rgw buckets could be written to the same destination bucket. The cloud sync code already does this.

- ACL mappings
  Object ACLs need to be converted to ACLs on the remote system. The cloud sync code already does this.
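To make the name and ACL mapping concrete, here is a rough sketch of what such a translation could look like. Everything in it is illustrative (RemoteObjectRef, map_to_remote and the ACL table are made-up names and values); it is not the actual cloud sync code, just the shape of the idea:

    // Rough, hypothetical sketch of mapping rgw bucket/object names and canned
    // ACLs onto a single destination bucket on a remote cloud. The names here
    // (RemoteObjectRef, map_to_remote, ...) are made up for illustration.
    #include <iostream>
    #include <map>
    #include <string>

    struct RemoteObjectRef {
      std::string bucket;  // destination bucket on the cloud service
      std::string key;     // object key within that bucket
      std::string acl;     // canned ACL understood by the remote service
    };

    // Many rgw buckets can share one destination bucket, so the source bucket
    // name becomes a key prefix rather than a separate remote bucket.
    RemoteObjectRef map_to_remote(const std::string& target_bucket,
                                  const std::string& rgw_bucket,
                                  const std::string& rgw_object,
                                  const std::string& rgw_acl) {
      // Local grants collapse into whatever our single set of remote
      // credentials allows; translate a couple of canned ACLs as an example.
      static const std::map<std::string, std::string> acl_map = {
        {"private", "private"},
        {"public-read", "public-read"},
      };
      auto it = acl_map.find(rgw_acl);
      std::string remote_acl = (it != acl_map.end()) ? it->second : "private";
      return RemoteObjectRef{target_bucket, rgw_bucket + "/" + rgw_object,
                             remote_acl};
    }

    int main() {
      RemoteObjectRef ref =
          map_to_remote("tier-target", "photos", "2018/cat.jpg", "public-read");
      // prints: tier-target photos/2018/cat.jpg public-read
      std::cout << ref.bucket << " " << ref.key << " " << ref.acl << "\n";
      return 0;
    }

The point is mainly that the source bucket collapses into a key prefix, and that local grants can only be approximated by whatever canned ACLs the remote service (and our single set of credentials) allows.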
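Going back to section 1, here is a similarly rough sketch of the per-zone storage-class-to-pool mapping and the head/tail placement rule. Again, the types and names are made up for illustration and don't reflect the actual zone/zonegroup structures; for simplicity the zone's default class stands in for the bucket's default placement target:

    // Rough, hypothetical sketch of a per-zone mapping from storage class to
    // rados pools, plus the head/tail placement rule from section 1. Types and
    // field names are made up; they are not the actual zone/zonegroup structs.
    #include <iostream>
    #include <map>
    #include <string>

    struct PoolSet {
      std::string index_pool;
      std::string data_pool;
      std::string extra_pool;
    };

    struct ZonePlacement {
      // The zonegroup defines which storage classes exist; the zone decides
      // which pools back each class within a placement target.
      std::string default_storage_class = "STANDARD";
      std::map<std::string, PoolSet> pools_by_class;
    };

    struct ObjectPlacement {
      PoolSet head;          // head stays on the default placement
      PoolSet tail;          // tail goes wherever the requested class points
      bool head_holds_data;  // only when head and tail end up in the same place
    };

    ObjectPlacement place_object(const ZonePlacement& zp,
                                 std::string storage_class) {
      if (storage_class.empty()) {
        storage_class = zp.default_storage_class;
      }
      // at() throws if the class isn't configured for this zone; good enough
      // for a sketch.
      const PoolSet& head = zp.pools_by_class.at(zp.default_storage_class);
      const PoolSet& tail = zp.pools_by_class.at(storage_class);
      bool same = (storage_class == zp.default_storage_class);
      return ObjectPlacement{head, tail, same};
    }

    int main() {
      ZonePlacement zp;
      zp.pools_by_class["STANDARD"] = {"zone.rgw.buckets.index",
                                       "zone.rgw.buckets.data",
                                       "zone.rgw.buckets.non-ec"};
      zp.pools_by_class["COLD"] = {"zone.rgw.buckets.index",
                                   "zone.rgw.buckets.cold",
                                   "zone.rgw.buckets.non-ec"};
      ObjectPlacement p = place_object(zp, "COLD");
      std::cout << "head data pool: " << p.head.data_pool
                << ", tail data pool: " << p.tail.data_pool
                << ", head holds data: " << std::boolalpha << p.head_holds_data
                << "\n";
      return 0;
    }

The only decision being modeled is that the head stays on the default placement and carries data only when head and tail end up in the same place.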
3. Cloud tier implementation

A lot of it depends on what we decide to do in (2). I think that as a start we can focus on the following:

* Objects on the cloud tier should be readable externally.
  This entails a few things. It means that objects aren't striped or encoded in some way, but are kept as whole objects on the backend.

* Indexed by rgw, proxied writes; some reads (user object data reads) can be redirected; we should be able to read remote objects internally.
  The reasoning behind this is that it keeps the current rgw behavior of having a head object that holds the object's metadata. Without it most of the rgw object functionality will not work, and I think that as a first step we want to keep the functional behavior close to what it is today. This also means that we index the objects, although bucket listing could probably be redirected. User object reads don't need to be proxied, as long as presigned redirects work.

Implementing this will require:

- Creating a new type of object put processor that will be able to store the data remotely. The head object should still be stored on the bucket's default tier. Note that for this to work we will need to make sure that even if the bucket's default tier is a cloud tier, we still treat it as a local tier for storing the objects' heads.
- Object read iteration should be able to read a remote object.
- Object copy could trigger a remote copy (if source and destination are on the same remote tier).
- In general, object copies from and to a remote tier should be done via a background worker, as they might take too long.
- The manifest should also reflect the required info. In any case, it no longer stores any info that is rados specific, so it might not require many (or even any) changes.
- We should refactor the whole data object access api, so that things are done cleanly.
- Stuff like multipart objects will also need to be addressed. Part creation will need to be proxied, and the complete will create the needed local head.
- Remote cloud objects could be versioned, in which case we could have a more reliable head-to-tail mapping.

Thoughts?

Yehuda