Re: Policy based object tiering in RGW

On Mon, Apr 2, 2018 at 11:28 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Fri, Mar 30, 2018 at 9:47 PM, Varada Kari (System Engineer)
> <varadaraja.kari@xxxxxxxxxxxx> wrote:
>> Hi,
>>
>> Last week at Cephalocon, I had a brief discussion about this with Josh.
>> I wanted to get your inputs/comments on it.
>>
>> At Flipkart, we are dealing with massive object growth per month.
>> Most objects are read a couple of times and then left in the bucket
>> for a long duration (several years) for compliance reasons. Ceph can
>> handle all that data well, but doesn't provide interfaces to move
>> data to cold storage like Glacier. Right now we are using offline
>> tools to achieve this movement of data across pools/clusters. This
>> proposal introduces the idea of having multiple tiers within the
>> cluster and managing/moving data across them.
>>
>> The idea is to use placement targets [1] for buckets. Ceph supports
>> creating custom data pools alongside the default .rgw.buckets pool.
>> A custom pool specifies a different placement strategy for user
>> data; e.g. a custom pool could be added that uses SSD-based OSDs.
>> The index pool and other ancillary pools stay the same. Placement
>> targets are defined with radosgw-admin under 'regions'.
>>
>> RGW users can pass the bucket location/region when creating a
>> bucket; once the bucket is created, all its objects are routed
>> accordingly without sending the location again.
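
To make that concrete, bucket creation against such a placement target
could look roughly like this from the S3 side, assuming a boto3 client
pointed at the RGW endpoint. The endpoint, credentials and the
'ssd-placement' target below are made up for illustration:

    import boto3

    # Placeholder endpoint/credentials for an RGW S3 endpoint.
    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.com:7480',
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )

    # RGW accepts a placement target in the LocationConstraint, roughly
    # "<zonegroup>:<placement-id>"; 'ssd-placement' is hypothetical.
    s3.create_bucket(
        Bucket='latency-sensitive-bucket',
        CreateBucketConfiguration={'LocationConstraint': 'default:ssd-placement'},
    )
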
>>
>> We can introduce three classes of pools in the cluster.
>>
>> One, entirely on SSDs, used by latency-sensitive customers. This
>> pool may not be large in capacity but can accommodate many small
>> objects. Both reads and writes are served at millisecond latency.
>>
>> Two, for medium-sized objects in larger numbers; this can have an
>> SSD-based cache tier backed by an HDD-based tier. Writes are served
>> faster, and reads can be slower because of the tiering.
>>
>> Three, for big objects with no limit on their number, going directly
>> to an HDD-based pool. These are not latency sensitive. This pool can
>> be replicated or EC-based.
>>
>> We can assign policies to these pools based on latency and capacity.
>> Along with these three categories, we can enable tiering among the
>> pools, or add further pools to support archival. While creating
>> users/accounts we can place each user in a certain group and assign
>> the corresponding pool for object placement. Additionally, we can
>> enable multisite/cloud sync for users who want to move their data to
>> a different cluster based on policies.
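
Purely as an illustration of that mapping (every name and number below
is made up, not an existing RGW construct), the three classes and the
user-group-to-placement assignment could be modelled like this:

    # Illustrative mapping of the three proposed classes to placement
    # targets and rough policy attributes; all values are hypothetical.
    TIER_CLASSES = {
        'ssd':     {'placement': 'ssd-placement',     'media': 'ssd',
                    'target_latency_ms': 1,    'object_size': 'small'},
        'hybrid':  {'placement': 'hybrid-placement',  'media': 'ssd cache + hdd',
                    'target_latency_ms': 20,   'object_size': 'medium'},
        'archive': {'placement': 'archive-placement', 'media': 'hdd',
                    'target_latency_ms': None, 'object_size': 'large'},
    }

    # Hypothetical user-group to class assignment applied at account creation.
    GROUP_TO_CLASS = {
        'latency-sensitive': 'ssd',
        'general': 'hybrid',
        'compliance': 'archive',
    }

    def default_placement_for(user_group: str) -> str:
        """Pick the default placement target for a user group."""
        return TIER_CLASSES[GROUP_TO_CLASS.get(user_group, 'hybrid')]['placement']
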
>>
>> Using bucket/object policies, we can identify which buckets/objects
>> can be vaulted and move them to archival tiers accordingly. This
>> simulates the Glacier kind of functionality in AWS, but within the
>> cluster. As an example, a user can set a policy on a bucket so that
>> it is vaulted after a couple of months to a tier in the same cluster
>> or to a different cluster.
>>
>> We already have agents flushing data between the cache and base
>> tiers. The Objecter knows which pool is a tier of which; we would
>> have to extend the Objecter to support multiple levels of tiering
>> for reading/writing objects, and this has to be tied to bucket
>> policies at the RGW level. Using the cloud sync feature or multisite
>> (a tweaked version that supports bucket policies), we can vault
>> specific objects to a different cluster. I haven't completely
>> thought through the design: whether we want to overload the
>> Objecter, or design a new global Objecter that is aware of the
>> multisite tier.
>
> Hmm, it sounds like you're interested in extending the RADOS
> cache-tier functionality for this. That is definitely a mistake; we
> have been backing off support for that over the past several releases.
> Sage has a plan for some "tiering v2" infrastructure (that integrates
> with SK Telecom's dedupe work) which might fit with this but I don't
> think it has any kind of timeline for completion.
>
Yes, for internal data movement across pools. I am not too particular
about using the current implementation; if tiering v2 solves this
better, I will be interested in using it.
The current problem is transferring object/bucket life cycle policies
to RADOS for moving the data around.
I am not sure whether this needs a separate policy engine at the RGW
layer to transcode these policies into tiering ops that move the data
to a different pool.
We also have to track/indicate that an object has been moved to a
different pool, and either bring it back or do a proxy read.
I am thinking mostly about object life cycle management from the RGW
side.
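
A toy sketch of the direction I have in mind: keep the "real location"
next to the head object and proxy-read from wherever the data lives.
Nothing below is RGW code; the pools are plain dicts and all names and
helpers are made up for illustration:

    from dataclasses import dataclass, field

    # In-memory stand-ins for the hot and archival data pools.
    POOLS = {
        'default.rgw.buckets.data': {},
        'archive.rgw.buckets.data': {},
    }

    @dataclass
    class HeadObject:
        """Head object carrying metadata about where the data really lives."""
        name: str
        data_pool: str
        metadata: dict = field(default_factory=dict)

    def put_object(head: HeadObject, data: bytes) -> None:
        POOLS[head.data_pool][head.name] = data

    def transition_object(head: HeadObject, dst_pool: str) -> None:
        """Policy-engine step: move the data and record its new location."""
        src_pool = head.metadata.get('real_data_pool', head.data_pool)
        data = POOLS[src_pool].pop(head.name)
        POOLS[dst_pool][head.name] = data
        head.metadata['real_data_pool'] = dst_pool

    def read_object(head: HeadObject) -> bytes:
        """Proxy-read from whichever pool the head object points at."""
        pool = head.metadata.get('real_data_pool', head.data_pool)
        return POOLS[pool][head.name]

    # Write to the hot pool, vault per policy, and read back transparently.
    obj = HeadObject('report.pdf', 'default.rgw.buckets.data')
    put_object(obj, b'...object bytes...')
    transition_object(obj, 'archive.rgw.buckets.data')
    assert read_object(obj) == b'...object bytes...'
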

>
> Especially since you're discussing moving data across clusters, and
> RGW is already maintaining a number of indexes and things (eg, head
> objects), I think it's probably best to have RGW maintain metadata
> about the "real" location of uploaded objects.
> -Greg
>
As one more policy on the object, we can have it archived to a
different cluster. Here I don't want to overload RADOS, but rather use
RGW cloud sync or multisite to sync the data to a different cluster.
When we start integrating bucket/object policies with life cycle
management and tiering, it will be interesting to explore how long an
object should stay in the same pool, a different pool, or a different
cluster.
Varada
>>
>> This enables us to grow across regions and support temperature-based
>> object tiering as part of object life cycle management.
>>
>> Please let me know your thoughts on this.
>>
>> [1] http://cephnotes.ksperis.com/blog/2014/11/28/placement-pools-on-rados-gw
>>
>> Thanks,
>> Varada


