The granularity of policy is the bucket (a group of objects). S3 supports
user policies and bucket policies, and Ceph supports a subset of the bucket
policies in Luminous. If we can have additional headers per object, maybe we
can handle per-object policies as well, but I am not sure how much work that
is.

For the migration, there can be a window like scrub, which can be user
specified/modified. The criteria for moving the data would come from the
policies set on the bucket. A policy has to specify what you want to do,
like moving to a different tier or a different cluster, plus some associated
values. This might require passing some additional headers (specific to
this) to the policy engine, with decisions taken based on them. (A rough
sketch of what such a bucket policy could look like is at the bottom of
this mail.)

I haven't thought about attaching priorities to these tasks. That might turn
into a separate discussion on QoS, and it can be interpreted differently per
use case. From the RGW end, my thinking is to add QoS/throttles per user: a
user can be guaranteed to consume x% of bandwidth/resources per unit of time
in the cluster, whether that is GET, PUT, DELETE, CREATE BUCKET, or
background ops like tiering or migrating the data. This is mostly about
multi-tenancy in RGW and guaranteeing something for each user. (A minimal
per-user throttle sketch is also at the bottom of this mail.) When we bring
the discussion to the OSD, it becomes guaranteeing that a certain number of
ops (including recovery and user IO) always complete within a given time,
even in a degraded state. Again, this is just my thinking. There are some
implementations already in progress based on dmclock, but I haven't tested
them yet.

Varada

On Tue, Apr 3, 2018 at 11:51 AM, nagarrajan raghunathan
<nagu.raghu99@xxxxxxxxx> wrote:
> Hi,
> For example, if I have a cluster with video files, say the cluster is
> having continuous reads and writes. Now we want to apply the policy; will
> this policy apply on each object or on a group of objects? Also, when
> would the migration happen, i.e. during a user-defined maintenance window
> or at frequent intervals? Would it be required to associate a priority
> with tiering based on their hits?
>
> Thanks
>
>
> On Tue, Apr 3, 2018 at 10:26 AM, Varada Kari (System Engineer)
> <varadaraja.kari@xxxxxxxxxxxx> wrote:
>>
>> Sure. I was thinking about whether this can be simplified using the
>> existing functionality in rados. But I agree, writing a better policy
>> engine and using the rados constructs to achieve the tiering would be
>> the ideal thing to do.
>>
>> Varada
>>
>> On Tue, Apr 3, 2018 at 9:38 AM, Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
>> > I find it strange to be arguing for worse is better, but
>> >
>> > On Mon, Apr 2, 2018 at 11:34 PM, Varada Kari (System Engineer)
>> > <varadaraja.kari@xxxxxxxxxxxx> wrote:
>> >> Yes, for internal data movement across pools. I am not too particular
>> >> about using the current implementation; if tiering V2 solves this
>> >> better, I will be interested to use it.
>> >> The current problem is transferring object/bucket life cycle policies
>> >> to rados for moving the data around.
>> >
>> > The problem is simplified when RGW moves the data around within as
>> > well as across clusters. As you note below...
>> >
>> >> I am not sure if this needs a different policy engine at the RGW
>> >> layer to transcode these policies into tiering ops that move the
>> >> data to a different pool.
>> >> And we have to manage/indicate that an object has moved to a
>> >> different pool, and bring it back or do a proxy read.
>> >> I am thinking mostly about object life cycle management from RGW.
>> >
>> > You want to support this anyway.
>> >
>> >>> Especially since you're discussing moving data across clusters, and
>> >>> RGW is already maintaining a number of indexes and things (eg, head
>> >>> objects), I think it's probably best to have RGW maintain metadata
>> >>> about the "real" location of uploaded objects.
>> >>> -Greg
>> >
>> >> As one more policy on the object, we can have archiving the object to
>> >> a different cluster. Here I don't want to overload rados, but rather
>> >> use RGW cloud sync or multisite to sync this data to a different
>> >> cluster. When we start integrating bucket/object policies into life
>> >> cycle management and tiering, it will be interesting to explore how
>> >> long I want to keep an object in the same pool versus a different
>> >> pool or a different cluster.
>> >> Varada
>> >
>> >
>> > --
>> >
>> > Matt Benjamin
>> > Red Hat, Inc.
>> > 315 West Huron Street, Suite 140A
>> > Ann Arbor, Michigan 48103
>> >
>> > http://www.redhat.com/en/technologies/storage
>> >
>> > tel. 734-821-5101
>> > fax. 734-769-8938
>> > cel. 734-216-5309
>
>
> --
> Regards,
> Nagarrajan Raghunathan
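
A rough sketch of the bucket-policy idea above, expressed as an S3 lifecycle
transition rule set through boto3 against an RGW endpoint. The endpoint,
credentials, bucket name, rule ID, and the "COLD" storage class are
placeholders, and Luminous-era RGW implements only a subset of lifecycle, so
a transition rule like this may not be accepted there; this just illustrates
the shape of "what to do plus associated values":

import boto3

# Placeholder RGW endpoint and credentials; adjust for a real setup.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# "Move objects under videos/ to a colder tier after 30 days." RGW would
# have to map the storage class to a different pool or cluster.
s3.put_bucket_lifecycle_configuration(
    Bucket="videos",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-videos",
                "Filter": {"Prefix": "videos/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "COLD"}],
            }
        ]
    },
)

And a minimal sketch of the per-user throttle idea ("a user can be
guaranteed to consume x% of bandwidth/resources per unit of time"), written
as a simple token bucket. The UserThrottle class is purely illustrative and
only shows the accounting; it does not try to model the dmclock work that is
actually in progress:

import time

class UserThrottle:
    """Token bucket giving one user a fixed share of cluster capacity."""

    def __init__(self, cluster_capacity, user_share):
        # user_share is the "x%" guaranteed to this user per second.
        self.rate = cluster_capacity * user_share
        self.burst = self.rate          # allow at most one second of burst
        self.tokens = self.burst
        self.last = time.monotonic()

    def allow(self, cost):
        """Return True if an op of the given cost may run now."""
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# A user guaranteed 10% of a 1 GB/s cluster; a 4 MB GET costs 4 MB of
# budget. Background ops like a tiering move would be charged the same way.
throttle = UserThrottle(cluster_capacity=1_000_000_000, user_share=0.10)
if throttle.allow(cost=4 * 1024 * 1024):
    pass  # serve the request; otherwise queue or defer it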