Re: RGW RFC: Multiple-Data-Pool Support for a Bucket

Hi Matt,

Inline

Thanks,
Jeegn

2018-01-02 22:05 GMT+08:00 Matt Benjamin <mbenjami@xxxxxxxxxx>:
> Hi,
>
> inline
>
> On Tue, Dec 26, 2017 at 11:44 PM, Jeegn Chen <jeegnchen@xxxxxxxxx> wrote:
>> Hi Robin
>>
>> Reply inline.
>>
>> Thanks,
>> Jeegn
>>
>> 2017-12-27 3:00 GMT+08:00 Robin H. Johnson <robbat2@xxxxxxxxxx>:
>>> On Tue, Dec 26, 2017 at 09:48:26AM +0800, Jeegn Chen wrote:
>>>> In the daily use of a Ceph RGW cluster, we find some pain points with
>>>> the current one-bucket-one-data-pool implementation.
>>>> I guess one-bucket-multiple-data-pools may help (see the appended
>>>> detailed proposal).
>>>> What do you think?
>>> Overall I like it.
>>>
>>> Queries/concerns:
>>> - How would this interact w/ the bucket policy lifecycle code?
>> [Jeegn]: My understanding is that the current lifecycle code lists all
>> objects in a bucket and deletes the out-of-date ones. Only the deletion
>> logic is affected, and that is covered by the GC-related change.
>>
>>> - How would this interact w/ existing placement policy in bucket
>>>   creation?
>> [Jeegn]: The multiple-pool support needs data_layout_type in
>> RGWZonePlacementInfo to have the value SPLITTED (new), while the default
>> value of data_layout_type is UNIFIED (old). So existing bucket
>> placements are assumed to have UNIFIED as data_layout_type. To enable
>> this functionality, the admin needs to create a new placement policy
>> with the SPLITTED data_layout_type set explicitly. Only buckets
>> created from a SPLITTED placement policy will follow the new behavior
>> pattern.
>
> SINGLE_POOL and SPLIT_POOL?
[Jeegn]: You mean the names UNIFIED and SPLITTED are confusing and
SINGLE_POOL and SPLIT_POOL would be more intuitive?
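
I am open to renaming. To make the proposal concrete, here is a rough,
self-contained sketch of the idea (std::string stands in for rgw_pool,
and the enum/field names are only placeholders for discussion, not the
real RGWZonePlacementInfo or the final implementation):

#include <cstdint>
#include <string>
#include <vector>

// Placeholder sketch only -- not the real RGW types; names are up for
// discussion.
enum class DataLayoutType : uint8_t {
  SINGLE_POOL = 0,  // today's behavior: one data pool per placement target
  SPLIT_POOL  = 1,  // proposed: head pool fixed, tail pool switchable
};

struct PlacementInfoSketch {
  std::string index_pool;
  std::string head_data_pool;                     // where heads always go
  DataLayoutType data_layout_type = DataLayoutType::SINGLE_POOL;
  std::vector<std::string> candidate_tail_pools;  // consulted when SPLIT_POOL
};

Existing placement targets would keep the SINGLE_POOL default, so nothing
changes unless the admin opts in with a new placement policy.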

>
> As Yehuda notes, there are fields related to tail placement in
> RGWObjManifest.  I wasn't aware that they were unused, or no longer
> used.  I've had a degree of concern for a while about the mix of
> complexity of representation and some assumptions in RGWObjManifest as
> it is.  I felt a tingle of danger around the idea of adding a new
> object attribute to deal with placement as a one-off, as well.  If
> only for the benefit of clarity and cleanup, I think it would be
> beneficial to try to think a few moves ahead on where logical and
> physical placement are going, how they eventually interact with
> storage class (as Robin noted here), and maybe simplification and
> removal of bits of old design dead-ends from the code.
[Jeegn]: My understanding is that tail_placement in RGWObjManifest is used
to handle object copies within a zone, especially cross-bucket copies.
When the source bucket and the destination bucket have different data
pools (see the logic in RGWRados::copy_obj), the tail data is copied and a
new RGWObjManifest is created based on the destination's placement policy.
When the source bucket and the destination bucket share the same data
pool, only the head is copied, along with the RGWObjManifest in it. The
destination object then needs the tail_placement in RGWObjManifest because
the information about the source bucket (such as the source bucket marker)
is kept there. The source bucket marker is needed as the prefix of the
tail object names at the rados level, which is why the destination object
needs the source object's tail_placement to find its tail objects. But
even in this case, I don't see the pool information in tail_placement
being used; it seems to be kept only for backward compatibility.

So I think tail_placement is necessary in RGWObjManifest, but the pool
information there is in fact unused.
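
To illustrate the point (this is only a toy sketch, not the real
RGWObjManifest code, and the rados name format below is made up): the
copied object must remember the source bucket marker, while the pool
recorded in tail_placement is not consulted.

#include <iostream>
#include <string>

// Toy stand-in for the relevant part of tail_placement.
struct TailPlacementSketch {
  std::string bucket_marker;  // marker of the bucket the tail was written under
  std::string pool;           // kept for compatibility, not consulted below
};

// Hypothetical tail name scheme; the real format in RGW differs in detail.
std::string tail_oid(const TailPlacementSketch& tp,
                     const std::string& obj_prefix, int part) {
  return tp.bucket_marker + "__shadow_" + obj_prefix + "." + std::to_string(part);
}

int main() {
  // After a same-pool copy, the destination keeps the source's marker:
  TailPlacementSketch tp{"src-bucket-marker.1234", "pool-name-never-read"};
  std::cout << tail_oid(tp, "myobj_ABC", 1) << "\n";
  // The pool to read from comes from the bucket's own placement, not tp.pool.
}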

I don't quite understand what 'logical and physical placement' refers to.
Could you give more details or some examples?

>
>>
>>> - At the rgw-admin layer, what tooling should exist to migrate objects
>>>   between pools for a given bucket?
>> [Jeegn]: I don't expect objects to be migrated between pools. Old
>> objects uploaded before the tail_pool switch will remain in the
>> original pool until they are explicitly deleted, which is the same
>> behavior as in CephFS.
>
> I think I agree with Robin. It seems like that kind of tooling support
> would increase robustness and long-term serviceability.
[Jeegn]: Yes, it should be useful. One use case that comes to my mind is
that we may want to evict some pools so that we can release all the
related servers for other uses.
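
Roughly, such a tool might do something like the sketch below. Everything
here is a placeholder I made up for discussion -- not real RGW or
radosgw-admin APIs:

#include <iostream>
#include <string>
#include <vector>

// Toy object reference: name plus the tail pool recorded in its manifest.
struct ObjRef {
  std::string name;
  std::string tail_pool;
};

// Placeholder stubs standing in for bucket listing and tail rewrite logic.
std::vector<ObjRef> list_bucket_objects(const std::string& /*bucket*/) {
  return {{"a.bin", "old-ec-pool"}, {"b.bin", "new-ec-pool"}};
}
void rewrite_tail(const ObjRef& obj, const std::string& target_pool) {
  std::cout << "rewrite " << obj.name << " -> " << target_pool << "\n";
}
void gc_old_tail(const ObjRef& obj) {
  std::cout << "schedule GC for old tail of " << obj.name << "\n";
}

// Move every object whose tail still lives in old_pool into new_pool, so
// the servers backing old_pool can eventually be released.
void evacuate_pool(const std::string& bucket,
                   const std::string& old_pool,
                   const std::string& new_pool) {
  for (const auto& obj : list_bucket_objects(bucket)) {
    if (obj.tail_pool != old_pool)
      continue;                    // tail already elsewhere, nothing to do
    rewrite_tail(obj, new_pool);   // copy tail data, update the manifest
    gc_old_tail(obj);              // old tail objects reclaimed by GC
  }
}

int main() {
  evacuate_pool("mybucket", "old-ec-pool", "new-ec-pool");
}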

>
> Matt
>
>>
>>>
>>> --
>>> Robin Hugh Johnson
>>> Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
>>> E-Mail   : robbat2@xxxxxxxxxx
>>> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
>>> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309


