Thanks all for your insight! After more investigation, I'd like to
share the outcome; your comments are appreciated as always. ;-)

* proposal

** introduce tail_data_pool

Each storage class is presented as an individual placement rule. Each
placement rule has several pools:

+ index_pool (for the bucket index)
+ data_pool (for the head)
+ tail_data_pool (for the tail)

Different storage classes share the same index_pool and data_pool, but
use different tail_data_pools. Using a different storage class means
using a different tail_data_pool.

Here's a placement rule/storage class config sample output:

#+BEGIN_EXAMPLE
{
    "key": "STANDARD",
    "val": {
        "index_pool": "us-east-1.rgw.buckets.index",
        "data_pool": "us-east-1.rgw.buckets.data",
        "tail_data_pool": "us-east-1.rgw.buckets.3replica",  <- introduced for rgw_obj raw data
        "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
        "index_type": 0,
        "compression": "",
        "inline_head": 1
    }
},
#+END_EXAMPLE

Multipart rgw_objs will be stored in the tail_data_pool.

Furthermore, for those rgw_objs that only have a head and no tail, we
can refactor the Manifest to support disabling the inlining of the
first chunk of rgw_obj data into the head, which finally matches the
semantics of the AWS S3 storage class:

#+BEGIN_EXAMPLE
{
    "key": "STANDARD",
    "val": {
        "index_pool": "us-east-1.rgw.buckets.index",
        "data_pool": "us-east-1.rgw.buckets.data",
        "tail_data_pool": "us-east-1.rgw.buckets.3replica",
        "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
        "index_type": 0,
        "compression": "",
        "inline_head": 1  <- introduced to inline the first data chunk of rgw_obj into the head
    }
},
#+END_EXAMPLE

** expose each storage class as an individual placement rule

As a draft, placement list will list all storage classes:

#+BEGIN_EXAMPLE
./bin/radosgw-admin -c ceph.conf zone placement list
[
    {
        "key": "STANDARD",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.3replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    },
    {
        "key": "RRS",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "data_pool": "us-east-1.rgw.buckets.data",
            "tail_data_pool": "us-east-1.rgw.buckets.2replica",
            "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
            "index_type": 0,
            "compression": "",
            "inline_head": 1
        }
    }
]
#+END_EXAMPLE

Another option would be to expose several storage classes in the same
placement rule:

#+BEGIN_EXAMPLE
./bin/radosgw-admin -c ceph.conf zone placement list
[
    {
        "key": "default-placement",
        "val": {
            "index_pool": "us-east-1.rgw.buckets.index",
            "storage_class": {
                "STANDARD": {
                    "data_pool": "us-east-1.rgw.3replica",
                    "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
                    "inline_head": 1
                },
                "RRS": {
                    "data_pool": "us-east-1.rgw.2replica",
                    "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
                    "inline_head": 1
                }
            },
            "index_type": 0,
            "compression": ""
        }
    }
]
#+END_EXAMPLE

This approach restricts the meaning of a storage class to just a
different data pool. But we may want to support things like
Multi-Regional Storage
(https://cloud.google.com/storage/docs/storage-classes#multi-regional)
in the future, so I'd prefer exposing storage classes at the placement
rule level. A rough sketch of such an extended placement target follows
below.
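To make the proposal concrete, here is a minimal C++ sketch of what
such an extended placement target could look like. It only loosely
follows RGWZonePlacementInfo in rgw_rados.h; the tail_data_pool and
inline_head fields are the proposed additions, and all names and the
fallback behavior are my assumptions, not settled API:

#+BEGIN_EXAMPLE
// Sketch only: a zone placement target extended with the proposed
// tail_data_pool and inline_head fields. Simplified stand-in for
// RGWZonePlacementInfo (src/rgw/rgw_rados.h); names are illustrative.
#include <cstdint>
#include <string>

struct PoolSketch {
  std::string name;  // stand-in for Ceph's rgw_pool
};

struct ZonePlacementInfoSketch {
  PoolSketch index_pool;       // bucket index
  PoolSketch data_pool;        // head objects (metadata, maybe first chunk)
  PoolSketch tail_data_pool;   // proposed: tail chunks / multipart parts
  PoolSketch data_extra_pool;  // multipart meta, non-EC
  uint32_t index_type = 0;
  std::string compression;
  bool inline_head = true;     // proposed: inline first data chunk into head?

  // Resolve the pool used for tail data: fall back to data_pool when no
  // dedicated tail pool is configured, so existing placement rules keep
  // behaving exactly as they do today.
  const PoolSketch& effective_tail_pool() const {
    return tail_data_pool.name.empty() ? data_pool : tail_data_pool;
  }
};
#+END_EXAMPLE

The empty-pool fallback is one possible compatibility story: a
placement rule without a tail_data_pool degenerates to the current
single-data-pool behavior.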
* issues

If we introduce the tail_data_pool, we need corresponding
modifications. I'm not sure about these; feedback is appreciated.

** use rgw_pool instead of placement rule in the RGWObjManifest

In the RGWObjManifest, we've defined two placement rules:

+ head_placement_rule (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L406)
+ tail_placement.placement_rule (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L119)

We then use the placement rule to look up that rule's data_pool. If we
introduce the tail_data_pool, there's no need to keep
tail_placement.placement_rule (even though it is the same as
head_placement_rule).

Inside RGWObjManifest, `class rgw_obj_select` also defines a
`placement_rule`
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.h#L127),
which is ultimately used to find the data_pool of that placement rule.

So I propose replacing the placement rule in the RGWObjManifest with an
rgw_pool, so that we have the chance to use both tail_data_pool and
data_pool within the same placement rule. A rough sketch of this change
follows below.
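As an illustration of that replacement, here's a simplified sketch with
stand-in types instead of the real RGWObjManifest/rgw_obj_select;
everything below is assumed naming, not the actual Ceph code:

#+BEGIN_EXAMPLE
// Sketch only: the manifest records resolved rgw_pools instead of
// placement rules. Stand-ins for RGWObjManifest / rgw_obj_select;
// all names here are illustrative assumptions.
#include <string>

struct PoolSketch {
  std::string name;  // stand-in for Ceph's rgw_pool
};

// Replaces rgw_obj_select's placement_rule member: the pool is resolved
// once at write time and stored, so reads need no placement-rule lookup.
struct ObjSelectSketch {
  PoolSketch pool;  // was: placement_rule -> data_pool lookup
  std::string oid;
};

struct ManifestSketch {
  PoolSketch head_pool;  // was: head_placement_rule
  PoolSketch tail_pool;  // was: tail_placement.placement_rule

  // Head and tail may now point at different pools of the *same*
  // placement rule (data_pool vs. tail_data_pool).
  ObjSelectSketch select(const std::string& oid, bool is_head) const {
    return {is_head ? head_pool : tail_pool, oid};
  }
};
#+END_EXAMPLE

A possible side benefit: with pools recorded directly in the manifest,
old objects stay readable even if the placement rule's pool mapping
changes later.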
On 23 June 2017 at 13:43, 方钰翔 <abcdeffyx@xxxxxxxxx> wrote:
> I think storing the head object and tail objects in different pools is
> also necessary.
>
> If we introduce a tail_data_pool in the placement rule to store tail
> objects, we can create a replicated pool for the data_pool and an EC
> pool for the tail_data_pool to balance performance and capacity.
>
> 2017-06-22 17:44 GMT+08:00 Jiaying Ren <mikulely@xxxxxxxxx>:
>>
>> On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@xxxxxxxxxx> wrote:
>>
>> >> My original thinking was that when we reassign an object to a new
>> >> placement, we only touch its tail which is incompatible with that.
>> >> However, thinking about it some more I don't see why we need to have
>> >> this limitation, so it's probably possible to keep the data in the
>> >> head in one case, and modify the object and have the data in the tail
>> >> (object's head will need to be rewritten anyway because we modify the
>> >> manifest).
>> >> I think that the decision whether we keep data in the head could be a
>> >> property of the zone.
>>
>> Yes, I guess we also need to check the zone placement rule config when
>> pulling the realm in a multisite env, to make sure the sync peer has
>> the same storage class support; multisite sync should also respect the
>> object's storage class.
>>
>> >> In any case, once an object is created changing
>> >> this property will only affect newly created objects, and old objects
>> >> could still be read correctly. Having data in the head is an
>> >> optimization that supposedly reduces small objects latency, and I
>> >> still think it's useful in a mixed pools situation. The thought is
>> >> that the bulk of the data will be at the tail anyway. However, we
>> >> recently changed the default head size from 512k to 4M, so this might
>> >> not be true any more. Anyhow, I favour having this as a configurable
>> >> (which should be simple to add).
>> >>
>> >> Yehuda
>> >
>> > I would be strongly against keeping data in the head when the head is
>> > in a lower-level storage class. That means that the entire object is
>> > violating the constraints of the storage class.
>>
>> Agreed. The default behavior of a storage class requires us to keep the
>> data in the head in the same pool as the tail. Even if we make this a
>> configurable option, we should disable this kind of inlining by default
>> to match the default behavior of storage classes.
>>
>> > Of course, having the head in a lower storage class (data or not) is
>> > probably a violation. Maybe we'd have to require that all heads go in
>> > the highest storage class.
>> >
>> > Daniel
>>
>> On 21 June 2017 at 23:50, Daniel Gryniewicz <dang@xxxxxxxxxx> wrote:
>> > On 06/21/2017 11:14 AM, Yehuda Sadeh-Weinraub wrote:
>> >> On Wed, Jun 21, 2017 at 7:46 AM, Daniel Gryniewicz <dang@xxxxxxxxxx>
>> >> wrote:
>> >>> On 06/21/2017 10:04 AM, Matt Benjamin wrote:
>> >>>> Hi,
>> >>>>
>> >>>> Looks very coherent.
>> >>>>
>> >>>> My main question is about...
>> >>>>
>> >>>> ----- Original Message -----
>> >>>>> From: "Jiaying Ren" <mikulely@xxxxxxxxx>
>> >>>>> To: "Yehuda Sadeh-Weinraub" <ysadehwe@xxxxxxxxxx>
>> >>>>> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> >>>>> Sent: Wednesday, June 21, 2017 7:39:24 AM
>> >>>>> Subject: RGW: Implement S3 storage class feature
>> >>>>>
>> >>>>> * Todo List
>> >>>>>
>> >>>>> + the head of the rgw-object should only contain the metadata of
>> >>>>>   the rgw-object; the first chunk of rgw-object data should be
>> >>>>>   stored in the same pool as the tail of the rgw-object
>> >>>>
>> >>>> Is this always desirable?
>> >>>
>> >>> Well, unless the head pool happens to have the correct storage
>> >>> class, it's necessary. And I'd guess that verification of this is
>> >>> complicated, although maybe not.
>> >>>
>> >>> Maybe we can use the head pool if it has >= the correct storage
>> >>> class?
>> >>>
>> >> [...]
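To close the loop on the inline-head discussion quoted above: the
per-placement inline_head switch could gate inlining at write time
along these lines. A sketch only; the helper name is hypothetical, and
defaulting to off matches Daniel's point about storage class
constraints:

#+BEGIN_EXAMPLE
// Sketch only: how an inline_head switch could gate first-chunk
// inlining on the write path. Hypothetical helper; not actual RGW code.
#include <cstdint>

// Returns how many bytes of object data to inline into the head.
// With inlining disabled (the proposed default, matching S3 storage
// class semantics), the head holds metadata only and all data goes to
// the tail pool.
inline uint64_t head_data_size(bool inline_head, uint64_t max_head_size) {
  return inline_head ? max_head_size : 0;
}
#+END_EXAMPLE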