Re: RGW 4 MiB objects

Hi Aleksey,

Thanks for the detailed breakdown!

We're currently using replicated pools, but we'll be testing EC pools soon, so this is a useful set of parameters to look at. I also hadn't considered the BlueStore parameters; thanks for pointing those out.

Kind regards

On Wed, Jul 31, 2019 at 2:36 PM Aleksey Gutikov <aleksey.gutikov@xxxxxxxxxx> wrote:
Hi Thomas,

We did some investigation a while ago and came up with several rules for
configuring RGW and OSDs for large files stored on an erasure-coded pool.
I hope it's useful.
If I've made any mistakes, please let me know.

S3 object write pipeline:

- The S3 object is divided into multipart parts by the client.
- RGW splits each multipart part into rados objects of size
rgw_obj_stripe_size.
- The primary OSD stripes each rados object into EC stripes of width ==
ec.k * profile.stripe_unit, erasure-codes them, sends the coded units to
the secondary OSDs, and writes them into the object store (BlueStore).
- Each subobject of a rados object has size == (rados object size) / ec.k.
- When writing to disk, BlueStore may divide a rados subobject into
extents with a minimum size of bluestore_min_alloc_size_hdd.
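The sizes produced at each stage can be sketched as plain arithmetic. This is an illustrative Python sketch, not Ceph code; the function and parameter names are my own, chosen to mirror the options discussed above:

```python
# Sketch of the object sizes at each stage of the write pipeline above.
# Illustrative only: these are not Ceph APIs, just the arithmetic.

def pipeline_sizes(part_size, stripe_size, k, stripe_unit, min_alloc):
    """Return sizes (in bytes) produced at each pipeline stage."""
    rados_objects = part_size // stripe_size    # RGW: multipart part -> rados objects
    ec_stripe_width = k * stripe_unit           # primary OSD: EC stripe width
    subobject_size = stripe_size // k           # per-OSD subobject size
    extents = subobject_size // min_alloc       # BlueStore extents per subobject
    return rados_objects, ec_stripe_width, subobject_size, extents

MiB = 1024 * 1024
print(pipeline_sizes(part_size=20 * MiB, stripe_size=20 * MiB,
                     k=5, stripe_unit=256 * 1024, min_alloc=1 * MiB))
# -> (1, 1310720, 4194304, 4)
```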

The following rules can save some space and IOPS:

- rgw_multipart_min_part_size SHOULD be a multiple of rgw_obj_stripe_size
(the client may use a different, larger value)
- rgw_obj_stripe_size MUST equal rgw_max_chunk_size
- the EC stripe unit is osd_pool_erasure_code_stripe_unit or
profile.stripe_unit
- rgw_obj_stripe_size SHOULD be a multiple of profile.stripe_unit * ec.k
- bluestore_min_alloc_size_hdd MAY be equal to bluefs_alloc_size (to
avoid fragmentation)
- rgw_obj_stripe_size / ec.k SHOULD be a multiple of
bluestore_min_alloc_size_hdd
- bluestore_min_alloc_size_hdd MAY be a multiple of profile.stripe_unit
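These rules reduce to a handful of divisibility checks. Here is a minimal sketch of verifying them, assuming the option values are already available as plain integers; the dict keys mirror the Ceph option names, but nothing here reads a real configuration:

```python
# Minimal sanity checker for the rules above. Illustrative sketch:
# the keys mirror Ceph option names, but this does not parse ceph.conf.

def check_rules(cfg):
    k = cfg["ec_k"]
    su = cfg["stripe_unit"]
    errors = []
    if cfg["rgw_multipart_min_part_size"] % cfg["rgw_obj_stripe_size"]:
        errors.append("rgw_multipart_min_part_size not a multiple of rgw_obj_stripe_size")
    if cfg["rgw_obj_stripe_size"] != cfg["rgw_max_chunk_size"]:
        errors.append("rgw_obj_stripe_size must equal rgw_max_chunk_size")
    if cfg["rgw_obj_stripe_size"] % (su * k):
        errors.append("rgw_obj_stripe_size not a multiple of stripe_unit * ec.k")
    if (cfg["rgw_obj_stripe_size"] // k) % cfg["bluestore_min_alloc_size_hdd"]:
        errors.append("subobject size not a multiple of bluestore_min_alloc_size_hdd")
    if cfg["bluestore_min_alloc_size_hdd"] % su:
        errors.append("bluestore_min_alloc_size_hdd not a multiple of stripe_unit")
    return errors

MiB = 1024 * 1024
cfg = {
    "ec_k": 5,
    "stripe_unit": 256 * 1024,
    "rgw_multipart_min_part_size": 20 * MiB,
    "rgw_obj_stripe_size": 20 * MiB,
    "rgw_max_chunk_size": 20 * MiB,
    "bluestore_min_alloc_size_hdd": 1 * MiB,
}
print(check_rules(cfg))  # -> [] when all rules hold
```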

For example, with ec.k = 5:

- rgw_multipart_min_part_size = rgw_obj_stripe_size = rgw_max_chunk_size
= 20M
- rados object size == 20M
- profile.stripe_unit = 256k
- rados subobject size == 4M (20M / 5), i.e. 16 EC stripe units
- bluestore_min_alloc_size_hdd = bluefs_alloc_size = 1M
- each rados subobject can be written as 4 extents, each containing 4 EC
stripe units
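Spelling out the example arithmetic (a sketch; the variable names are mine):

```python
# Arithmetic behind the ec.k = 5 example above.
MiB = 1024 * 1024
k = 5
stripe_unit = 256 * 1024
rados_object = 20 * MiB
min_alloc = 1 * MiB  # bluestore_min_alloc_size_hdd

subobject = rados_object // k                    # 4 MiB per OSD
units_per_subobject = subobject // stripe_unit   # 16 EC stripe units
extents = subobject // min_alloc                 # 4 extents
units_per_extent = min_alloc // stripe_unit      # 4 stripe units per extent

print(subobject // MiB, units_per_subobject, extents, units_per_extent)
# -> 4 16 4 4
```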



On 30.07.19 17:35, Thomas Bennett wrote:
> Hi,
>
> Does anyone out there use bigger than default values for
> rgw_max_chunk_size and rgw_obj_stripe_size?
>
> I'm planning to set rgw_max_chunk_size and rgw_obj_stripe_size  to
> 20MiB, as it suits our use case and from our testing we can't see any
> obvious reason not to.
>
> Is there some convincing experience that we should stick with 4MiBs?
>
> Regards,
> Tom
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


--

Best regards!
Aleksei Gutikov | Ceph storage engineer
synesis.ru | Minsk, BY


--
Thomas Bennett

Storage Engineer at SARAO
