Re: Ceph Erasure Coding - Stored vs used

Hi Simon and Janne,

Thanks for the reply.
It does indeed seem to be related to bluestore_min_alloc_size.

In an old thread I've also found the following:

*S3 object saving pipeline:*

*- S3 object is divided into multipart shards by client.*
*- Rgw shards each multipart shard into rados objects of size rgw_obj_stripe_size.*
*- Primary osd stripes rados object into ec stripes of width == ec.k * profile.stripe_unit, ec code them and send units into secondary osds and write into object store (bluestore).*
*- Each subobject of rados object has size == (rados object size)/k.*
*- Then while writing into disk bluestore can divide rados subobject into extents of minimal size == bluestore_min_alloc_size_hdd.*
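
To make the sizes in this pipeline concrete, here is a minimal Python sketch of the model as the quoted text describes it. The function names and the example values (15 MiB part, 4 MiB stripe size, k=6) are my own assumptions for illustration, and the model ignores any padding or metadata that the real code paths add.

```python
MiB = 1024 * 1024

def rados_object_sizes(part_size, rgw_obj_stripe_size):
    """Split one multipart part into rados objects of at most rgw_obj_stripe_size."""
    full, rest = divmod(part_size, rgw_obj_stripe_size)
    return [rgw_obj_stripe_size] * full + ([rest] if rest else [])

def subobject_size(rados_object_size, k):
    """Size of each per-OSD subobject: (rados object size) / k, rounded up."""
    return -(-rados_object_size // k)

# Hypothetical example: a 15 MiB part, 4 MiB stripe size, EC profile with k=6.
for size in rados_object_sizes(15 * MiB, 4 * MiB):
    print(f"rados object {size} bytes -> subobjects of {subobject_size(size, 6)} bytes")
```

The last rados object of a part is usually shorter than rgw_obj_stripe_size, so its subobjects are the ones most likely to fall below the allocation unit.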



*Next rules can save some space and iops:*

*- rgw_multipart_min_part_size SHOULD be multiple of rgw_obj_stripe_size (client can use different value greater than)*
*- MUST rgw_obj_stripe_size == rgw_max_chunk_size*
*- ec stripe == osd_pool_erasure_code_stripe_unit or profile.stripe_unit*
*- rgw_obj_stripe_size SHOULD be multiple of profile.stripe_unit * ec.k*
*- bluestore_min_alloc_size_hdd MAY be equal to bluefs_alloc_size (to avoid fragmentation)*
*- rgw_obj_stripe_size/ec.k SHOULD be multiple of bluestore_min_alloc_size_hdd*
*- bluestore_min_alloc_size_hdd MAY be multiple of profile.stripe_unit*
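
These rules can be checked mechanically. Below is a rough Python sketch, not an official tool: check_alignment and the dict keys "k" and "stripe_unit" are names I made up, and the example values are hypothetical rather than taken from our cluster.

```python
KiB, MiB = 1024, 1024 * 1024

def check_alignment(cfg):
    """Return the rules from the quoted list that cfg does not satisfy."""
    stripe = cfg["rgw_obj_stripe_size"]
    k = cfg["k"]                      # ec.k of the erasure code profile
    unit = cfg["stripe_unit"]         # profile.stripe_unit
    alloc = cfg["bluestore_min_alloc_size_hdd"]
    rules = [
        ("SHOULD: rgw_multipart_min_part_size multiple of rgw_obj_stripe_size",
         cfg["rgw_multipart_min_part_size"] % stripe == 0),
        ("MUST: rgw_obj_stripe_size == rgw_max_chunk_size",
         stripe == cfg["rgw_max_chunk_size"]),
        ("SHOULD: rgw_obj_stripe_size multiple of stripe_unit * k",
         stripe % (unit * k) == 0),
        ("SHOULD: rgw_obj_stripe_size / k multiple of bluestore_min_alloc_size_hdd",
         stripe % k == 0 and (stripe // k) % alloc == 0),
        ("MAY: bluestore_min_alloc_size_hdd == bluefs_alloc_size",
         alloc == cfg["bluefs_alloc_size"]),
        ("MAY: bluestore_min_alloc_size_hdd multiple of stripe_unit",
         alloc % unit == 0),
    ]
    return [name for name, ok in rules if not ok]

# Hypothetical values for illustration only.
example = {
    "rgw_multipart_min_part_size": 8 * MiB,
    "rgw_obj_stripe_size": 4 * MiB,
    "rgw_max_chunk_size": 4 * MiB,
    "k": 4,
    "stripe_unit": 4 * KiB,
    "bluestore_min_alloc_size_hdd": 64 * KiB,
    "bluefs_alloc_size": 64 * KiB,
}
for violation in check_alignment(example):
    print(violation)
```

With k=4 this example satisfies every rule. Changing k to 6 breaks the two rules that divide by k, because 4 MiB is not divisible by 6.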


Working through this calculation, a small file of 135KB ends up in chunks
of roughly 22KB each. Writing each of those chunks into a 64KB allocation
unit wastes quite some space.
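
As a worked example of that waste, here is a small Python sketch. It assumes k=6 (inferred from 135KB / 22KB, not confirmed) and a hypothetical m=2, compares the 64KB allocation unit with a hypothetical 4KB one, and ignores metadata overhead.

```python
KiB = 1024

def round_up(n, unit):
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit

def on_disk_bytes(object_size, k, m, min_alloc):
    """Allocated bytes across all k+m chunks, each rounded up to min_alloc."""
    chunk = -(-object_size // k)      # per-chunk size, rounded up
    return (k + m) * round_up(chunk, min_alloc)

obj = 135 * KiB
for alloc in (64 * KiB, 4 * KiB):
    used = on_disk_bytes(obj, k=6, m=2, min_alloc=alloc)
    print(f"min_alloc={alloc // KiB}KiB: {used // KiB} KiB allocated "
          f"for a {obj // KiB} KiB object")
```

Under these assumptions the object takes 512 KiB with a 64 KiB allocation unit and 192 KiB with a 4 KiB one, against an ideal 180 KiB for 6+2 erasure coding.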

As far as I can tell, the allocation size is stored per OSD and fixed when
the OSD is created, so changing it requires recreating each OSD. As we have
around 150 OSDs, I know what to script :-)

We will do some testing in our test environment, and I'll try to post our
findings here, as long as I don't forget...

To be sure, we want to check the actual on-disk size of an object. Afaik,
we'll need to export the RocksDB and run some queries against it, unless
someone can help me with this one? Before BlueStore this was quite easy
to do...

Regards,
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


