Re: Bluestore with so many small files

Hi,

You probably forgot to recreate the OSD after changing bluestore_min_alloc_size.
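
For what it's worth, bluestore_min_alloc_size is only applied when the OSD
is created (at mkfs time), so changing it in ceph.conf has no effect on OSDs
that already exist. A rough sketch of redeploying one OSD with ceph-volume,
assuming a single-device BlueStore OSD; the id N and the device /dev/sdX are
placeholders, and you will want to wait for recovery between OSDs:

  # in /etc/ceph/ceph.conf on the OSD host, under [osd], before recreating:
  #   bluestore_min_alloc_size_hdd = 4096
  # then remove the OSD and recreate it so that mkfs picks up the new value:
  ceph osd out N
  systemctl stop ceph-osd@N
  ceph osd purge N --yes-i-really-mean-it
  ceph-volume lvm zap /dev/sdX --destroy
  ceph-volume lvm create --bluestore --data /dev/sdX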

Regards,
Frédéric.

----- On 22 Apr 2019, at 5:41, 刘 俊 <LJshoot@xxxxxxxxxxx> wrote:
Hi All,
I still see this issue with the latest Ceph Luminous releases, 12.2.11 and 12.2.12.
I had set bluestore_min_alloc_size = 4096 before the test.
When I write 100000 small objects of less than 64KB each through rgw, the RAW USED shown in "ceph df" looks incorrect.
For example, I ran the test three times, cleaning up the rgw data pool each time; the object size was 4KB for the first run, 32KB for the second, and 64KB for the third.
The RAW USED shown in "ceph df" is the same every time (18GB), which looks like it is always equal to 64KB * 100000 * 3 ≈ 18GB (the replica count is 3 here).
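Working that out in the shell (integer arithmetic, sizes in KB), the reported
figure matches 64K allocation units rather than the 4K I configured:

  echo $(( 100000 * 64 * 3 / 1024 / 1024 ))   # 18 (GB) -- what "ceph df" reports for all three runs
  echo $(( 100000 * 4 * 3 / 1024 / 1024 ))    # 1 (GB) -- roughly what the 4KB run should use if 4K allocations applied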
Any thought?
Jamie
_______________________________________________
Hi Behnam,

On 2/12/2018 4:06 PM, Behnam Loghmani wrote:
> Hi there,
>
> I am using ceph Luminous 12.2.2 with:
>
> 3 osds (each osd is 100G) - no WAL/DB separation.
> 3 mons
> 1 rgw
> cluster size 3
>
> I stored lots of very small thumbnails on ceph with radosgw.
>
> The actual size of the files is about 32G, but it filled 70G of each osd.
>
> What's the reason for this high disk usage?
Most probably the major reason is BlueStore's allocation granularity:
e.g. an object of 1K bytes needs 64K of disk space if the default
bluestore_min_alloc_size_hdd (= 64K) is applied.
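Roughly speaking (a sketch of the rounding only, ignoring BlueStore internals
such as blob layout and compression), each object's data footprint is rounded
up to a multiple of the allocation unit:

  size_kb=1; alloc_kb=64
  echo $(( (size_kb + alloc_kb - 1) / alloc_kb * alloc_kb ))   # 64 -- a 1K object still occupies a full 64K unit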
An additional inconsistency in space reporting can also appear because
BlueStore adds the DB volume space when accounting for the total store
space, while free space is taken from the block device only. As a result,
the reported "Used" space always contains that DB space as well
(i.e. Used = Total(Block+DB) - Free(Block)). That correlates with other
comments in this thread about RocksDB space usage.
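To illustrate with made-up numbers (say a 100G block device plus a 10G
standalone DB volume, which is not the setup in this thread):

  total_gb=$(( 100 + 10 ))          # BlueStore reports block + DB as the total
  free_gb=60                        # while free space comes from the block device only
  echo $(( total_gb - free_gb ))    # reported "Used" = 50G, although only 40G of the block device is in use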
There is a pending PR to fix that:
https://github.com/ceph/ceph/pull/19454/commits/144fb9663778f833782bdcb16acd707c3ed62a86
You may also look for "Bluestore: inaccurate disk usage statistics problem"
on this mailing list for the previous discussion.

> Should I change "bluestore_min_alloc_size_hdd"? And if I change it and
> set it to a smaller size, does it impact performance?
Unfortunately I haven't benchmarked "small writes over hdd" cases much,
hence I don't have an exact answer here. These 'min_alloc_size' family
of parameters can indeed impact performance quite significantly.
>
> what is the best practice for storing small files on bluestore?
>
> Best regards,
> Behnam Loghmani
> 
> On Mon, Feb 12, 2018 at 5:06 PM, David Turner <drakonstein at gmail.com> wrote:
> 
>     Some of your overhead is the WAL and RocksDB that are on the OSDs.
>     The WAL is pretty static in size, but RocksDB grows with the number
>     of objects you have. You also have copies of the osdmap on each osd.
>     There's just overhead that adds up, and the biggest part is going to
>     be RocksDB with how many objects you have.
> 
> 
>     On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani <behnam.loghmani at gmail.com> wrote:
> 
>         Hi there,
> 
>         I am using ceph Luminous 12.2.2 with:
> 
>         3 osds (each osd is 100G) - no WAL/DB separation.
>         3 mons
>         1 rgw
>         cluster size 3
> 
>         I stored lots of thumbnails with very small size on ceph with
>         radosgw.
> 
>         Actual size of files is something about 32G but it filled 70G of
>         each osd.
> 
>         what's the reason of this high disk usage?
>         should I change "bluestore_min_alloc_size_hdd"? and If I change
>         it and set it to smaller size, does it impact on performance?
> 
>         what is the best practice for storing small files on bluestore?
> 
>         Best regards,
>         Behnam Loghmani


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
