Re: Raw use 10 times higher than data use

Hi Mark, thanks for getting back to me.

Regarding the small objects that are under the min_alloc size: I am sure there are plenty of them, as the RGW holds backups of Windows PCs/servers which are not compressed. Could you please confirm something for me? When I run the "radosgw-admin bucket stats" command and check the bucket usage, does the reported usage show the space actually consumed on the OSDs, or simply the cumulative size of the objects stored in the bucket? For example, if I store a single 2-byte file, will it show 2 bytes or the min_alloc size?
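
In case it is relevant, this is how I checked the allocation size on one of my OSDs (osd.8 is just an example; the command talks to the admin socket, so it has to run on the node hosting that OSD, and as far as I understand the value is fixed when the OSD is created):

# ceph daemon osd.8 config get bluestore_min_alloc_size_hdd
{
    "bluestore_min_alloc_size_hdd": "65536"
}

If that 64KiB default applies, then as a back-of-the-envelope figure: assuming an average object size of 8KiB (a guess on my part, not a measured number), each object would waste about 56KiB, i.e. around 53GiB per million objects, which could plausibly account for a 2-3x gap.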

I was judging the space usage based on the output of the bucket stats command and comparing it with the ceph df output.
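
Concretely, I have been comparing numbers along these lines (the bucket name is made up, and the field names are from my Nautilus output as I remember them, so treat this as a sketch rather than gospel):

# radosgw-admin bucket stats --bucket=backups-bucket
...
    "usage": {
        "rgw.main": {
            "size": 2,
            "size_actual": 4096,
            ...

My understanding, which I would like confirmed, is that "size" is the logical sum of the object sizes while "size_actual" is only rounded up to 4KiB boundaries, so neither would reflect a 64KiB min_alloc unit on the OSDs.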

Thanks

----- Original Message -----
> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Thursday, 26 September, 2019 17:52:37
> Subject:  Re: Raw use 10 times higher than data use

> Hi Andrei,
> 
> 
> Probably the first thing to check is if you have objects that are under
> the min_alloc size.  Those objects will result in wasted space as they
> will use the full min_alloc size, e.g. by default a 1K RGW object on HDD
> will take 64KB, while on NVMe it will take 16KB.  We are considering
> possibly setting the min_alloc size in master to 4K now that we've
> improved performance of the write path, but there is a trade-off as this
> will result in more rocksdb metadata and likely more overhead as the DB
> grows.  We still have testing we need to perform to see if it's a good
> idea as a default value.  We are also considering inlining very small
> (<4K) objects in the onode itself, but that also will require
> significant testing as it may put additional load on the DB as well.
> 
> 
> Mark
> 
> On 9/26/19 4:58 AM, Andrei Mikhailovsky wrote:
>> Hi Georg,
>>
>> I am having a similar issue with the RGW pool, although not to the extent of a
>> 10x overhead; in my case it is about 2-3x. My real data usage is around 6TB,
>> but Ceph uses over 17TB. I have asked this question here, but no one seems to
>> know the cause or how to go about finding the wasted space and clearing it.
>>
>> @ceph_guys - does anyone in the company work on the bugs that relate to this
>> wasted space? Could anyone assist us in debugging and fixing our issues?
>>
>> Thanks
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Georg F" <georg@xxxxxxxx>
>>> To: ceph-users@xxxxxxx
>>> Sent: Thursday, 26 September, 2019 10:50:01
>>> Subject:  Raw use 10 times higher than data use
>>> Hi all,
>>>
>>> I've recently moved a 1TiB pool (3TiB raw use) from hdd osds (7) to newly added
>>> nvme osds (14). The hdd osds should be almost empty by now as just small pools
>>> reside on them. The pools on the hdd osds store about 25GiB in total, which
>>> should use about 75GiB with a pool size of 3. WAL and DB are on separate
>>> devices.
>>>
>>> However the outputs of ceph df and ceph osd df tell a different story:
>>>
>>> # ceph df
>>> RAW STORAGE:
>>>     CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED
>>>     hdd       19 TiB     18 TiB     775 GiB      782 GiB          3.98
>>>
>>> # ceph osd df | egrep "(ID|hdd)"
>>> ID CLASS WEIGHT  REWEUSE SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
>>> 8   hdd 2.72392  1.00000 2.8 TiB 111 GiB  10 GiB 111 KiB 1024 MiB 2.7 TiB 3.85 0.60  65     up
>>> 6   hdd 2.17914  1.00000 2.3 TiB 112 GiB  11 GiB  83 KiB 1024 MiB 2.2 TiB 4.82 0.75  58     up
>>> 3   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB  71 KiB 1024 MiB 2.7 TiB 3.94 0.62  76     up
>>> 5   hdd 2.72392  1.00000 2.8 TiB 109 GiB 7.6 GiB  83 KiB 1024 MiB 2.7 TiB 3.76 0.59  63     up
>>> 4   hdd 2.72392  1.00000 2.8 TiB 112 GiB  11 GiB  55 KiB 1024 MiB 2.7 TiB 3.87 0.60  59     up
>>> 7   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB   8 KiB 1024 MiB 2.7 TiB 3.93 0.61  66     up
>>> 2   hdd 2.72392  1.00000 2.8 TiB 111 GiB 9.9 GiB  78 KiB 1024 MiB 2.7 TiB 3.84 0.60  69     up
>>>
>>> The sum of "DATA" is 75,5GiB which is what I am expecting to be used by the
>>> pools. How come the sum of "RAW USE" is 783GiB? More than 10x the size of the
>>> stored data. On my nvme osds the "RAW USE" to "DATA" overhead is <1%:
>>>
>>> # ceph osd df | egrep "(ID|nvme)"
>>> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
>>> 0  nvme 2.61989  1.00000 2.6 TiB 181 GiB 180 GiB  31 KiB  1.0 GiB 2.4 TiB 6.74 1.05  12     up
>>> 1  nvme 2.61989  1.00000 2.6 TiB 151 GiB 150 GiB  39 KiB 1024 MiB 2.5 TiB 5.62 0.88  10     up
>>> 13  nvme 2.61989  1.00000 2.6 TiB 239 GiB 238 GiB  55 KiB  1.0 GiB 2.4 TiB 8.89 1.39  16     up
>>> -- truncated --
>>>
>>> I am running ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39)
>>> nautilus (stable) which was upgraded recently from 13.2.1.
>>>
>>> Any help is appreciated.
>>>
>>> Best regards,
>>> Georg
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



