Hi Andrei,
Probably the first thing to check is whether you have objects that are under
the min_alloc size. Those objects result in wasted space, as each one
consumes a full min_alloc unit. E.g. by default a 1K RGW object on HDD
will take 64KB, while on NVMe it will take 16KB. We are considering
setting the min_alloc size in master to 4K now that we've improved
performance of the write path, but there is a trade-off: this will result
in more rocksdb metadata and likely more overhead as the DB grows. We
still have testing to perform before we know whether it's a good default
value. We are also considering inlining very small (<4K) objects in the
onode itself, but that will also require significant testing, as it may
put additional load on the DB as well.
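A quick way to see what your OSDs are using (a rough sketch; osd.0 is just
an example id, and the commands run against the OSD's admin socket on its
host). Keep in mind the allocation unit is fixed when the OSD is created,
so changing these options only affects newly deployed OSDs:

# Defaults for rotational vs. non-rotational devices
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
ceph daemon osd.0 config get bluestore_min_alloc_size_ssd

# Rough space a 1 KiB object ends up occupying with a 64 KiB allocation unit
echo $(( (1024 + 65536 - 1) / 65536 * 65536 ))   # prints 65536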
Mark
On 9/26/19 4:58 AM, Andrei Mikhailovsky wrote:
Hi Georg,
I am having a similar issue with the RGW pool, though not to the extent of a 10x discrepancy. In my case it is about 2-3x: my real data usage is around 6TB, but Ceph uses over 17TB. I have asked this question here, but no one seems to know the cause, nor how to go about finding the wasted space and reclaiming it.
@ceph_guys - does anyone in the company work in the area of finding the bugs that relate to this wasted space? Could anyone assist us in debugging and fixing our issues?
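One way to quantify how much of that difference is allocation overhead is to compare what BlueStore reports as stored vs. allocated bytes on an OSD (a rough sketch; osd.8 is just an example id, run on that OSD's host):

ceph daemon osd.8 perf dump | grep -E '"bluestore_(stored|allocated)"'

If allocated is much larger than stored, the space is going to allocation granularity rather than to the data itself.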
Thanks
Andrei
----- Original Message -----
From: "Georg F" <georg@xxxxxxxx>
To: ceph-users@xxxxxxx
Sent: Thursday, 26 September, 2019 10:50:01
Subject: Raw use 10 times higher than data use
Hi all,
I've recently moved a 1TiB pool (3TiB raw use) from hdd osds (7) to newly added
nvme osds (14). The hdd osds should be almost empty by now, as only small pools
reside on them. The pools on the hdd osds store about 25GiB in total, which
should use about 75GiB with a pool size of 3. WAL and DB are on separate
devices.
However the outputs of ceph df and ceph osd df tell a different story:
# ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED
    hdd       19 TiB     18 TiB     775 GiB     782 GiB           3.98
# ceph osd df | egrep "(ID|hdd)"
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
 8   hdd 2.72392  1.00000 2.8 TiB 111 GiB  10 GiB 111 KiB 1024 MiB 2.7 TiB 3.85 0.60  65     up
 6   hdd 2.17914  1.00000 2.3 TiB 112 GiB  11 GiB  83 KiB 1024 MiB 2.2 TiB 4.82 0.75  58     up
 3   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB  71 KiB 1024 MiB 2.7 TiB 3.94 0.62  76     up
 5   hdd 2.72392  1.00000 2.8 TiB 109 GiB 7.6 GiB  83 KiB 1024 MiB 2.7 TiB 3.76 0.59  63     up
 4   hdd 2.72392  1.00000 2.8 TiB 112 GiB  11 GiB  55 KiB 1024 MiB 2.7 TiB 3.87 0.60  59     up
 7   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB   8 KiB 1024 MiB 2.7 TiB 3.93 0.61  66     up
 2   hdd 2.72392  1.00000 2.8 TiB 111 GiB 9.9 GiB  78 KiB 1024 MiB 2.7 TiB 3.84 0.60  69     up
The sum of "DATA" is 75,5GiB which is what I am expecting to be used by the
pools. How come the sum of "RAW USE" is 783GiB? More than 10x the size of the
stored data. On my nvme osds the "RAW USE" to "DATA" overhead is <1%:
# ceph osd df | egrep "(ID|nvme)"
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE VAR  PGS STATUS
 0  nvme 2.61989  1.00000 2.6 TiB 181 GiB 180 GiB 31 KiB  1.0 GiB 2.4 TiB 6.74 1.05  12     up
 1  nvme 2.61989  1.00000 2.6 TiB 151 GiB 150 GiB 39 KiB 1024 MiB 2.5 TiB 5.62 0.88  10     up
13  nvme 2.61989  1.00000 2.6 TiB 239 GiB 238 GiB 55 KiB  1.0 GiB 2.4 TiB 8.89 1.39  16     up
-- truncated --
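For reference, the sums above can be reproduced straight from ceph osd df with something like this (a rough one-liner; it assumes the RAW USE and DATA values are printed in GiB, as they are here):

ceph osd df | awk '/hdd/ {raw+=$7; data+=$9} END {print "raw:", raw, "GiB  data:", data, "GiB"}'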
I am running ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39)
nautilus (stable) which was upgraded recently from 13.2.1.
Any help is appreciated.
Best regards,
Georg
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx