Hi Andrei,
Probably the first thing to check is whether you have objects that are under
the min_alloc size. Those objects result in wasted space, as each one
consumes a full min_alloc unit. E.g. by default a 1K RGW object on HDD
will take 64KB, while on NVMe it will take 16KB. We are considering
setting the min_alloc size in master to 4K now that we've improved
performance of the write path, but there is a trade-off: this will result
in more rocksdb metadata and likely more overhead as the DB grows. We
still have testing to perform before we know whether it's a good default
value. We are also considering inlining very small (<4K) objects in the
onode itself, but that will also require significant testing, as it may
put additional load on the DB as well.
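A quick way to see what your OSDs are using (a rough sketch; osd.0 is just
an example id, and the commands run against the OSD's admin socket on its
host). Keep in mind the allocation unit is fixed when the OSD is created,
so changing these options only affects newly deployed OSDs:

# Defaults for rotational vs. non-rotational devices
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
ceph daemon osd.0 config get bluestore_min_alloc_size_ssd

# Rough space a 1 KiB object ends up occupying with a 64 KiB allocation unit
echo $(( (1024 + 65536 - 1) / 65536 * 65536 ))   # prints 65536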
Mark
On 9/26/19 4:58 AM, Andrei Mikhailovsky wrote:
Hi Georg,
I am having a similar issue with the RGW pool, though not to the extent of a 10x discrepancy. In my case it is about 2-3x: my real data usage is around 6TB, but Ceph uses over 17TB. I have asked this question here, but no one seems to know the cause, nor how to go about finding the wasted space and reclaiming it.
@ceph_guys - does anyone in the company work in the area of finding the bugs that relate to this wasted space? Could anyone assist us in debugging and fixing our issues?
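One way to quantify how much of that difference is allocation overhead is to compare what BlueStore reports as stored vs. allocated bytes on an OSD (a rough sketch; osd.8 is just an example id, run on that OSD's host):

ceph daemon osd.8 perf dump | grep -E '"bluestore_(stored|allocated)"'

If allocated is much larger than stored, the space is going to allocation granularity rather than to the data itself.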
Thanks
Andrei
----- Original Message -----
From: "Georg F" <georg@xxxxxxxx>
To: ceph-users@xxxxxxx
Sent: Thursday, 26 September, 2019 10:50:01
Subject: Raw use 10 times higher than data use
Hi all,
I've recently moved a 1TiB pool (3TiB raw use) from hdd osds (7) to newly added
nvme osds (14). The hdd osds should be almost empty by now, as only small pools
reside on them. The pools on the hdd osds store about 25GiB in total, which
should use about 75GiB with a pool size of 3. WAL and DB are on separate
devices.
However the outputs of ceph df and ceph osd df tell a different story:
# ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED
    hdd       19 TiB     18 TiB     775 GiB     782 GiB           3.98
# ceph osd df | egrep "(ID|hdd)"
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
 8   hdd 2.72392  1.00000 2.8 TiB 111 GiB  10 GiB 111 KiB 1024 MiB 2.7 TiB 3.85 0.60  65     up
 6   hdd 2.17914  1.00000 2.3 TiB 112 GiB  11 GiB  83 KiB 1024 MiB 2.2 TiB 4.82 0.75  58     up
 3   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB  71 KiB 1024 MiB 2.7 TiB 3.94 0.62  76     up
 5   hdd 2.72392  1.00000 2.8 TiB 109 GiB 7.6 GiB  83 KiB 1024 MiB 2.7 TiB 3.76 0.59  63     up
 4   hdd 2.72392  1.00000 2.8 TiB 112 GiB  11 GiB  55 KiB 1024 MiB 2.7 TiB 3.87 0.60  59     up
 7   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB   8 KiB 1024 MiB 2.7 TiB 3.93 0.61  66     up
 2   hdd 2.72392  1.00000 2.8 TiB 111 GiB 9.9 GiB  78 KiB 1024 MiB 2.7 TiB 3.84 0.60  69     up
The sum of "DATA" is 75,5GiB which is what I am expecting to be used by the
pools. How come the sum of "RAW USE" is 783GiB? More than 10x the size of the
stored data. On my nvme osds the "RAW USE" to "DATA" overhead is <1%:
# ceph osd df | egrep "(ID|nvme)"
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE VAR  PGS STATUS
 0  nvme 2.61989  1.00000 2.6 TiB 181 GiB 180 GiB 31 KiB  1.0 GiB 2.4 TiB 6.74 1.05  12     up
 1  nvme 2.61989  1.00000 2.6 TiB 151 GiB 150 GiB 39 KiB 1024 MiB 2.5 TiB 5.62 0.88  10     up
13  nvme 2.61989  1.00000 2.6 TiB 239 GiB 238 GiB 55 KiB  1.0 GiB 2.4 TiB 8.89 1.39  16     up
-- truncated --
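For reference, the sums above can be reproduced straight from ceph osd df with something like this (a rough one-liner; it assumes the RAW USE and DATA values are printed in GiB, as they are here):

ceph osd df | awk '/hdd/ {raw+=$7; data+=$9} END {print "raw:", raw, "GiB  data:", data, "GiB"}'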
I am running ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39)
nautilus (stable) which was upgraded recently from 13.2.1.
Any help is appreciated.
Best regards,
Georg
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx