From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "andrei" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, 3 July, 2019 13:49:02
Subject: Re: troubleshooting space usage
Looks fine - comparing bluestore_allocated vs. bluestore_stored shows little difference, so allocation overhead is not the cause.
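For reference, a minimal sketch of that comparison. It assumes jq is installed and that the command runs on the host holding the OSD's admin socket. The helper names alloc_ratio and osd_alloc_ratio are made up for illustration; the two counters are read from the "bluestore" section of the perf dump.

```shell
# Print allocated/stored as a ratio with two decimals, or "n/a" if nothing is stored.
alloc_ratio() {
    awk -v a="$1" -v s="$2" 'BEGIN { if (s > 0) printf "%.2f\n", a / s; else print "n/a" }'
}

# Hypothetical wrapper: read both counters from one OSD and print the ratio.
# Usage: osd_alloc_ratio 12   (for osd.12)
osd_alloc_ratio() {
    dump=$(ceph daemon "osd.$1" perf dump)
    allocated=$(printf '%s' "$dump" | jq '.bluestore.bluestore_allocated')
    stored=$(printf '%s' "$dump" | jq '.bluestore.bluestore_stored')
    alloc_ratio "$allocated" "$stored"
}
```

A ratio close to 1.00 means the OSD allocates barely more space than it stores; a ratio well above 1 would point towards allocation overhead.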
What about comparing the object counts reported by the ceph and radosgw tools?
Igor.
On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:
Thanks Igor, Here is a link to the ceph perf data on several osds.
In terms of object sizes: we use rgw to back up data from various workstations and servers, so sizes range from a few KB to a few GB per individual file.
Cheers
From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "andrei" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, 3 July, 2019 12:29:33
Subject: Re: troubleshooting space usage
Hi Andrei,
Additionally, I'd like to see a performance counter dump for a couple of HDD OSDs (obtained through the 'ceph daemon osd.N perf dump' command).
W.r.t. average object size - I was thinking that you might know what objects had been uploaded... If not, you might want to estimate it by using the "rados get" command on the pool: retrieve a random set of objects and check their sizes. But let's check the performance counters first - most probably they will show losses caused by allocation.
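One way to take such a sample, as a rough sketch only. It uses "rados stat" instead of "rados get" so that no object data is downloaded, and it assumes each stat line ends in "size N". The function names avg_size and sample_avg_size are hypothetical. Listing a pool with millions of objects takes a while, and since RGW splits large uploads into several RADOS objects, the result is the average RADOS object size, not the average S3 object size.

```shell
# Average the last field (the size in bytes) over lines of `rados stat` output.
avg_size() {
    awk '{ sum += $NF; n++ } END { if (n) printf "%d\n", sum / n; else print 0 }'
}

# Stat a random sample of objects from a pool and print their mean size in bytes.
# Usage: sample_avg_size POOL [SAMPLE_COUNT]
sample_avg_size() {
    pool=$1
    count=${2:-100}
    rados -p "$pool" ls | shuf -n "$count" | while IFS= read -r obj; do
        rados -p "$pool" stat "$obj"
    done | avg_size
}
```

For example, "sample_avg_size .rgw.buckets 200" would stat 200 randomly chosen objects from the rgw data pool.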
Also, I've just found a similar issue (still unresolved) in our internal tracker - but its root cause is definitely different from allocation overhead. It looks like some orphaned objects in the pool. Could you please compare and share the number of objects in the pool as reported by "ceph (or rados) df detail" and by the radosgw tools?
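A possible way to get the RGW side of that count, sketched as a per-bucket loop with jq. sum_lines and rgw_object_count are hypothetical helper names; the num_objects field is taken from the "rgw.main" usage section of "radosgw-admin bucket stats".

```shell
# Add up one integer per input line.
sum_lines() {
    awk '{ s += $1 } END { printf "%d\n", s }'
}

# Sum the object counts that RGW reports for every bucket.
# Buckets without an "rgw.main" section (e.g. empty ones) count as 0.
rgw_object_count() {
    for b in $(radosgw-admin bucket list | jq -r '.[]'); do
        radosgw-admin bucket stats --bucket="$b" | jq '.usage["rgw.main"].num_objects // 0'
    done | sum_lines
}
```

The result can then be set against the OBJECTS column for .rgw.buckets in "ceph df detail". The two numbers will not match exactly even on a healthy cluster, because RGW stores large objects as several RADOS objects, so only a gap much larger than striping could explain would hint at orphans.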
Thanks,
Igor
On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:
Hi Igor,
Many thanks for your reply. Here are the details about the cluster:
1. Ceph version - 13.2.5-1xenial (installed from Ceph repository for ubuntu 16.04)
2. Main devices for the radosgw pool - hdd. We do use a few ssds for another pool, but it is not used by radosgw
3. we use BlueStore
4. Average rgw object size - I have no idea how to check that. Couldn't find a simple answer from google either. Could you please let me know how to check that?
5. Ceph osd df tree:
6. Other useful info on the cluster:
# ceph osd df tree
 ID CLASS    WEIGHT REWEIGHT    SIZE     USE   AVAIL  %USE  VAR PGS TYPE NAME
 -1       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   - root uk
 -5       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -     datacenter ldex
-11       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -         room ldex-dc3
-13       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -             row row-a
 -4       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -                 rack ldex-rack-a5
 -2        28.04495        -  28 TiB  22 TiB 6.2 TiB 77.96 0.98   -                     host arh-ibstorage1-ib
  0 hdd     2.73000  0.79999 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145                         osd.0
  1 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130                         osd.1
  2 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152                         osd.2
  3 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160                         osd.3
  4 hdd     2.73000  1.00000 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141                         osd.4
 32 hdd     5.45999  1.00000 5.5 TiB 4.4 TiB 1.1 TiB 80.68 1.02 306                         osd.32
 35 hdd     2.73000  1.00000 2.8 TiB 1.7 TiB 1.0 TiB 62.89 0.79 126                         osd.35
 36 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 464 GiB 83.58 1.05 175                         osd.36
 37 hdd     2.73000  0.89999 2.8 TiB 2.5 TiB 301 GiB 89.34 1.13 160                         osd.37
  5 ssd     0.74500  1.00000 745 GiB 642 GiB 103 GiB 86.15 1.09  65                         osd.5
 -3        28.04495        -  28 TiB  24 TiB 4.5 TiB 84.03 1.06   -                     host arh-ibstorage2-ib
  9 hdd     2.73000  0.95000 2.8 TiB 2.4 TiB 405 GiB 85.65 1.08 158                         osd.9
 10 hdd     2.73000  0.89999 2.8 TiB 2.4 TiB 352 GiB 87.52 1.10 169                         osd.10
 11 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 783 GiB 72.28 0.91 160                         osd.11
 12 hdd     2.73000  0.84999 2.8 TiB 2.4 TiB 359 GiB 87.27 1.10 153                         osd.12
 13 hdd     2.73000  1.00000 2.8 TiB 2.4 TiB 348 GiB 87.69 1.11 169                         osd.13
 14 hdd     2.73000  1.00000 2.8 TiB 2.5 TiB 283 GiB 89.97 1.14 170                         osd.14
 15 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 560 GiB 80.18 1.01 155                         osd.15
 16 hdd     2.73000  0.95000 2.8 TiB 2.4 TiB 332 GiB 88.26 1.11 178                         osd.16
 26 hdd     5.45999  1.00000 5.5 TiB 4.4 TiB 1.0 TiB 81.04 1.02 324                         osd.26
  7 ssd     0.74500  1.00000 745 GiB 607 GiB 138 GiB 81.48 1.03  62                         osd.7
-15        28.04495        -  28 TiB  22 TiB 6.4 TiB 77.40 0.98   -                     host arh-ibstorage3-ib
 18 hdd     2.73000  0.95000 2.8 TiB 2.5 TiB 312 GiB 88.96 1.12 156                         osd.18
 19 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 771 GiB 72.68 0.92 162                         osd.19
 20 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 733 GiB 74.04 0.93 149                         osd.20
 21 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 533 GiB 81.12 1.02 155                         osd.21
 22 hdd     2.73000  1.00000 2.8 TiB 2.1 TiB 692 GiB 75.48 0.95 144                         osd.22
 23 hdd     2.73000  1.00000 2.8 TiB 1.6 TiB 1.1 TiB 58.43 0.74 130                         osd.23
 24 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 579 GiB 79.51 1.00 146                         osd.24
 25 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 886 GiB 68.63 0.87 147                         osd.25
 31 hdd     5.45999  1.00000 5.5 TiB 4.7 TiB 758 GiB 86.50 1.09 326                         osd.31
  6 ssd     0.74500  0.89999 744 GiB 640 GiB 104 GiB 86.01 1.09  61                         osd.6
-17        28.04494        -  28 TiB  22 TiB 6.3 TiB 77.61 0.98   -                     host arh-ibstorage4-ib
  8 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 909 GiB 67.80 0.86 141                         osd.8
 17 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 904 GiB 67.99 0.86 144                         osd.17
 27 hdd     2.73000  1.00000 2.8 TiB 2.1 TiB 654 GiB 76.84 0.97 152                         osd.27
 28 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 481 GiB 82.98 1.05 153                         osd.28
 29 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 829 GiB 70.65 0.89 137                         osd.29
 30 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 762 GiB 73.03 0.92 142                         osd.30
 33 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 501 GiB 82.25 1.04 166                         osd.33
 34 hdd     5.45998  1.00000 5.5 TiB 4.5 TiB 968 GiB 82.77 1.04 325                         osd.34
 39 hdd     2.73000  0.95000 2.8 TiB 2.4 TiB 402 GiB 85.77 1.08 162                         osd.39
 38 ssd     0.74500  1.00000 745 GiB 671 GiB  74 GiB 90.02 1.14  68                         osd.38
                       TOTAL 113 TiB  90 TiB  23 TiB 79.25
MIN/MAX VAR: 0.74/1.14 STDDEV: 8.14
# for i in $(radosgw-admin bucket list | jq -r '.[]'); do radosgw-admin bucket stats --bucket="$i" | jq '.usage | ."rgw.main" | .size_kb'; done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'
6.59098
# ceph df
GLOBAL:
    SIZE     AVAIL   RAW USED  %RAW USED
    113 TiB  23 TiB  90 TiB    79.25
POOLS:
    NAME                        ID  USED     %USED  MAX AVAIL  OBJECTS
    Primary-ubuntu-1            5   27 TiB   87.56  3.9 TiB    7302534
    .users.uid                  15  6.8 KiB  0      3.9 TiB    39
    .users                      16  335 B    0      3.9 TiB    20
    .users.swift                17  14 B     0      3.9 TiB    1
    .rgw.buckets                19  15 TiB   79.88  3.9 TiB    8787763
    .users.email                22  0 B      0      3.9 TiB    0
    .log                        24  109 MiB  0      3.9 TiB    102301
    .rgw.buckets.extra          37  0 B      0      2.6 TiB    0
    .rgw.root                   44  2.9 KiB  0      2.6 TiB    16
    .rgw.meta                   45  1.7 MiB  0      2.6 TiB    6249
    .rgw.control                46  0 B      0      2.6 TiB    8
    .rgw.gc                     47  0 B      0      2.6 TiB    32
    .usage                      52  0 B      0      2.6 TiB    0
    .intent-log                 53  0 B      0      2.6 TiB    0
    default.rgw.buckets.non-ec  54  0 B      0      2.6 TiB    0
    .rgw.buckets.index          55  0 B      0      2.6 TiB    11485
    .rgw                        56  491 KiB  0      2.6 TiB    1686
    Primary-ubuntu-1-ssd        57  1.2 TiB  92.39  105 GiB    379516
I am not too sure the issue relates to BlueStore overhead, as I would probably have seen the discrepancy in my Primary-ubuntu-1 pool as well. However, the data usage on the Primary-ubuntu-1 pool seems to be consistent with my expectations (precise numbers to be verified soon). The issue seems to be only with the .rgw.buckets pool, where the "ceph df" output shows 15TB of usage and the sum of all buckets in that pool shows just over 6.5TB.
Cheers
Andrei
From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "andrei" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, 2 July, 2019 10:58:54
Subject: Re: troubleshooting space usage
Hi Andrei,
The most obvious reason is space usage overhead caused by BlueStore allocation granularity, e.g. if bluestore_min_alloc_size is 64K and the average object size is 16K, one will waste 48K per object on average. This is rather speculative so far, as we lack key information about your cluster:
- Ceph version
- What are the main devices for OSD: hdd or ssd.
- BlueStore or FileStore.
- average RGW object size.
You might also want to collect and share performance counter dumps (ceph daemon osd.N perf dump) and "
" reports from a couple of your OSDs.
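To make the 64K / 16K arithmetic above concrete, here is a small sketch. The function name alloc_waste is made up for illustration. It only models rounding an object up to a whole number of bluestore_min_alloc_size units and ignores replication, compression and metadata.

```shell
# Estimate the bytes lost to allocation rounding for a single object.
# Usage: alloc_waste MIN_ALLOC_BYTES OBJECT_BYTES
alloc_waste() {
    min_alloc=$1
    size=$2
    # Round the object size up to a whole number of allocation units.
    allocated=$(( (size + min_alloc - 1) / min_alloc * min_alloc ))
    echo $(( allocated - size ))
}

alloc_waste 65536 16384   # 64K allocation unit, 16K object: prints 49152 (48K)
```

An object that is an exact multiple of the allocation unit wastes nothing under this model, so the overhead matters mostly for pools dominated by small objects.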
Thanks,
Igor
On 7/2/2019 11:43 AM, Andrei Mikhailovsky wrote:
Bump!
From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 28 June, 2019 14:54:53
Subject: troubleshooting space usage
Hi
Could someone please explain / show how to troubleshoot the space usage in Ceph and how to reclaim the unused space?
I have a small cluster with 40 osds, replica of 2, mainly used as a backend for cloud stack as well as the S3 gateway. The used space doesn't make any sense to me, especially the rgw pool, so I am seeking help.
Here is what I found from the client:
Ceph -s shows the
usage: 89 TiB used, 24 TiB / 113 TiB avail
Ceph df shows:
Primary-ubuntu-1      5   27 TiB   90.11  3.0 TiB  7201098
Primary-ubuntu-1-ssd  57  1.2 TiB  89.62  143 GiB  359260
.rgw.buckets          19  15 TiB   83.73  3.0 TiB  8742222
The usage of Primary-ubuntu-1 and Primary-ubuntu-1-ssd is in line with my expectations. However, the .rgw.buckets pool seems to be using far too much. The usage of all rgw buckets adds up to 6.5TB (summing the size_kb values from "radosgw-admin bucket stats"). I am trying to figure out why .rgw.buckets is using 15TB of space instead of the 6.5TB shown by the bucket usage.
Thanks
Andrei
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com