I'm not sure where the doubts about old hardware and PG splits come
from. We observed the opposite of what you seem to fear (increasing
memory usage) after a PG split on a customer's cluster last year:
according to their Prometheus data, memory usage dropped after the
split had finished. I don't have many data sources available, but
we've never seen memory issues during PG splits. I would strongly
recommend considering an increase of your PG count.
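If you do go that route, note that since Nautilus a split is applied gradually (pgp_num follows pg_num, throttled by target_max_misplaced_ratio), so the extra load is spread out over time. A minimal sketch, using the data pool name from the ceph df output further down:

# ceph osd pool set wizard_data pg_num 1024

You can then watch the PGs split and backfill in ceph status.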
Quoting Nicola Mori <mori@xxxxxxxxxx>:
I have a single user producing lots of small files (currently about
4.7M with a mean size of 3 MB). The total number of files is about 7M.
About the occupancy: on the 1.8 TiB disks I see the PG count ranging
from 27 (-> 38% occupancy) down to 20 (-> 27% occupancy) at the same
OSD weight (1.819). I guess these fluctuations in the number of PGs
are due to the small total number of PGs coupled with the
inefficiency of the balancer; do you agree? If that's correct, then I
see only two options: a manual rebalancing (tried in the past, with
much effort and little result) or an increase in the PG count (risky
because of old hardware). Do you see any other possibility?
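In case it's useful, the per-OSD counts I quoted above can be read from the PGS column of:

# ceph osd df

(the VAR column of the same output shows each OSD's deviation from the mean utilization).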
Cheers,
Nicola
On 02/01/25 5:30 PM, Anthony D'Atri wrote:
On Jan 2, 2025, at 11:18 AM, Nicola Mori <mori@xxxxxxxxxx> wrote:
Hi Anthony, thanks for your insights. I actually used df -h from
the bash shell of a machine mounting the CephFS with the kernel
module, and here's the current result:
wizardfs_rootsquash@b1029256-7bb3-11ec-a8ce-ac1f6b627b45.wizardfs=/  217T  78T  139T  36%  /wizard/ceph
So it seems the fs size is 217 TiB, which is about two thirds of the
total amount of raw disk space (320 TiB), as I wrote before.
Then I tried the command you suggested:
# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 320 TiB 216 TiB 104 TiB 104 TiB 32.56
TOTAL 320 TiB 216 TiB 104 TiB 104 TiB 32.56
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 242 MiB 62 726 MiB 0 62 TiB
wizard_metadata 2 16 1.2 GiB 85.75k 3.5 GiB 0 62 TiB
wizard_data 3 512 78 TiB 27.03M 104 TiB 36.06 138 TiB
In order to find the total size of the data pool I don't understand
how to interpret the "MAX AVAIL" column: should it be added to
"STORED" or to "USED"?
Do you have a lot of small files?
In the first case I'd get 216 TiB, which corresponds to what df -h
says and thus to about two thirds; in the second case I'd get 242 TiB,
which is very close to 75%... But I guess the first option is the
right one.
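As far as I understand it, "MAX AVAIL" is Ceph's estimate of how much more data can still be written to the pool, derived from the fullest OSD and the pool's replication/EC overhead, so adding it to "STORED" gives the usable capacity:

78 TiB (STORED) + 138 TiB (MAX AVAIL) = 216 TiB

which matches what df -h reports.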
Then I looked at the weights of my failure domain (host):
# ceph osd tree | grep host
-7 25.51636 host aka
-3 25.51636 host balin
-13 29.10950 host bifur
-17 29.10950 host bofur
-21 29.10371 host dwalin
-23 21.83276 host fili
-25 29.10950 host kili
-9 25.51636 host ogion
-19 25.51636 host prestno
-15 29.10522 host remolo
-5 25.51636 host rokanan
-11 27.29063 host romolo
They seem quite even, and they reflect fairly well the actual total
capacity of each host:
# ceph orch host ls --detail
HOST . . . HDD
aka 9/28.3TB
balin 9/28.3TB
bifur 9/32.5TB
bofur 8/32.0TB
dwalin 16/32.0TB
fili 12/24.0TB
kili 8/32.0TB
ogion 8/28.0TB
prestno 9/28.3TB
remolo 16/32.0TB
rokanan 9/28.5TB
romolo 16/30.0TB
so I see no problem here (in fact, making these even is the idea
behind the disk upgrade strategy I am pursuing).
About the OSD outlier: there seems to be no such OSD; the maximum OSD
occupancy is 38% and it decreases smoothly down to a minimum of 27%,
with no jumps.
That's a very high variance; if the balancer is working it should be
more like +/- 1-2%. Available space in the cluster is reported as
though every OSD were as full as the fullest one (38%).
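Worth checking what the balancer is actually doing. A quick sketch; upmap mode usually converges much tighter than crush-compat, but it requires all clients to be Luminous or newer:

# ceph balancer status
# ceph balancer mode upmap
# ceph balancer on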
About PGs: I have 512 PGs in the data pool and 124 OSDs in total.
Maybe the count is too low, but I'm hesitant to increase it since my
cluster has very low specs and I fear running out of memory on the
oldest machines.
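For what it's worth, my rough math, assuming from the USED/STORED ratio of about 4/3 an EC profile like 6+2 (so 8 shards per PG; please correct me if that's wrong):

512 PGs x 8 shards / 124 OSDs ~= 33 PG shards per OSD

which is well below the ~100 PGs per OSD the documentation suggests as a target. If memory is the concern, I understand the per-OSD cache budget can be capped, e.g.:

# ceph config set osd osd_memory_target 4294967296

(that value is 4 GiB, the default; the oldest machines could be given a smaller one).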
About CRUSH rules: I don't know exactly what to look for, so if you
believe it's important then I'd need some advice.
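If it helps, I can post the output of:

# ceph osd pool get wizard_data crush_rule
# ceph osd crush rule dump

if that's the kind of thing you mean.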
Thank you again for your precious help,
Nicola
--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori@xxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx