My feeling is that my problem is due to the small number of PGs (512 for 104 OSDs): large fluctuations in PG assignment make some small OSDs end up with too many PGs, even if they are properly weighted. For example, currently the most used 500 GB OSD (61% occupancy) has 21 PGs, while the most used 2 TB OSD (41%) has 54 PGs: the small OSD thus holds more than 1/3 of the PGs of the big one, despite having only 1/4 of the capacity. The overpopulated small OSDs might be the real limiting factor, together with the OSD distribution (all the 500 GB ones are in the same machine) and the host failure domain.
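As a rough cross-check (a back-of-the-envelope sketch only, assuming PG shards were placed purely in proportion to CRUSH weight and ignoring the host failure domain, which in practice pushes even more shards onto the machine holding the small OSDs):

# Expected vs. observed PG shards per OSD, assuming weight-proportional placement
pgs = 512                  # pg_num of the data pool
shards_per_pg = 8          # EC 6+2 -> k+m = 8 shards per PG
total_weight = 196.47754   # sum of CRUSH weights from ceph osd tree

total_shards = pgs * shards_per_pg   # 4096 shard placements in total

for label, weight, observed in [("500 GB OSD", 0.45470, 21),
                                ("2 TB OSD", 1.81940, 54)]:
    expected = total_shards * weight / total_weight
    print(f"{label}: expected ~{expected:.1f} PG shards, observed {observed}")

# 500 GB OSD: expected ~9.5 PG shards, observed 21
# 2 TB OSD:   expected ~37.9 PG shards, observed 54

So the busiest small OSD holds more than twice its weight-proportional share, while the busiest 2 TB OSD is only about 40% above it, which fits the picture of the small OSDs filling up first.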
Increasing the number of PGs would probably level out the fluctuations and result in more available space, but since my machines are very old and limited (the lowest-spec one is a dual core with 8 GB RAM + 32 GB swap on OSD, managing 8x2TB OSDs) I'm worried about the increased resource requirements.
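As a toy illustration of why a higher PG count should level things out (this treats placement as purely random and ignores CRUSH details; the pg_num values other than 512 are hypothetical):

# Relative spread of per-OSD PG counts scales roughly like 1/sqrt(mean),
# so quadrupling pg_num should roughly halve the imbalance.
import math

shards_per_pg = 8    # EC 6+2
n_osds = 104

for pg_num in (512, 1024, 2048):
    mean = pg_num * shards_per_pg / n_osds   # average shards per OSD
    rel_spread = 1 / math.sqrt(mean)         # Poisson-like fluctuation
    print(f"pg_num={pg_num}: ~{mean:.0f} shards/OSD, relative fluctuation ~{rel_spread:.0%}")

# pg_num=512:  ~39 shards/OSD, relative fluctuation ~16%
# pg_num=1024: ~79 shards/OSD, relative fluctuation ~11%
# pg_num=2048: ~158 shards/OSD, relative fluctuation ~8%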
Nicola

On 02/04/23 23:08, Christian Wuerdig wrote:
With failure domain host your max usable cluster capacity is essentially constrained by the total capacity of the smallest host, which is 8 TB if I read the output correctly. You need to balance your hosts better by swapping drives.

On Fri, 31 Mar 2023 at 03:34, Nicola Mori <mori@xxxxxxxxxx> wrote:

Dear Ceph users,

my cluster is made up of 10 old machines, with uneven numbers of disks and disk sizes. Essentially I have just one big data pool (6+2 erasure code, with host failure domain) for which I am currently experiencing very poor available space (88 TB, of which 40 TB occupied, as reported by df -h on hosts mounting the cephfs) compared to the raw one (196.5 TB). I have a total of 104 OSDs and 512 PGs for the pool; I cannot increase the PG number since the machines are old with very low amounts of RAM, and some of them are already overloaded.

In this situation I'm seeing high occupancy of the small OSDs (500 GB) with respect to the bigger ones (2 and 4 TB), even if the weight is set equal to the disk capacity (see below for ceph osd tree). For example OSD 9 is at 62% occupancy even with weight 0.5 and reweight 0.75, while the highest occupancy for 2 TB OSDs is 41% (OSD 18) and for 4 TB OSDs is 23% (OSD 79). I guess this high occupancy of the 500 GB OSDs, combined with the erasure-code size and the host failure domain, might be the cause of the poor available space: could this be true? The upmap balancer is currently running but I don't know if and how much it could improve the situation.

Any hint is greatly appreciated, thanks.

Nicola

# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME         STATUS  REWEIGHT  PRI-AFF
 -1         196.47754  root default
 -7          14.55518      host aka
  4    hdd    1.81940          osd.4         up   1.00000  1.00000
 11    hdd    1.81940          osd.11        up   1.00000  1.00000
 18    hdd    1.81940          osd.18        up   1.00000  1.00000
 26    hdd    1.81940          osd.26        up   1.00000  1.00000
 32    hdd    1.81940          osd.32        up   1.00000  1.00000
 41    hdd    1.81940          osd.41        up   1.00000  1.00000
 48    hdd    1.81940          osd.48        up   1.00000  1.00000
 55    hdd    1.81940          osd.55        up   1.00000  1.00000
 -3          14.55518      host balin
  0    hdd    1.81940          osd.0         up   1.00000  1.00000
  8    hdd    1.81940          osd.8         up   1.00000  1.00000
 15    hdd    1.81940          osd.15        up   1.00000  1.00000
 22    hdd    1.81940          osd.22        up   1.00000  1.00000
 29    hdd    1.81940          osd.29        up   1.00000  1.00000
 34    hdd    1.81940          osd.34        up   1.00000  1.00000
 43    hdd    1.81940          osd.43        up   1.00000  1.00000
 49    hdd    1.81940          osd.49        up   1.00000  1.00000
-13          29.10950      host bifur
  3    hdd    3.63869          osd.3         up   1.00000  1.00000
 14    hdd    3.63869          osd.14        up   1.00000  1.00000
 27    hdd    3.63869          osd.27        up   1.00000  1.00000
 37    hdd    3.63869          osd.37        up   1.00000  1.00000
 50    hdd    3.63869          osd.50        up   1.00000  1.00000
 59    hdd    3.63869          osd.59        up   1.00000  1.00000
 64    hdd    3.63869          osd.64        up   1.00000  1.00000
 69    hdd    3.63869          osd.69        up   1.00000  1.00000
-17          29.10950      host bofur
  2    hdd    3.63869          osd.2         up   1.00000  1.00000
 21    hdd    3.63869          osd.21        up   1.00000  1.00000
 39    hdd    3.63869          osd.39        up   1.00000  1.00000
 57    hdd    3.63869          osd.57        up   1.00000  1.00000
 66    hdd    3.63869          osd.66        up   1.00000  1.00000
 72    hdd    3.63869          osd.72        up   1.00000  1.00000
 76    hdd    3.63869          osd.76        up   1.00000  1.00000
 79    hdd    3.63869          osd.79        up   1.00000  1.00000
-21          29.10376      host dwalin
 88    hdd    1.81898          osd.88        up   1.00000  1.00000
 89    hdd    1.81898          osd.89        up   1.00000  1.00000
 90    hdd    1.81898          osd.90        up   1.00000  1.00000
 91    hdd    1.81898          osd.91        up   1.00000  1.00000
 92    hdd    1.81898          osd.92        up   1.00000  1.00000
 93    hdd    1.81898          osd.93        up   1.00000  1.00000
 94    hdd    1.81898          osd.94        up   1.00000  1.00000
 95    hdd    1.81898          osd.95        up   1.00000  1.00000
 96    hdd    1.81898          osd.96        up   1.00000  1.00000
 97    hdd    1.81898          osd.97        up   1.00000  1.00000
 98    hdd    1.81898          osd.98        up   1.00000  1.00000
 99    hdd    1.81898          osd.99        up   1.00000  1.00000
100    hdd    1.81898          osd.100       up   1.00000  1.00000
101    hdd    1.81898          osd.101       up   1.00000  1.00000
102    hdd    1.81898          osd.102       up   1.00000  1.00000
103    hdd    1.81898          osd.103       up   1.00000  1.00000
 -9          14.55518      host ogion
  7    hdd    1.81940          osd.7         up   1.00000  1.00000
 16    hdd    1.81940          osd.16        up   1.00000  1.00000
 23    hdd    1.81940          osd.23        up   1.00000  1.00000
 33    hdd    1.81940          osd.33        up   1.00000  1.00000
 40    hdd    1.81940          osd.40        up   1.00000  1.00000
 47    hdd    1.81940          osd.47        up   1.00000  1.00000
 54    hdd    1.81940          osd.54        up   1.00000  1.00000
 61    hdd    1.81940          osd.61        up   1.00000  1.00000
-19          14.55518      host prestno
 81    hdd    1.81940          osd.81        up   1.00000  1.00000
 82    hdd    1.81940          osd.82        up   1.00000  1.00000
 83    hdd    1.81940          osd.83        up   1.00000  1.00000
 84    hdd    1.81940          osd.84        up   1.00000  1.00000
 85    hdd    1.81940          osd.85        up   1.00000  1.00000
 86    hdd    1.81940          osd.86        up   1.00000  1.00000
 87    hdd    1.81940          osd.87        up   1.00000  1.00000
104    hdd    1.81940          osd.104       up   1.00000  1.00000
-15          29.10376      host remolo
  6    hdd    1.81897          osd.6         up   1.00000  1.00000
 12    hdd    1.81897          osd.12        up   1.00000  1.00000
 19    hdd    1.81897          osd.19        up   1.00000  1.00000
 28    hdd    1.81897          osd.28        up   1.00000  1.00000
 35    hdd    1.81897          osd.35        up   1.00000  1.00000
 44    hdd    1.81897          osd.44        up   1.00000  1.00000
 52    hdd    1.81897          osd.52        up   1.00000  1.00000
 58    hdd    1.81897          osd.58        up   1.00000  1.00000
 63    hdd    1.81897          osd.63        up   1.00000  1.00000
 67    hdd    1.81897          osd.67        up   1.00000  1.00000
 71    hdd    1.81897          osd.71        up   1.00000  1.00000
 73    hdd    1.81897          osd.73        up   1.00000  1.00000
 74    hdd    1.81897          osd.74        up   1.00000  1.00000
 75    hdd    1.81897          osd.75        up   1.00000  1.00000
 77    hdd    1.81897          osd.77        up   1.00000  1.00000
 78    hdd    1.81897          osd.78        up   1.00000  1.00000
 -5          14.55518      host rokanan
  1    hdd    1.81940          osd.1         up   1.00000  1.00000
 10    hdd    1.81940          osd.10        up   1.00000  1.00000
 17    hdd    1.81940          osd.17        up   1.00000  1.00000
 24    hdd    1.81940          osd.24        up   1.00000  1.00000
 31    hdd    1.81940          osd.31        up   1.00000  1.00000
 38    hdd    1.81940          osd.38        up   1.00000  1.00000
 46    hdd    1.81940          osd.46        up   1.00000  1.00000
 53    hdd    1.81940          osd.53        up   1.00000  1.00000
-11           7.27515      host romolo
  5    hdd    0.45470          osd.5         up   1.00000  1.00000
  9    hdd    0.45470          osd.9         up   0.75000  1.00000
 13    hdd    0.45470          osd.13        up   1.00000  1.00000
 20    hdd    0.45470          osd.20        up   0.95000  1.00000
 25    hdd    0.45470          osd.25        up   0.75000  1.00000
 30    hdd    0.45470          osd.30        up   1.00000  1.00000
 36    hdd    0.45470          osd.36        up   1.00000  1.00000
 42    hdd    0.45470          osd.42        up   1.00000  1.00000
 45    hdd    0.45470          osd.45        up   0.85004  1.00000
 51    hdd    0.45470          osd.51        up   0.89999  1.00000
 56    hdd    0.45470          osd.56        up   1.00000  1.00000
 60    hdd    0.45470          osd.60        up   1.00000  1.00000
 62    hdd    0.45470          osd.62        up   1.00000  1.00000
 65    hdd    0.45470          osd.65        up   0.85004  1.00000
 68    hdd    0.45470          osd.68        up   1.00000  1.00000
 70    hdd    0.45470          osd.70        up   1.00000  1.00000
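To put a rough number on Christian's point (a sketch only, under the crude assumption that every PG places one equal-sized shard on 8 of the 10 hosts and that all hosts therefore fill at about the same rate; real CRUSH placement with weights and the balancer will deviate from this):

# Back-of-the-envelope usable capacity with EC 6+2 and host failure domain
host_tib = {   # per-host CRUSH weights from ceph osd tree above
    "aka": 14.55518, "balin": 14.55518, "bifur": 29.10950, "bofur": 29.10950,
    "dwalin": 29.10376, "ogion": 14.55518, "prestno": 14.55518,
    "remolo": 29.10376, "rokanan": 14.55518, "romolo": 7.27515,
}
k, m = 6, 2    # erasure-code data/parity chunks

smallest = min(host_tib.values())                    # romolo, ~7.3 TiB
raw_until_smallest_full = smallest * len(host_tib)   # ~73 TiB raw
usable = raw_until_smallest_full * k / (k + m)       # ~55 TiB of user data

print(f"raw usable before the smallest host fills: ~{raw_until_smallest_full:.0f} TiB")
print(f"corresponding user data (x {k}/{k + m}):   ~{usable:.0f} TiB")

Under this crude model the cluster tops out at roughly 55 TiB of user data, instead of the ~147 TiB one would naively expect from 196.5 TiB of raw space with 6+2 EC, which is the imbalance Christian suggests fixing by swapping drives between hosts.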
--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori@xxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx