My feeling is that my problem is due to the small number of PGs (512 for 104 OSDs): large fluctuations in PG assignment make some small OSDs end up with too many PGs, even if they are properly weighted. For example, currently the most used 500 GB OSD (61% occupancy) has 21 PGs, while the most used 2 TB OSD (41%) has 54 PGs: the small OSD thus holds more than 1/3 of the PGs of the big one, despite having only 1/4 of the capacity. The overpopulated small OSDs might be the real limiting factor, together with the OSD distribution (all the 500 GB ones are in the same machine) and the host failure domain.
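As a rough cross-check (a back-of-the-envelope sketch only, assuming PG shards were placed purely in proportion to CRUSH weight and ignoring the host failure domain, which in practice pushes even more shards onto the machine holding the small OSDs):

# Expected vs. observed PG shards per OSD, assuming weight-proportional placement
pgs = 512                  # pg_num of the data pool
shards_per_pg = 8          # EC 6+2 -> k+m = 8 shards per PG
total_weight = 196.47754   # sum of CRUSH weights from ceph osd tree

total_shards = pgs * shards_per_pg   # 4096 shard placements in total

for label, weight, observed in [("500 GB OSD", 0.45470, 21),
                                ("2 TB OSD", 1.81940, 54)]:
    expected = total_shards * weight / total_weight
    print(f"{label}: expected ~{expected:.1f} PG shards, observed {observed}")

# 500 GB OSD: expected ~9.5 PG shards, observed 21
# 2 TB OSD:   expected ~37.9 PG shards, observed 54

So the busiest small OSD holds more than twice its weight-proportional share, while the busiest 2 TB OSD is only about 40% above it, which fits the picture of the small OSDs filling up first.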
Increasing the number of PGs would probably level out the fluctuations and result in more available space, but since my machines are very old and limited (the lowest-spec one is a dual core with 8 GB RAM + 32 GB swap on OSD, managing 8x2TB OSDs) I'm worried about the increased resource requirements.
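As a toy illustration of why a higher PG count should level things out (this treats placement as purely random and ignores CRUSH details; the pg_num values other than 512 are hypothetical):

# Relative spread of per-OSD PG counts scales roughly like 1/sqrt(mean),
# so quadrupling pg_num should roughly halve the imbalance.
import math

shards_per_pg = 8    # EC 6+2
n_osds = 104

for pg_num in (512, 1024, 2048):
    mean = pg_num * shards_per_pg / n_osds   # average shards per OSD
    rel_spread = 1 / math.sqrt(mean)         # Poisson-like fluctuation
    print(f"pg_num={pg_num}: ~{mean:.0f} shards/OSD, relative fluctuation ~{rel_spread:.0%}")

# pg_num=512:  ~39 shards/OSD, relative fluctuation ~16%
# pg_num=1024: ~79 shards/OSD, relative fluctuation ~11%
# pg_num=2048: ~158 shards/OSD, relative fluctuation ~8%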
Nicola

On 02/04/23 23:08, Christian Wuerdig wrote:
With failure domain host your max usable cluster capacity is essentially constrained by the total capacity of the smallest host, which is 8 TB if I read the output correctly. You need to balance your hosts better by swapping drives.

On Fri, 31 Mar 2023 at 03:34, Nicola Mori <mori@xxxxxxxxxx> wrote:

Dear Ceph users,

my cluster is made up of 10 old machines, with uneven numbers of disks and disk sizes. Essentially I have just one big data pool (6+2 erasure code, with host failure domain) for which I am currently experiencing very poor available space (88 TB, of which 40 TB occupied, as reported by df -h on hosts mounting the cephfs) compared to the raw one (196.5 TB). I have a total of 104 OSDs and 512 PGs for the pool; I cannot increase the PG number since the machines are old with very low amounts of RAM, and some of them are already overloaded.

In this situation I'm seeing high occupancy of the small OSDs (500 GB) with respect to the bigger ones (2 and 4 TB), even if the weight is set equal to the disk capacity (see below for ceph osd tree). For example OSD 9 is at 62% occupancy even with weight 0.5 and reweight 0.75, while the highest occupancy for 2 TB OSDs is 41% (OSD 18) and for 4 TB OSDs is 23% (OSD 79). I guess this high occupancy of the 500 GB OSDs, combined with the erasure-code size and the host failure domain, might be the cause of the poor available space: could this be true? The upmap balancer is currently running but I don't know if and how much it could improve the situation.

Any hint is greatly appreciated, thanks.

Nicola

# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME         STATUS  REWEIGHT  PRI-AFF
 -1         196.47754  root default
 -7          14.55518      host aka
  4    hdd    1.81940          osd.4         up   1.00000  1.00000
 11    hdd    1.81940          osd.11        up   1.00000  1.00000
 18    hdd    1.81940          osd.18        up   1.00000  1.00000
 26    hdd    1.81940          osd.26        up   1.00000  1.00000
 32    hdd    1.81940          osd.32        up   1.00000  1.00000
 41    hdd    1.81940          osd.41        up   1.00000  1.00000
 48    hdd    1.81940          osd.48        up   1.00000  1.00000
 55    hdd    1.81940          osd.55        up   1.00000  1.00000
 -3          14.55518      host balin
  0    hdd    1.81940          osd.0         up   1.00000  1.00000
  8    hdd    1.81940          osd.8         up   1.00000  1.00000
 15    hdd    1.81940          osd.15        up   1.00000  1.00000
 22    hdd    1.81940          osd.22        up   1.00000  1.00000
 29    hdd    1.81940          osd.29        up   1.00000  1.00000
 34    hdd    1.81940          osd.34        up   1.00000  1.00000
 43    hdd    1.81940          osd.43        up   1.00000  1.00000
 49    hdd    1.81940          osd.49        up   1.00000  1.00000
-13          29.10950      host bifur
  3    hdd    3.63869          osd.3         up   1.00000  1.00000
 14    hdd    3.63869          osd.14        up   1.00000  1.00000
 27    hdd    3.63869          osd.27        up   1.00000  1.00000
 37    hdd    3.63869          osd.37        up   1.00000  1.00000
 50    hdd    3.63869          osd.50        up   1.00000  1.00000
 59    hdd    3.63869          osd.59        up   1.00000  1.00000
 64    hdd    3.63869          osd.64        up   1.00000  1.00000
 69    hdd    3.63869          osd.69        up   1.00000  1.00000
-17          29.10950      host bofur
  2    hdd    3.63869          osd.2         up   1.00000  1.00000
 21    hdd    3.63869          osd.21        up   1.00000  1.00000
 39    hdd    3.63869          osd.39        up   1.00000  1.00000
 57    hdd    3.63869          osd.57        up   1.00000  1.00000
 66    hdd    3.63869          osd.66        up   1.00000  1.00000
 72    hdd    3.63869          osd.72        up   1.00000  1.00000
 76    hdd    3.63869          osd.76        up   1.00000  1.00000
 79    hdd    3.63869          osd.79        up   1.00000  1.00000
-21          29.10376      host dwalin
 88    hdd    1.81898          osd.88        up   1.00000  1.00000
 89    hdd    1.81898          osd.89        up   1.00000  1.00000
 90    hdd    1.81898          osd.90        up   1.00000  1.00000
 91    hdd    1.81898          osd.91        up   1.00000  1.00000
 92    hdd    1.81898          osd.92        up   1.00000  1.00000
 93    hdd    1.81898          osd.93        up   1.00000  1.00000
 94    hdd    1.81898          osd.94        up   1.00000  1.00000
 95    hdd    1.81898          osd.95        up   1.00000  1.00000
 96    hdd    1.81898          osd.96        up   1.00000  1.00000
 97    hdd    1.81898          osd.97        up   1.00000  1.00000
 98    hdd    1.81898          osd.98        up   1.00000  1.00000
 99    hdd    1.81898          osd.99        up   1.00000  1.00000
100    hdd    1.81898          osd.100       up   1.00000  1.00000
101    hdd    1.81898          osd.101       up   1.00000  1.00000
102    hdd    1.81898          osd.102       up   1.00000  1.00000
103    hdd    1.81898          osd.103       up   1.00000  1.00000
 -9          14.55518      host ogion
  7    hdd    1.81940          osd.7         up   1.00000  1.00000
 16    hdd    1.81940          osd.16        up   1.00000  1.00000
 23    hdd    1.81940          osd.23        up   1.00000  1.00000
 33    hdd    1.81940          osd.33        up   1.00000  1.00000
 40    hdd    1.81940          osd.40        up   1.00000  1.00000
 47    hdd    1.81940          osd.47        up   1.00000  1.00000
 54    hdd    1.81940          osd.54        up   1.00000  1.00000
 61    hdd    1.81940          osd.61        up   1.00000  1.00000
-19          14.55518      host prestno
 81    hdd    1.81940          osd.81        up   1.00000  1.00000
 82    hdd    1.81940          osd.82        up   1.00000  1.00000
 83    hdd    1.81940          osd.83        up   1.00000  1.00000
 84    hdd    1.81940          osd.84        up   1.00000  1.00000
 85    hdd    1.81940          osd.85        up   1.00000  1.00000
 86    hdd    1.81940          osd.86        up   1.00000  1.00000
 87    hdd    1.81940          osd.87        up   1.00000  1.00000
104    hdd    1.81940          osd.104       up   1.00000  1.00000
-15          29.10376      host remolo
  6    hdd    1.81897          osd.6         up   1.00000  1.00000
 12    hdd    1.81897          osd.12        up   1.00000  1.00000
 19    hdd    1.81897          osd.19        up   1.00000  1.00000
 28    hdd    1.81897          osd.28        up   1.00000  1.00000
 35    hdd    1.81897          osd.35        up   1.00000  1.00000
 44    hdd    1.81897          osd.44        up   1.00000  1.00000
 52    hdd    1.81897          osd.52        up   1.00000  1.00000
 58    hdd    1.81897          osd.58        up   1.00000  1.00000
 63    hdd    1.81897          osd.63        up   1.00000  1.00000
 67    hdd    1.81897          osd.67        up   1.00000  1.00000
 71    hdd    1.81897          osd.71        up   1.00000  1.00000
 73    hdd    1.81897          osd.73        up   1.00000  1.00000
 74    hdd    1.81897          osd.74        up   1.00000  1.00000
 75    hdd    1.81897          osd.75        up   1.00000  1.00000
 77    hdd    1.81897          osd.77        up   1.00000  1.00000
 78    hdd    1.81897          osd.78        up   1.00000  1.00000
 -5          14.55518      host rokanan
  1    hdd    1.81940          osd.1         up   1.00000  1.00000
 10    hdd    1.81940          osd.10        up   1.00000  1.00000
 17    hdd    1.81940          osd.17        up   1.00000  1.00000
 24    hdd    1.81940          osd.24        up   1.00000  1.00000
 31    hdd    1.81940          osd.31        up   1.00000  1.00000
 38    hdd    1.81940          osd.38        up   1.00000  1.00000
 46    hdd    1.81940          osd.46        up   1.00000  1.00000
 53    hdd    1.81940          osd.53        up   1.00000  1.00000
-11           7.27515      host romolo
  5    hdd    0.45470          osd.5         up   1.00000  1.00000
  9    hdd    0.45470          osd.9         up   0.75000  1.00000
 13    hdd    0.45470          osd.13        up   1.00000  1.00000
 20    hdd    0.45470          osd.20        up   0.95000  1.00000
 25    hdd    0.45470          osd.25        up   0.75000  1.00000
 30    hdd    0.45470          osd.30        up   1.00000  1.00000
 36    hdd    0.45470          osd.36        up   1.00000  1.00000
 42    hdd    0.45470          osd.42        up   1.00000  1.00000
 45    hdd    0.45470          osd.45        up   0.85004  1.00000
 51    hdd    0.45470          osd.51        up   0.89999  1.00000
 56    hdd    0.45470          osd.56        up   1.00000  1.00000
 60    hdd    0.45470          osd.60        up   1.00000  1.00000
 62    hdd    0.45470          osd.62        up   1.00000  1.00000
 65    hdd    0.45470          osd.65        up   0.85004  1.00000
 68    hdd    0.45470          osd.68        up   1.00000  1.00000
 70    hdd    0.45470          osd.70        up   1.00000  1.00000
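To put a rough number on Christian's point (a sketch only, under the crude assumption that every PG places one equal-sized shard on 8 of the 10 hosts and that all hosts therefore fill at about the same rate; real CRUSH placement with weights and the balancer will deviate from this):

# Back-of-the-envelope usable capacity with EC 6+2 and host failure domain
host_tib = {   # per-host CRUSH weights from ceph osd tree above
    "aka": 14.55518, "balin": 14.55518, "bifur": 29.10950, "bofur": 29.10950,
    "dwalin": 29.10376, "ogion": 14.55518, "prestno": 14.55518,
    "remolo": 29.10376, "rokanan": 14.55518, "romolo": 7.27515,
}
k, m = 6, 2    # erasure-code data/parity chunks

smallest = min(host_tib.values())                    # romolo, ~7.3 TiB
raw_until_smallest_full = smallest * len(host_tib)   # ~73 TiB raw
usable = raw_until_smallest_full * k / (k + m)       # ~55 TiB of user data

print(f"raw usable before the smallest host fills: ~{raw_until_smallest_full:.0f} TiB")
print(f"corresponding user data (x {k}/{k + m}):   ~{usable:.0f} TiB")

Under this crude model the cluster tops out at roughly 55 TiB of user data, instead of the ~147 TiB one would naively expect from 196.5 TiB of raw space with 6+2 EC, which is the imbalance Christian suggests fixing by swapping drives between hosts.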
--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori@xxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx