Re: Excessive occupation of small OSDs

Thank you Nathan for your insight. Actually I don't know whether a single PG occupies a large fraction of the OSD or not; I'll look into how to check this. In any case, the culprit OSD does hold a large number of PGs relative to its size, and the other 500 GB OSDs show a similar pattern:

OSD   PGs  Occupancy
 9    21     61%
51    20     59%
36    18     53%
45    18     53%
70    18     53%
25    17     50%
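
If I read the docs correctly, the PGS and %USE columns of ceph osd df should give the per-OSD PG counts and utilization, and ceph pg ls-by-osd should list the individual PGs on a given OSD together with their size in bytes, e.g. for OSD 9:

# ceph osd df tree
# ceph pg ls-by-osd 9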

This looks to me more like "too many PGs" than "a single oversized PG". For comparison, the 2 TB OSD with the highest occupancy (41%) holds 57 PGs, fewer than 3 times as many as OSD 9 despite being 4 times bigger.
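
As a very rough cross-check (back-of-the-envelope): ~40 TB of data with 6+2 EC is roughly 53 TB raw, spread over 512 PGs x 8 shards = 4096 shards, i.e. about 13 GB per PG shard on average. That would give:

OSD  9: 21 x 13 GB ~ 270 GB, i.e. ~55% of a 500 GB disk  (observed: 61%)
OSD 18: 57 x 13 GB ~ 740 GB, i.e. ~37% of a 2 TB disk    (observed: 41%)

which is at least in the right ballpark, so the occupancy seems to track the PG count rather than a few oversized PGs.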

I'd then say it's a matter of PG count fluctuations due to the small total PG number (512 for 104 OSDs), about which I fear I cannot do much given the very low memory specs of my machines. Does this sound reasonable to you?

Nicola



On 30/03/23 16:42, Nathan Fish wrote:
When a single PG is a substantial percentage of an OSD (e.g. 10%) it's
hard for the upmap balancer to do much. It's possible you'd have just
as much space, or more, if you removed the 500 GB HDDs. Another option
might be to mdraid-0 the 500 GB drives in pairs and make OSDs from the
pairs (rough sketch below); this would result in 1 TB OSDs that might
work better. Any cluster on this hardware is going to be duct tape and
string, though.
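
For the mdraid route, roughly something like this should do it (untested, device names are just placeholders):

# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
# ceph-volume lvm create --data /dev/md0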

On Thu, Mar 30, 2023 at 10:35 AM Nicola Mori <mori@xxxxxxxxxx> wrote:

Dear Ceph users,

my cluster is made up of 10 old machines with uneven numbers of disks and disk sizes. Essentially I have just one big data pool (6+2 erasure code, with host failure domain), for which the available space is currently very poor (88 TB, of which 40 TB are occupied, as reported by df -h on the hosts mounting the CephFS) compared to the raw capacity (196.5 TB). I have a total of 104 OSDs and 512 PGs for the pool; I cannot increase the PG count since the machines are old and have very little RAM, and some of them are already overloaded.
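
For the record, the pool-level numbers can also be seen with ceph df; as far as I understand, the MAX AVAIL reported there already accounts for the EC overhead and for how full the fullest OSDs are, which might be part of why the available space looks so small:

# ceph df detail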

In this situation I'm seeing high occupancy on the small OSDs (500 GB) compared to the bigger ones (2 and 4 TB), even though the weights are set proportional to disk capacity (see the ceph osd tree output below). For example, OSD 9 is at 62% occupancy even with weight 0.5 and reweight 0.75, while the highest occupancy among the 2 TB OSDs is 41% (OSD 18) and among the 4 TB OSDs is 23% (OSD 79). I guess this high occupancy of the 500 GB OSDs, combined with the erasure-code size and the host failure domain, might be the cause of the poor available space; could this be true? The upmap balancer is currently running, but I don't know if and how much it can improve the situation.
Any hint is greatly appreciated, thanks.
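
For reference, I'm checking the balancer with ceph balancer status; if I read the docs right, lowering mgr/balancer/upmap_max_deviation should make it balance more aggressively, but I haven't tried that yet:

# ceph balancer status
# ceph config set mgr mgr/balancer/upmap_max_deviation 1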

Nicola

# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME         STATUS  REWEIGHT  PRI-AFF
  -1         196.47754  root default
  -7          14.55518      host aka
   4    hdd    1.81940          osd.4         up   1.00000  1.00000
  11    hdd    1.81940          osd.11        up   1.00000  1.00000
  18    hdd    1.81940          osd.18        up   1.00000  1.00000
  26    hdd    1.81940          osd.26        up   1.00000  1.00000
  32    hdd    1.81940          osd.32        up   1.00000  1.00000
  41    hdd    1.81940          osd.41        up   1.00000  1.00000
  48    hdd    1.81940          osd.48        up   1.00000  1.00000
  55    hdd    1.81940          osd.55        up   1.00000  1.00000
  -3          14.55518      host balin
   0    hdd    1.81940          osd.0         up   1.00000  1.00000
   8    hdd    1.81940          osd.8         up   1.00000  1.00000
  15    hdd    1.81940          osd.15        up   1.00000  1.00000
  22    hdd    1.81940          osd.22        up   1.00000  1.00000
  29    hdd    1.81940          osd.29        up   1.00000  1.00000
  34    hdd    1.81940          osd.34        up   1.00000  1.00000
  43    hdd    1.81940          osd.43        up   1.00000  1.00000
  49    hdd    1.81940          osd.49        up   1.00000  1.00000
-13          29.10950      host bifur
   3    hdd    3.63869          osd.3         up   1.00000  1.00000
  14    hdd    3.63869          osd.14        up   1.00000  1.00000
  27    hdd    3.63869          osd.27        up   1.00000  1.00000
  37    hdd    3.63869          osd.37        up   1.00000  1.00000
  50    hdd    3.63869          osd.50        up   1.00000  1.00000
  59    hdd    3.63869          osd.59        up   1.00000  1.00000
  64    hdd    3.63869          osd.64        up   1.00000  1.00000
  69    hdd    3.63869          osd.69        up   1.00000  1.00000
-17          29.10950      host bofur
   2    hdd    3.63869          osd.2         up   1.00000  1.00000
  21    hdd    3.63869          osd.21        up   1.00000  1.00000
  39    hdd    3.63869          osd.39        up   1.00000  1.00000
  57    hdd    3.63869          osd.57        up   1.00000  1.00000
  66    hdd    3.63869          osd.66        up   1.00000  1.00000
  72    hdd    3.63869          osd.72        up   1.00000  1.00000
  76    hdd    3.63869          osd.76        up   1.00000  1.00000
  79    hdd    3.63869          osd.79        up   1.00000  1.00000
-21          29.10376      host dwalin
  88    hdd    1.81898          osd.88        up   1.00000  1.00000
  89    hdd    1.81898          osd.89        up   1.00000  1.00000
  90    hdd    1.81898          osd.90        up   1.00000  1.00000
  91    hdd    1.81898          osd.91        up   1.00000  1.00000
  92    hdd    1.81898          osd.92        up   1.00000  1.00000
  93    hdd    1.81898          osd.93        up   1.00000  1.00000
  94    hdd    1.81898          osd.94        up   1.00000  1.00000
  95    hdd    1.81898          osd.95        up   1.00000  1.00000
  96    hdd    1.81898          osd.96        up   1.00000  1.00000
  97    hdd    1.81898          osd.97        up   1.00000  1.00000
  98    hdd    1.81898          osd.98        up   1.00000  1.00000
  99    hdd    1.81898          osd.99        up   1.00000  1.00000
100    hdd    1.81898          osd.100       up   1.00000  1.00000
101    hdd    1.81898          osd.101       up   1.00000  1.00000
102    hdd    1.81898          osd.102       up   1.00000  1.00000
103    hdd    1.81898          osd.103       up   1.00000  1.00000
  -9          14.55518      host ogion
   7    hdd    1.81940          osd.7         up   1.00000  1.00000
  16    hdd    1.81940          osd.16        up   1.00000  1.00000
  23    hdd    1.81940          osd.23        up   1.00000  1.00000
  33    hdd    1.81940          osd.33        up   1.00000  1.00000
  40    hdd    1.81940          osd.40        up   1.00000  1.00000
  47    hdd    1.81940          osd.47        up   1.00000  1.00000
  54    hdd    1.81940          osd.54        up   1.00000  1.00000
  61    hdd    1.81940          osd.61        up   1.00000  1.00000
-19          14.55518      host prestno
  81    hdd    1.81940          osd.81        up   1.00000  1.00000
  82    hdd    1.81940          osd.82        up   1.00000  1.00000
  83    hdd    1.81940          osd.83        up   1.00000  1.00000
  84    hdd    1.81940          osd.84        up   1.00000  1.00000
  85    hdd    1.81940          osd.85        up   1.00000  1.00000
  86    hdd    1.81940          osd.86        up   1.00000  1.00000
  87    hdd    1.81940          osd.87        up   1.00000  1.00000
104    hdd    1.81940          osd.104       up   1.00000  1.00000
-15          29.10376      host remolo
   6    hdd    1.81897          osd.6         up   1.00000  1.00000
  12    hdd    1.81897          osd.12        up   1.00000  1.00000
  19    hdd    1.81897          osd.19        up   1.00000  1.00000
  28    hdd    1.81897          osd.28        up   1.00000  1.00000
  35    hdd    1.81897          osd.35        up   1.00000  1.00000
  44    hdd    1.81897          osd.44        up   1.00000  1.00000
  52    hdd    1.81897          osd.52        up   1.00000  1.00000
  58    hdd    1.81897          osd.58        up   1.00000  1.00000
  63    hdd    1.81897          osd.63        up   1.00000  1.00000
  67    hdd    1.81897          osd.67        up   1.00000  1.00000
  71    hdd    1.81897          osd.71        up   1.00000  1.00000
  73    hdd    1.81897          osd.73        up   1.00000  1.00000
  74    hdd    1.81897          osd.74        up   1.00000  1.00000
  75    hdd    1.81897          osd.75        up   1.00000  1.00000
  77    hdd    1.81897          osd.77        up   1.00000  1.00000
  78    hdd    1.81897          osd.78        up   1.00000  1.00000
  -5          14.55518      host rokanan
   1    hdd    1.81940          osd.1         up   1.00000  1.00000
  10    hdd    1.81940          osd.10        up   1.00000  1.00000
  17    hdd    1.81940          osd.17        up   1.00000  1.00000
  24    hdd    1.81940          osd.24        up   1.00000  1.00000
  31    hdd    1.81940          osd.31        up   1.00000  1.00000
  38    hdd    1.81940          osd.38        up   1.00000  1.00000
  46    hdd    1.81940          osd.46        up   1.00000  1.00000
  53    hdd    1.81940          osd.53        up   1.00000  1.00000
-11           7.27515      host romolo
   5    hdd    0.45470          osd.5         up   1.00000  1.00000
   9    hdd    0.45470          osd.9         up   0.75000  1.00000
  13    hdd    0.45470          osd.13        up   1.00000  1.00000
  20    hdd    0.45470          osd.20        up   0.95000  1.00000
  25    hdd    0.45470          osd.25        up   0.75000  1.00000
  30    hdd    0.45470          osd.30        up   1.00000  1.00000
  36    hdd    0.45470          osd.36        up   1.00000  1.00000
  42    hdd    0.45470          osd.42        up   1.00000  1.00000
  45    hdd    0.45470          osd.45        up   0.85004  1.00000
  51    hdd    0.45470          osd.51        up   0.89999  1.00000
  56    hdd    0.45470          osd.56        up   1.00000  1.00000
  60    hdd    0.45470          osd.60        up   1.00000  1.00000
  62    hdd    0.45470          osd.62        up   1.00000  1.00000
  65    hdd    0.45470          osd.65        up   0.85004  1.00000
  68    hdd    0.45470          osd.68        up   1.00000  1.00000
  70    hdd    0.45470          osd.70        up   1.00000  1.00000

--
Nicola Mori, Ph.D.
INFN sezione di Firenze
Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
+390554572660
mori@xxxxxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
