On Feb 26, 2025, at 7:47 AM, Deep Dish <deeepdish@xxxxxxxxx> wrote:

Your parents had quite the sense of humor.

> Hello,
>
> I have an 80 OSD cluster (across 8 nodes). The average utilization across my OSDs is ~ 32%.

Average isn’t what factors in here; what matters is your *most full* OSD.

> Recently the cluster had a bad drive, and it was replaced (same capacity).

1 TB HDDs? How old is this gear? … Oh, looks like your CRUSH weights don’t align with the OSDs’ TB sizes. Tricky. I suspect your drives are … 8 TB?

> So the one thing that sticks out straight away is OSD.75 and it having a different weight to all the other devices.

That sure doesn’t help. I suspect that at some point the CRUSH weights of all OSDs in the cluster were set to 1.00000, which in and of itself is … okay, since operationally CRUSH weights are *relative* to each other. The replaced drive wasn’t brought up with that custom CRUSH weight, so it got the default: its size in TiB (7.15359).

As Frédéric suggests, do this NOW:

	ceph osd crush reweight osd.75 1.0000

This will back off your immediate problem: osd.75 will stop attracting several times its fair share of PGs (356 vs. ~105 on its peers).

> ceph osd reweight 75 1

Without `crush` in there this would actually be a no-op ;)

You could set osd_crush_initial_weight = 1.0 to force all new OSDs to come up with that 1.00000 CRUSH weight, but that would bite you if you legitimately add larger drives down the road. Longer term I suggest reweighting all of your drives to 7.15359 at the same time by decompiling and editing the CRUSH map (rough cycle at the bottom of this message), so the change lands in a single map update rather than triggering a fresh rebalance per OSD. That avoids this problem recurring.

> For the past week or so the cluster has been recovering, slowly,

Look at `dmesg` / `/var/log/messages` on each host, `smartctl -a` for each drive, and `storcli64 /c0 show termlog`. See if there are any indications of one or more bad drives: lots of reallocated sectors, SATA downshifts, etc.

> and reporting backfill_toofull. I can't figure out what's causing the issue given there's ample available capacity.

Raw capacity and capacity actually available for writes are different things. Are you using EC? As wide as 8+2?

> usage: 197 TiB used, 413 TiB / 610 TiB avail
> recovery: 16 MiB/s, 4 objects/s

Small clusters recover more slowly, but that’s pretty slow for an 80 OSD cluster. Is this Reef or Squid with mclock?

> # ceph osd df
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS

Please set your MUA to not wrap lines.

>  1    hdd  1.00000   1.00000  9.1 TiB  2.2 TiB  2.2 TiB  720 KiB  5.8 GiB  6.9 TiB  24.28  0.75  108  up
>  9    hdd  1.00000   1.00000  7.3 TiB  2.7 TiB  2.7 TiB   20 MiB  8.8 GiB  4.6 TiB  36.76  1.14  103  up
> 16    hdd  1.00000   1.00000  7.3 TiB  2.2 TiB  2.2 TiB   63 KiB  6.1 GiB  5.1 TiB  29.82  0.92  109  up
> 27    hdd  1.00000   1.00000  9.1 TiB  2.4 TiB  2.4 TiB  1.9 MiB  6.5 GiB  6.7 TiB  26.23  0.81  108  up
> 75    hdd  7.15359   1.00000  7.2 TiB  4.5 TiB  4.5 TiB  158 MiB   13 GiB  2.6 TiB  63.47  1.96  356  up
> ...                                                                    TiB  32.01  0.99  105  up
>                               TOTAL    610 TiB  197 TiB  196 TiB  1.7 GiB  651 GiB  413 TiB  32.31
> MIN/MAX VAR: 0.67/1.96  STDDEV: 5.72

You don’t have a balancer enabled, or it isn’t working. Your available space is a function not only of the full ratios (nearfull / backfillfull / full) but also of your replication strategy, and it is relative to the *most full* OSD.

Send the output of `ceph osd crush rule dump`, `ceph balancer status`, and `ceph -v`.
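
P.S. If you do go the decompile-and-edit route for the CRUSH weights, the cycle is roughly the below. Treat it as a sketch: the filenames are arbitrary, and eyeball (and ideally test-map) the edited text before injecting it.

	ceph osd getcrushmap -o crushmap.bin                 # grab the current binary map
	crushtool -d crushmap.bin -o crushmap.txt            # decompile to editable text
	vi crushmap.txt                                      # adjust the per-OSD "weight" values in one pass
	crushtool -c crushmap.txt -o crushmap.new            # recompile
	crushtool -i crushmap.new --test --show-statistics   # optional: sanity-check the resulting mappings
	ceph osd setcrushmap -i crushmap.new                 # inject; all weight changes land in one map epoch

Because the new map goes in as a single epoch, the PGs remap once instead of once per `ceph osd crush reweight` invocation.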
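
P.P.S. For the drive-health sweep, something like this covers all 8 hosts quickly. The hostnames below are made up, adjust the device glob to your layout, and note that drives hanging off a MegaRAID controller usually need `smartctl -d megaraid,<DID>` rather than plain `-a`.

	for host in node0{1..8}; do                          # hypothetical hostnames
	    echo "===== $host"
	    # recent kernel complaints about the disks / SATA links
	    ssh "$host" 'dmesg -T | grep -Ei "i/o error|link reset|timeout" | tail -n 20'
	    # SMART counters that usually give away a dying drive
	    ssh "$host" 'for d in /dev/sd[a-z]; do
	                     echo "-- $d"
	                     smartctl -a "$d" | grep -Ei "reallocated|pending|crc|uncorrect"
	                 done'
	done

Anything with climbing Reallocated_Sector_Ct / Current_Pending_Sector counts, or a pile of UDMA_CRC_Error_Count, is a good candidate for the drive that’s dragging your recovery.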