Forgot to do a reply-all. What do `ceph osd df` and `ceph osd dump | grep pool` return? Are you using autoscaling? 289 PGs with 272 TB of data on 60 OSDs, that seems like only 3-4 PGs per OSD at almost 1 TB each. Unless I'm thinking of this wrong.

On Thu, Mar 2, 2023, 17:37 Joffrey <joff.au@xxxxxxxxx> wrote:

> My Ceph version is 17.2.5 and all the osd_scrub* settings are at their
> defaults. I tried some changes to osd-max-backfills but saw no difference.
> I have many HDDs with NVMe for the DB, and everything is connected over a
> 25G network.
>
> Yes, it has been the same PG for four days.
>
> An HDD failed and I went through many days of recovery+backfilling over
> the last two weeks. Perhaps the 'not in time' warnings are related to this.
>
> 'Jof
>
> Le jeu. 2 mars 2023 à 14:25, Anthony D'Atri <aad@xxxxxxxxxxxxxx> a écrit :
>
> > Run `ceph health detail`.
> >
> > Is it the same PG backfilling for a long time, or a different one over
> > time?
> >
> > That it's remapped makes me think that what you're seeing is the
> > balancer doing its job.
> >
> > As far as the scrubbing, do you limit the times when scrubbing can
> > happen? Are these HDDs? EC?
> >
> > > On Mar 2, 2023, at 07:20, Joffrey <joff.au@xxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > I have many 'not {deep-}scrubbed in time' warnings and 1 PG
> > > remapped+backfilling, and I don't understand why this backfilling is
> > > taking so long.
> > >
> > > root@hbgt-ceph1-mon3:/# ceph -s
> > >   cluster:
> > >     id:     c300532c-51fa-11ec-9a41-0050569c3b55
> > >     health: HEALTH_WARN
> > >             15 pgs not deep-scrubbed in time
> > >             13 pgs not scrubbed in time
> > >
> > >   services:
> > >     mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 36h)
> > >     mgr: hbgt-ceph1-mon2.nteihj (active, since 2d), standbys:
> > >          hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
> > >     osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
> > >     rgw: 3 daemons active (3 hosts, 2 zones)
> > >
> > >   data:
> > >     pools:   13 pools, 289 pgs
> > >     objects: 67.74M objects, 127 TiB
> > >     usage:   272 TiB used, 769 TiB / 1.0 PiB avail
> > >     pgs:     288 active+clean
> > >              1   active+remapped+backfilling
> > >
> > >   io:
> > >     client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
> > >     recovery: 790 KiB/s, 0 objects/s
> > >
> > > What can I do to understand this slow recovery (is it the backfill
> > > action?)
> > >
> > > Thank you
> > >
> > > 'Jof
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
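
For readers following along, the information requested in the replies above can be gathered with commands along these lines. This is a minimal sketch run from an admin node (or a cephadm shell) with the client.admin keyring; which of these outputs matters for the original slow-backfill question depends on the cluster, so treat it as a checklist rather than a fix:

    # Per-OSD utilization, PG counts, and variance across the CRUSH tree
    ceph osd df tree

    # Pool definitions: replicated vs. EC, pg_num/pgp_num, autoscale mode
    ceph osd dump | grep pool

    # Whether the PG autoscaler is enabled and what pg_num it would target
    ceph osd pool autoscale-status

    # Which PGs are behind on (deep-)scrub and which PG is backfilling
    ceph health detail
    ceph pg ls backfilling

    # Current backfill/recovery throttles
    ceph config get osd osd_max_backfills
    ceph config get osd osd_recovery_max_active

    # Any scrub time-window restrictions, relevant to the scrub question
    ceph config get osd osd_scrub_begin_hour
    ceph config get osd osd_scrub_end_hour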