Hi,
there's a ratio involved when deep-scrub timestamps are checked; the
warning only triggers after:
(mon_warn_pg_not_deep_scrubbed_ratio * deep_scrub_interval) + deep_scrub_interval
So based on the defaults, ceph would only warn if the last deep-scrub
timestamp is older than:
(0.75 * 7 days) + 7 days = 12.25 days
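If you want to compute the effective threshold from your cluster's
actual config values, a quick (untested) sketch; it assumes 'ceph
config get' prints plain numbers like in your output below:

ratio=$(ceph config get mon mon_warn_pg_not_deep_scrubbed_ratio)
interval=$(ceph config get osd osd_deep_scrub_interval)
# warn threshold in days: (ratio * interval) + interval, interval is in seconds
echo "$ratio $interval" | awk '{printf "%.2f days\n", ($1 * $2 + $2) / 86400}'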
Note that the MGR also has a config for deep_scrub_interval. Check out
the docs [0] or my recent blog post [1] on that topic.
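Since the MGR is the daemon evaluating the health check, it's worth
comparing what it sees (assuming your release supports querying the
option scoped to the mgr):

ceph config get mgr osd_deep_scrub_interval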
> Why does ceph tell me «1» PG has not been scrubbed when I see 15?
See my reply above.
> Is there any way to find which PG ceph status is talking about?
'ceph health detail' will show you which PG it's warning about.
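On your cluster that would look roughly like this (example built from
the oldest PG in your dump below; exact wording may differ per release):

root@cthulhu1:~# ceph health detail
HEALTH_WARN 1 pgs not deep-scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time
    pg 4.51 not deep-scrubbed since 2024-09-08T00:10:30.552347+0000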
> Is there any way to see the progress of scrubbing/remapping/backfill?
You can see when (deep-)scrubs were started in the OSD logs or,
depending on your cluster log configuration, via the cluster log:
ceph log last 1000 debug cluster | grep scrub
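If you'd rather grep the local OSD logs directly, something like this
should work (the log path depends on your deployment; containerized
setups may need journalctl instead):

grep -i 'deep-scrub' /var/log/ceph/ceph-osd.*.log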
The (deep-)scrub duration depends on the PG sizes, so it can vary. But
from experience (and older logs) you can see whether the scrubbing
duration has increased. I haven't checked if there's a metric for that
in Prometheus.
As for remapping and backfill operations, they are constantly reported
in 'ceph status': it shows how many objects are degraded, how many PGs
are remapped, etc. If you mean something else, please clarify.
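For a continuously refreshed view of the recovery you can simply do:

watch -n 5 ceph status
# or just the one-line PG summary:
ceph pg stat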
Regards,
Eugen
[0] https://docs.ceph.com/en/latest/rados/operations/health-checks/#pg-not-deep-scrubbed
[1] https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/
Quoting Albert Shih <Albert.Shih@xxxxxxxx>:
Hi everyone.
A while ago I added a new node with some HDDs to my cluster.
Currently the cluster is doing the remapping and backfill.
I now get a warning:
HEALTH_WARN 1 pgs not deep-scrubbed in time
So I checked and found something a little weird.
root@cthulhu1:~# ceph config get osd osd_deep_scrub_interval
604800.000000
so that's one week.
If I check the last deep-scrub timestamps (DEEP_SCRUB_STAMP), I get:
root@cthulhu1:~# ceph pg dump pgs | awk '{print $1" "$24}' | grep -v 2024-09-[1-2][0-9]
dumped pgs
PG_STAT DEEP_SCRUB_STAMP
4.63 2024-09-09T19:00:57.739975+0000
4.5a 2024-09-09T08:17:15.124704+0000
4.56 2024-09-09T21:51:07.478651+0000
4.51 2024-09-08T00:10:30.552347+0000
4.4c 2024-09-09T10:35:02.048445+0000
4.4b 2024-09-09T19:53:19.839341+0000
4.14 2024-09-08T18:36:12.025455+0000
4.c 2024-09-09T16:00:59.047968+0000
4.4 2024-09-09T00:19:07.554153+0000
4.8 2024-09-09T22:19:15.280310+0000
4.25 2024-09-09T06:45:37.258306+0000
4.30 2024-09-09T16:56:21.472410+0000
4.82 2024-09-09T21:14:09.802303+0000
4.c9 2024-09-08T17:10:56.133363+0000
4.f7 2024-09-09T08:25:40.011924+0000
If I check the status of those PGs, it's either
active+clean+scrubbing+deep (deep scrubbing for Xs)
or
queued for deep scrub
So my questions are:
Why does ceph tell me «1» PG has not been scrubbed when I see 15?
Is there any way to find which PG ceph status is talking about?
Is there any way to see the progress of scrubbing/remapping/backfill?
Regards
--
Albert SHIH 🦫 🐸
Observatoire de Paris
France
Heure locale/Local time:
Fri 20 Sep 2024 09:35:43 CEST
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx