Re: scrubbing

Hi,

there's a ratio involved when overdue deep-scrubs are checked:

(mon_warn_pg_not_deep_scrubbed_ratio * deep_scrub_interval) + deep_scrub_interval

So based on the defaults, ceph would only warn if the last deep-scrub timestamp is older than:

(0.75 * 7 days) + 7 days = 12.25 days
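
For example, you can verify both values on your cluster (a quick check, output shown assuming the defaults):

ceph config get mon mon_warn_pg_not_deep_scrubbed_ratio   # 0.750000
ceph config get osd osd_deep_scrub_interval               # 604800.000000 (7 days)
# threshold: (0.75 * 604800 s) + 604800 s = 1058400 s ≈ 12.25 days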

Note that the MGR also has a config option for deep_scrub_interval. Check out the docs [0] or my recent blog post [1] on that topic.
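
For example, to compare the interval value the MGR sees with the OSD setting (just a sketch, assuming the option is applied per daemon section as described in [0] and [1]):

ceph config get mgr osd_deep_scrub_interval
ceph config get osd osd_deep_scrub_interval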

  Why does ceph tell me «1» pg has not been scrubbed when I see 15?

See my reply above.

  Is there any way to find out which pg ceph status is talking about?

'ceph health detail' will show you which PG it's warning about.
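
For example (just an illustration using the oldest timestamp from your dump; the exact output format may differ between releases):

ceph health detail
# [WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time
#     pg 4.51 not deep-scrubbed since 2024-09-08T00:10:30.552347+0000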

  Is there any way to see the progress of scrubbing/remapping/backfill?

You can see when (deep-)scrubs have been started in the OSD logs or, depending on your cluster log configuration, with:

ceph log last 1000 debug cluster | grep scrub

The (deep-)scrub duration depends on the PG sizes, so it can vary. But from experience (and older logs) you can tell whether the scrubbing duration has increased. I haven't checked whether there's a metric for that in Prometheus. As for remapping and backfill operations, they are constantly reported in 'ceph status': it shows how many objects are degraded, how many PGs are remapped, etc. If you mean something else, please clarify.
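
If it helps, two quick checks (a sketch; 'ceph progress' relies on the mgr progress module, which is enabled by default in recent releases):

# PGs currently reporting a scrubbing state
ceph pg dump pgs | grep scrubbing

# recovery/backfill progress reported by the mgr progress module
ceph progress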

Regards,
Eugen

[0] https://docs.ceph.com/en/latest/rados/operations/health-checks/#pg-not-deep-scrubbed
[1] https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/

Quoting Albert Shih <Albert.Shih@xxxxxxxx>:

Hi everyone.

A while ago I added a new node with some HDDs to my cluster.

Currently the cluster is doing the remapping and backfill.

I now get a warning:

HEALTH_WARN 1 pgs not deep-scrubbed in time

So I checked and found something a little weird.

root@cthulhu1:~# ceph config get osd osd_deep_scrub_interval
604800.000000

so that's one week.

If I check the last deep-scrub timestamps, I get:

root@cthulhu1:~# ceph pg dump pgs | awk '{print $1" "$24}' | grep -v 2024-09-[1-2][0-9]
dumped pgs
PG_STAT DEEP_SCRUB_STAMP
4.63 2024-09-09T19:00:57.739975+0000
4.5a 2024-09-09T08:17:15.124704+0000
4.56 2024-09-09T21:51:07.478651+0000
4.51 2024-09-08T00:10:30.552347+0000
4.4c 2024-09-09T10:35:02.048445+0000
4.4b 2024-09-09T19:53:19.839341+0000
4.14 2024-09-08T18:36:12.025455+0000
4.c 2024-09-09T16:00:59.047968+0000
4.4 2024-09-09T00:19:07.554153+0000
4.8 2024-09-09T22:19:15.280310+0000
4.25 2024-09-09T06:45:37.258306+0000
4.30 2024-09-09T16:56:21.472410+0000
4.82 2024-09-09T21:14:09.802303+0000
4.c9 2024-09-08T17:10:56.133363+0000
4.f7 2024-09-09T08:25:40.011924+0000

If I check the status of those PGs, it's either

  active+clean+scrubbing+deep and deep scrubbing for Xs

or

  queued for deep scrub

So my questions are:

  Why does ceph tell me «1» pg has not been scrubbed when I see 15?

  Is there any way to find out which pg ceph status is talking about?

  Is there any way to see the progress of scrubbing/remapping/backfill?

Regards


--
Albert SHIH 🦫 🐸
Observatoire de Paris
France
Heure locale/Local time:
ven. 20 sept. 2024 09:35:43 CEST
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

