Re: scrubbing

Hi,

there's a ratio involved when overdue deep-scrubs are checked:

(mon_warn_pg_not_deep_scrubbed_ratio * deep_scrub_interval) + deep_scrub_interval

So based on the defaults, ceph would only warn if the last deep-scrub timestamp is older than:

(0.75 * 7 days) + 7 days = 12.25 days
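
For example, you can verify both values on your cluster (a quick check, output shown assuming the defaults):

ceph config get mon mon_warn_pg_not_deep_scrubbed_ratio   # 0.750000
ceph config get osd osd_deep_scrub_interval               # 604800.000000 (7 days)
# threshold: (0.75 * 604800 s) + 604800 s = 1058400 s ≈ 12.25 days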

Note that the MGR also has a config option for deep_scrub_interval. Check out the docs [0] or my recent blog post [1] on that topic.
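
For example, to compare the interval value the MGR sees with the OSD setting (just a sketch, assuming the option is applied per daemon section as described in [0] and [1]):

ceph config get mgr osd_deep_scrub_interval
ceph config get osd osd_deep_scrub_interval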

  Why does ceph tell me «1» pg has not been scrubbed when I see 15?

See my reply above.

  Is there any way to find out which pg ceph status is talking about?

'ceph health detail' will show you which PG it's warning about.
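
For example (just an illustration using the oldest timestamp from your dump; the exact output format may differ between releases):

ceph health detail
# [WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time
#     pg 4.51 not deep-scrubbed since 2024-09-08T00:10:30.552347+0000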

  Is there any way to see the progress of scrubbing/remapping/backfill?

You can see when (deep-)scrubs have been started in the OSD logs or, depending on your cluster log configuration, with:

ceph log last 1000 debug cluster | grep scrub

The (deep-)scrub duration depends on the PG sizes, so it can vary. But from experience (and older logs) you can tell whether the scrubbing duration has increased. I haven't checked whether there's a metric for that in Prometheus. As for remapping and backfill operations, they are constantly reported in 'ceph status': it shows how many objects are degraded, how many PGs are remapped, etc. If you mean something else, please clarify.
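
If it helps, two quick checks (a sketch; 'ceph progress' relies on the mgr progress module, which is enabled by default in recent releases):

# PGs currently reporting a scrubbing state
ceph pg dump pgs | grep scrubbing

# recovery/backfill progress reported by the mgr progress module
ceph progress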

Regards,
Eugen

[0] https://docs.ceph.com/en/latest/rados/operations/health-checks/#pg-not-deep-scrubbed
[1] https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/

Quoting Albert Shih <Albert.Shih@xxxxxxxx>:

Hi everyone.

A while ago I added a new node with some HDDs to my cluster.

Currently the cluster is doing the remapping and backfill.

I now get a warning:

HEALTH_WARN 1 pgs not deep-scrubbed in time

So I checked and found something a little weird.

root@cthulhu1:~# ceph config get osd osd_deep_scrub_interval
604800.000000

so that's one week.

If I check the last deep-scrub timestamps, I get:

root@cthulhu1:~# ceph pg dump pgs | awk '{print $1" "$24}' | grep -v 2024-09-[1-2][0-9]
dumped pgs
PG_STAT DEEP_SCRUB_STAMP
4.63 2024-09-09T19:00:57.739975+0000
4.5a 2024-09-09T08:17:15.124704+0000
4.56 2024-09-09T21:51:07.478651+0000
4.51 2024-09-08T00:10:30.552347+0000
4.4c 2024-09-09T10:35:02.048445+0000
4.4b 2024-09-09T19:53:19.839341+0000
4.14 2024-09-08T18:36:12.025455+0000
4.c 2024-09-09T16:00:59.047968+0000
4.4 2024-09-09T00:19:07.554153+0000
4.8 2024-09-09T22:19:15.280310+0000
4.25 2024-09-09T06:45:37.258306+0000
4.30 2024-09-09T16:56:21.472410+0000
4.82 2024-09-09T21:14:09.802303+0000
4.c9 2024-09-08T17:10:56.133363+0000
4.f7 2024-09-09T08:25:40.011924+0000

If I check the status of those PGs, it's either

  active+clean+scrubbing+deep and deep scrubbing for Xs

or

  queued for deep scrub

So my questions are:

  Why does ceph tell me «1» pg has not been scrubbed when I see 15?

  Is there any way to find out which pg ceph status is talking about?

  Is there any way to see the progress of scrubbing/remapping/backfill?

Regards


--
Albert SHIH 🦫 🐸
Observatoire de Paris
France
Heure locale/Local time:
ven. 20 sept. 2024 09:35:43 CEST
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

