Re: Problems with long-running deep-scrub processes causing PG_NOT_DEEP_SCRUBBED

> One way this can happen is if you have the default setting
>
> 	osd_scrub_during_recovery=false

It seems the default setting is active here.
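
For reference, checking and overriding that flag at runtime would look roughly like the sketch below, using the ceph config commands available since Mimic. This is only a sketch: the override applies to the whole osd class and trades some recovery/client I/O for scrub progress, so it would be reverted once scrubs have caught up.

	# show the effective value for the osd class
	ceph config get osd osd_scrub_during_recovery

	# temporarily allow scrubbing while recovery is in progress
	ceph config set osd osd_scrub_during_recovery true

	# drop the override again once scrubs have caught up
	ceph config rm osd osd_scrub_during_recovery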

> If you’ve been doing a lot of [re]balancing, drive replacements, topology changes, expansions, etc. scrubs can be starved especially if you’re doing EC on HDDs.

> HDD or SSD OSDs?  Replication or EC?

HDDs with SSDs as cache, EC

> Number of OSDs? Number of PGs? Values of osd_scrub_max_interval and osd_deep_scrub_interval?

Output of ceph -s:
  cluster:
    id:     7e242332-55c3-4926-9646-149b2f5c8081
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cloud10-1517,cloud10-1518,cloud10-1519 (age 12d)
    mgr: cloud10-1519(active, since 13d), standbys: cloud10-1518, cloud10-1517
    mds:  3 up:standby
    osd: 56 osds: 56 up (since 12d), 56 in (since 4w)

  data:
    pools:   2 pools, 1280 pgs
    objects: 72.32M objects, 215 TiB
    usage:   346 TiB used, 187 TiB / 533 TiB avail
    pgs:     918 active+clean+snaptrim_wait
             267 active+clean
             94  active+clean+snaptrim
             1   active+clean+scrubbing+deep

  io:
    client:   295 KiB/s rd, 199 MiB/s wr, 221 op/s rd, 894 op/s wr

The interval settings are already very high:

osd_scrub_max_interval = 4838400    (56 days)
osd_deep_scrub_interval = 3628800   (42 days)
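
To see how far behind the PGs actually are, one rough approach is to sort them by their last deep-scrub timestamp. A minimal sketch, assuming jq is installed, that the JSON dump exposes a top-level pg_stats array (the exact layout shifts slightly between releases), and with osd.0 used only as an example id:

	# confirm the values the OSDs are actually running with
	ceph config get osd osd_scrub_max_interval
	ceph config get osd osd_deep_scrub_interval

	# or, on the host carrying osd.0, via the admin socket
	ceph daemon osd.0 config show | grep -E 'scrub.*interval'

	# list PGs with the oldest deep-scrub stamps first
	ceph pg dump pgs -f json 2>/dev/null \
	    | jq -r '.pg_stats[] | "\(.last_deep_scrub_stamp) \(.pgid)"' \
	    | sort | head -20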



On 31.07.20 at 20:30, Anthony D'Atri wrote:
One way this can happen is if you have the default setting

	osd_scrub_during_recovery=false

If you’ve been doing a lot of [re]balancing, drive replacements, topology changes, expansions, etc. scrubs can be starved especially if you’re doing EC on HDDs.

HDD or SSD OSDs?  Replication or EC?

Number of OSDs? Number of PGs? Values of osd_scrub_max_interval and osd_deep_scrub_interval?

— aad

On Jul 31, 2020, at 10:52 AM, ceph@xxxxxxxxxx wrote:

What happens when you start a scrub manually?

Imo

ceph osd deep-scrub xyz

Hth
Mehmet


On 31 July 2020 15:35:49 CEST, Carsten Grommel - Profihost AG <c.grommel@xxxxxxxxxxxx> wrote:
Hi,

we are having problems with really long-running deep-scrub processes
causing PG_NOT_DEEP_SCRUBBED and ceph HEALTH_WARN. One PG has been
waiting for its deep scrub since 2020-05-18.

Is there any way to speed up the deep-scrubbing?

Ceph-Version:

ceph version 14.2.8-3-gc6b8eedb77
(c6b8eedb771089fe3b0a95da93158ec4144758f3) nautilus (stable)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



