Re: Increasing number of unscrubbed PGs

Hi,

I'm still not sure why increasing the interval doesn't help (maybe there's some flag set on the PGs or something), but you could simply increase osd_max_scrubs if your OSDs are not too busy. On one customer cluster with high load during the day we restricted scrubs to the night hours, but with osd_max_scrubs = 6. What is your current value for osd_max_scrubs?
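
For reference, on that customer cluster we set it roughly like this (the exact hours and the value 6 are just what worked there, adjust to your hardware and load pattern):

# ceph config get osd osd_max_scrubs
# ceph config set osd osd_max_scrubs 6
# ceph config set osd osd_scrub_begin_hour 22
# ceph config set osd osd_scrub_end_hour 6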

Regards,
Eugen

Zitat von Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>:

Hi,


our cluster is running Pacific 16.2.10. Since the upgrade the cluster has started to report an increasing number of PGs without a timely deep-scrub:


# ceph -s
  cluster:
    id:    XXXX
    health: HEALTH_WARN
            1073 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum XXX,XXX,XXX (age 10d)
    mgr: XXX(active, since 3w), standbys: XXX, XXX
    mds: 2/2 daemons up, 2 standby
    osd: 460 osds: 459 up (since 3d), 459 in (since 5d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 2/2 healthy
    pools:   16 pools, 5073 pgs
    objects: 733.76M objects, 1.1 PiB
    usage:   1.6 PiB used, 3.3 PiB / 4.9 PiB avail
    pgs:     4941 active+clean
             105  active+clean+scrubbing
             27   active+clean+scrubbing+deep


The cluster is otherwise healthy, with the exception of one failed OSD. It has been marked out and should not interfere with scrubbing. Scrubbing itself is running, but there are too few deep-scrubs. If I remember correctly we had a larger number of deep-scrubs before the last upgrade. I tried to extend the deep-scrub interval (roughly as shown below), but to no avail yet.
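
For completeness, this is roughly what I did to check and extend the interval (osd.0 and the two-week value are just examples):

# ceph config show osd.0 osd_deep_scrub_interval
# ceph config set osd osd_deep_scrub_interval 1209600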

The majority of PGs are part of a Ceph data pool (4096 of 4941 PGs), and those also make up most of the reported PGs. The pool is backed by 12 machines with 48 disks each, so there should be enough I/O capacity for running deep-scrubs. Load on these machines and disks is also pretty low.

Any hints on debugging this? The number of affected PGs has risen from 600 to over 1000 over the weekend and continues to rise...
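
For reference, the affected PGs and their oldest deep-scrub timestamps can be listed with something like the following (the JSON layout may differ slightly between releases):

# ceph health detail | grep 'not deep-scrubbed since' | head
# ceph pg dump pgs --format json 2>/dev/null | \
    jq -r '.pg_stats[] | "\(.last_deep_scrub_stamp) \(.pgid)"' | sort | head -20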


Best regards,

Burkhard Linke





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



