Still seeing scrub errors in .80.5

clewis@xxxxxxxxxxxxxxxxxx (Craig Lewis) · Fri, 19 Sep 2014 18:04:25 -0700

On Thu, Sep 18, 2014 at 3:09 AM, Marc <mail at shoowin.de> wrote:

>
> for pgnum in `ceph pg dump|grep active|awk '{print $1}'`; do ceph pg
> deep-scrub $pgnum; done
>
>
That probably didn't deep-scrub everything.  ceph pg deep-scrub won't run
more than osd max scrubs per OSD, which defaults to 1.  Sometimes going
over that will queue the next scrub, and sometimes it just gets ignored.
My guess is that it would take me at least 5 days to deep-scrub my whole
cluster if I manually scheduled it for optimal parallelism, mostly because
my disks are 4TB and 75% full.

Ceph isn't really good about scheduling the deep-scrub.  If the next oldest
deep-scrub shares an OSD with a current deep-scrub, it just doesn't start
the next until that current one finishes.  It doesn't go out of order.  If
I let ceph manage its deep-scrubs in the background, my oldest deep-scrub
is 12 days old.
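
A rough sketch of what scheduling by hand for parallelism could look like,
assuming one scrub per OSD (the osd max scrubs default).  The function name
pick_non_overlapping_pgs is made up for illustration, and it assumes input
lines of the form "<date> <time> <pgid> [osd,osd,...]".  It walks the PGs
oldest deep-scrub first and keeps only those that don't share an OSD with a
PG already picked:

```shell
# Greedy pick, oldest deep-scrub first, with no OSD used by two picked PGs.
# Input lines (assumed format): "<date> <time> <pgid> [osd,osd,...]"
pick_non_overlapping_pgs() {
    LC_ALL=C sort | awk '
        {
            set = $4
            gsub(/[][]/, "", set)          # strip the surrounding brackets
            n = split(set, osds, ",")
            busy = 0
            for (i = 1; i <= n; i++)
                if (osds[i] in used) busy = 1
            if (busy) next                 # shares an OSD with a picked PG
            for (i = 1; i <= n; i++)
                used[osds[i]] = 1
            print $3                       # pgid that can be scrubbed now
        }'
}
```

The resulting pgids could then be passed one by one to ceph pg deep-scrub.
This only picks a single batch; it doesn't wait for scrubs to finish or
check what is already scrubbing.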

I check my oldest deep-scrubs with:
ceph pg dump | egrep '^[0-9a-f]+\.[0-9a-f]+' \
  | awk '{print $20, $21, $1, $14;}' | sort -nr | tail -40

Columns 20 and 21 are deep-scrub date and time, 1 is the pgid, and 14 is
the acting OSDs.

Make sure everything there has been deep-scrubbed more recently than your
firefly upgrade.
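
One possible way to automate that check, as a sketch only.  The function
name pgs_deep_scrubbed_before and the cutoff timestamp are placeholders, and
the column numbers are simply the ones from the command above, so they may
not match the pg dump layout of other Ceph releases:

```shell
# Print PGs whose deep-scrub stamp is older than a cutoff timestamp.
# Reads `ceph pg dump` output on stdin.  Columns 20/21 (deep-scrub date and
# time) and 1 (pgid) are assumed from the command above.
pgs_deep_scrubbed_before() {
    cutoff="$1"    # e.g. "2014-09-01 00:00:00" (placeholder upgrade time)
    awk -v cutoff="$cutoff" '
        $1 ~ /^[0-9a-f]+\.[0-9a-f]+$/ {
            stamp = $20 " " $21
            # YYYY-MM-DD HH:MM:SS stamps compare correctly as plain strings
            if (stamp < cutoff) print $1, stamp
        }'
}

# Usage (replace the cutoff with your own upgrade time):
# ceph pg dump | pgs_deep_scrubbed_before "2014-09-01 00:00:00"
```

Anything it prints has not been deep-scrubbed since the cutoff.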