Re: "ceph pg scrub" does not start

On 21/06/18 10:14, Wido den Hollander wrote:

Hi Wido,

>> Note the date stamps, the scrub command appears to be ignored
>>
>> Any ideas on why this is happening, and what we can do to fix the error?
> 
> Are any of the OSDs involved with that PG currently doing recovery? If
> so, they will ignore a scrub until the recovery has finished.
> 
> Or set osd_scrub_during_recovery=true
> 
> Wido

you're correct - osd_scrub_during_recovery=false

# ceph config get osd.333 osd_scrub_during_recovery
false

# ceph config set osd.333 osd_scrub_during_recovery true

# ceph config get osd.333 osd_scrub_during_recovery
true
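
To double-check that the running daemon actually picked up the new value
(rather than just the mon config database), I could also query the admin
socket on the host carrying osd.333, e.g.:

# ceph daemon osd.333 config get osd_scrub_during_recovery

(standard admin-socket query; I haven't pasted the output here)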

Looking at the osd log, the osd immediately started scrubbing a pg after
I set osd_scrub_during_recovery=true. However, it was a different pg
(not the one I asked it to scrub).

Perhaps I just needed to wait? So I waited, but the osd finished
scrubbing the other pg and then went idle.

2018-06-21 11:37:55.075 7f39a3421700  0 log_channel(cluster) log [DBG] :
4.ce8 deep-scrub starts
2018-06-21 12:14:32.096 7f39a3421700  0 log_channel(cluster) log [DBG] :
4.ce8 deep-scrub ok
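
My next step will probably be to re-issue the deep-scrub on the
inconsistent pg (4.1de, see below) and then watch its scrub stamps,
roughly:

# ceph pg deep-scrub 4.1de
# ceph pg 4.1de query | grep scrub_stamp

(just my intended next step, not yet tried on this cluster)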

Digging deeper, there are some other anomalies that may give a clue:

checking a "known good" pg on this OSD
# rados list-inconsistent-obj 4.ce8
{"epoch":170900,"inconsistents":[]}

then trying to check the active+clean+inconsistent pg on this osd
# rados list-inconsistent-obj 4.1de
No scrub information available for pg 4.1de
error 2: (2) No such file or directory
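
As I understand it, list-inconsistent-obj only has data to return once a
deep-scrub has actually completed for that pg, so the ENOENT is probably
just another symptom of the scrub never starting. Once a deep-scrub of
4.1de does go through, I'd retry with readable output:

# rados list-inconsistent-obj 4.1de --format=json-pretty

(same command as above, just asking for pretty-printed json)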

Another potential anomaly: I'm not sure whether this is due to the Mimic
upgrade, but this cluster doesn't respond to ceph --show-config commands
as I would expect:

Both of these nodes are in the same cluster...

[root@ceph6 ~]# ceph -n osd.333 --show-config
ceph>

[root@ceph1 ~]# ceph -n osd.219 --show-config
[root@ceph1 ~]#
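
As a possible workaround for the --show-config oddity, the running
config should still be readable either via the admin socket on the OSD's
own host, or via the centralized config introduced in Mimic, e.g.:

# ceph daemon osd.219 config show | grep scrub     (on ceph1 itself)
# ceph config show osd.219

(the second form is, I think, new in Mimic - untested here)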

any thoughts and suggestions appreciated,

thanks again and best regards,

Jake

> 
>>
>> Some background:
>> Cluster upgraded from Luminous (12.2.5) to Mimic (13.2.0)
>> Pool uses EC 8+2, 10 nodes, 450 x 8TB Bluestore OSD
>>
>> Any ideas gratefully received..
>>
>> Jake
>>