On a cluster running Nautilus (14.2.8), we are getting a complaint about
a PG not being deep-scrubbed on time. Looking at the primary OSD's
logs, it looks like it tries to deep-scrub the PG every hour or so,
emits some complaints that I don't understand, but the deep scrub does
not finish (either with or without a scrub error).
Here is the PG from pg dump:
1.43f 31794 0 0 0 0
66930087214 0 0 3004 3004
active+clean+scrubbing+deep 2020-04-20 04:48:13.055481 46286'483734
46286:563439 [354,694,851] 354 [354,694,851] 354
39594'348643 2020-04-10 12:39:16.261088 38482'314644 2020-04-04
10:37:03.121638 0
Here is a section of the primary OSD logs (osd.354):
2020-04-18 08:21:08.322 7fffd2e2b700 0 log_channel(cluster) log [DBG] :
1.43f deep-scrub starts
2020-04-18 08:37:53.362 7fffd2e2b700 1 osd.354 pg_epoch: 45910
pg[1.43f( v 45909'449615 (45801'446525,45909'449615]
local-lis/les=45908/45909 n=30862 ec=874/871 lis/c 45908/45908 les/c/f
45909/45909/0 45910/45910/42988) [354,851] r=0 lpr=45910
pi=[45908,45910)/1 luod=0'0 crt=45909'449615 lcod 45909'449614 mlcod 0'0
active+scrubbing+deep mbc={}] start_peering_interval up [354,694,851] ->
[354,851], acting [354,694,851] -> [354,851], acting_primary 354 -> 354,
up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199
upacting 4611087854031667199
2020-04-18 08:37:53.362 7fffd2e2b700 1 osd.354 pg_epoch: 45910
pg[1.43f( v 45909'449615 (45801'446525,45909'449615]
local-lis/les=45908/45909 n=30862 ec=874/871 lis/c 45908/45908 les/c/f
45909/45909/0 45910/45910/42988) [354,851] r=0 lpr=45910
pi=[45908,45910)/1 crt=45909'449615 lcod 45909'449614 mlcod 0'0 unknown
mbc={}] state<Start>: transitioning to Primary
2020-04-18 08:38:01.002 7fffd2e2b700 1 osd.354 pg_epoch: 45912
pg[1.43f( v 45909'449615 (45801'446525,45909'449615]
local-lis/les=45910/45911 n=30862 ec=874/871 lis/c 45910/45908 les/c/f
45911/45909/0 45912/45912/42988) [354,694,851] r=0 lpr=45912
pi=[45908,45912)/1 luod=0'0 crt=45909'449615 lcod 45909'449614 mlcod 0'0
active mbc={}] start_peering_interval up [354,851] -> [354,694,851],
acting [354,851] -> [354,694,851], acting_primary 354 -> 354, up_primary
354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting
4611087854031667199
2020-04-18 08:38:01.002 7fffd2e2b700 1 osd.354 pg_epoch: 45912
pg[1.43f( v 45909'449615 (45801'446525,45909'449615]
local-lis/les=45910/45911 n=30862 ec=874/871 lis/c 45910/45908 les/c/f
45911/45909/0 45912/45912/42988) [354,694,851] r=0 lpr=45912
pi=[45908,45912)/1 crt=45909'449615 lcod 45909'449614 mlcod 0'0 unknown
mbc={}] state<Start>: transitioning to Primary
2020-04-18 08:40:04.219 7fffd2e2b700 0 log_channel(cluster) log [DBG] :
1.43f deep-scrub starts
2020-04-18 08:56:49.095 7fffd2e2b700 1 osd.354 pg_epoch: 45914
pg[1.43f( v 45913'449735 (45812'446725,45913'449735]
local-lis/les=45912/45913 n=30868 ec=874/871 lis/c 45912/45912 les/c/f
45913/45913/0 45914/45914/42988) [354,851] r=0 lpr=45914
pi=[45912,45914)/1 luod=0'0 crt=45913'449735 lcod 45913'449734 mlcod 0'0
active+scrubbing+deep mbc={}] start_peering_interval up [354,694,851] ->
[354,851], acting [354,694,851] -> [354,851], acting_primary 354 -> 354,
up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199
upacting 4611087854031667199
2020-04-18 08:56:49.095 7fffd2e2b700 1 osd.354 pg_epoch: 45914
pg[1.43f( v 45913'449735 (45812'446725,45913'449735]
local-lis/les=45912/45913 n=30868 ec=874/871 lis/c 45912/45912 les/c/f
45913/45913/0 45914/45914/42988) [354,851] r=0 lpr=45914
pi=[45912,45914)/1 crt=45913'449735 lcod 45913'449734 mlcod 0'0 unknown
mbc={}] state<Start>: transitioning to Primary
2020-04-18 08:56:55.627 7fffd2e2b700 1 osd.354 pg_epoch: 45916
pg[1.43f( v 45913'449735 (45812'446725,45913'449735]
local-lis/les=45914/45915 n=30868 ec=874/871 lis/c 45914/45912 les/c/f
45915/45913/0 45916/45916/42988) [354,694,851] r=0 lpr=45916
pi=[45912,45916)/1 luod=0'0 crt=45913'449735 lcod 45913'449734 mlcod 0'0
active mbc={}] start_peering_interval up [354,851] -> [354,694,851],
acting [354,851] -> [354,694,851], acting_primary 354 -> 354, up_primary
354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting
4611087854031667199
2020-04-18 08:56:55.627 7fffd2e2b700 1 osd.354 pg_epoch: 45916
pg[1.43f( v 45913'449735 (45812'446725,45913'449735]
local-lis/les=45914/45915 n=30868 ec=874/871 lis/c 45914/45912 les/c/f
45915/45913/0 45916/45916/42988) [354,694,851] r=0 lpr=45916
pi=[45912,45916)/1 crt=45913'449735 lcod 45913'449734 mlcod 0'0 unknown
mbc={}] state<Start>: transitioning to Primary
2020-04-18 08:56:56.867 7fffd2e2b700 0 log_channel(cluster) log [DBG] :
1.43f deep-scrub starts
2020-04-18 09:13:37.419 7fffd2e2b700 1 osd.354 pg_epoch: 45918
pg[1.43f( v 45917'449808 (45812'446725,45917'449808]
local-lis/les=45916/45917 n=30876 ec=874/871 lis/c 45916/45916 les/c/f
45917/45917/0 45918/45918/42988) [354,851] r=0 lpr=45918
pi=[45916,45918)/1 luod=0'0 crt=45917'449808 lcod 45917'449807 mlcod 0'0
active+scrubbing+deep mbc={}] start_peering_interval up [354,694,851] ->
[354,851], acting [354,694,851] -> [354,851], acting_primary 354 -> 354,
up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199
upacting 4611087854031667199
2020-04-18 09:13:37.419 7fffd2e2b700 1 osd.354 pg_epoch: 45918
pg[1.43f( v 45917'449808 (45812'446725,45917'449808]
local-lis/les=45916/45917 n=30876 ec=874/871 lis/c 45916/45916 les/c/f
45917/45917/0 45918/45918/42988) [354,851] r=0 lpr=45918
pi=[45916,45918)/1 crt=45917'449808 lcod 45917'449807 mlcod 0'0 unknown
mbc={}] state<Start>: transitioning to Primary
2020-04-18 09:13:44.959 7fffd2e2b700 1 osd.354 pg_epoch: 45920
pg[1.43f( v 45917'449808 (45812'446725,45917'449808]
local-lis/les=45918/45919 n=30876 ec=874/871 lis/c 45918/45916 les/c/f
45919/45917/0 45920/45920/42988) [354,694,851] r=0 lpr=45920
pi=[45916,45920)/1 luod=0'0 crt=45917'449808 lcod 45917'449807 mlcod 0'0
active mbc={}] start_peering_interval up [354,851] -> [354,694,851],
acting [354,851] -> [354,694,851], acting_primary 354 -> 354, up_primary
354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting
4611087854031667199
2020-04-18 09:13:44.959 7fffd2e2b700 1 osd.354 pg_epoch: 45920
pg[1.43f( v 45917'449808 (45812'446725,45917'449808]
local-lis/les=45918/45919 n=30876 ec=874/871 lis/c 45918/45916 les/c/f
45919/45917/0 45920/45920/42988) [354,694,851] r=0 lpr=45920
pi=[45916,45920)/1 crt=45917'449808 lcod 45917'449807 mlcod 0'0 unknown
mbc={}] state<Start>: transitioning to Primary
I've tried restarting all 3 OSDs in question, it didn't help. I'm
running the upmap balancer on this cluster (if that matters). Any ideas?
Andras
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx