The fix for tracker 20089 undid the changes you're seeing in the 15368
pull request. The "attr name mismatch 'hinfo_key'" error means that key
is missing from a shard; every erasure-coded object should have an
xattr named "hinfo_key."
You should try to determine why your extended attributes are getting
corrupted. All the errors are on shard 0. My testing shows that repair
will fix this scenario.
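Since all the errors are on shard 0, it's worth confirming that from the
scrub results themselves. Here's a minimal sketch of tallying per-shard
errors from `rados list-inconsistent-obj` output -- note the JSON field
names and error strings in the sample are my assumption of the
luminous-era shape, not verbatim output:

```python
import json
from collections import Counter

# Hypothetical sample in the shape of `rados list-inconsistent-obj
# 70.3db --format=json` output -- the field names and error strings
# here are an assumption, not verbatim luminous output.
sample = json.loads("""
{
  "epoch": 507695,
  "inconsistents": [
    {
      "object": {"name": "default.325674.85_bellplants_images/1055211.jpg"},
      "errors": ["attr_name_mismatch"],
      "shards": [
        {"osd": 386, "shard": 0, "errors": ["hinfo_inconsistency"]},
        {"osd": 175, "shard": 1, "errors": []}
      ]
    }
  ]
}
""")

# Tally which shard the per-shard errors land on; if the tally only
# ever names shard 0, the corruption is localized to that shard.
tally = Counter()
for obj in sample["inconsistents"]:
    for shard in obj["shards"]:
        if shard["errors"]:
            tally[shard["shard"]] += len(shard["errors"])

print(dict(tally))
```

If that only ever reports shard 0 across all 161 objects, whatever is
corrupting the xattrs is confined to the OSDs holding that shard.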
David
On 3/13/18 3:48 PM, Graham Allan wrote:
I've now updated the cluster to 12.2.4, and the cycle of
inconsistent->repair->unfound seems to continue, though possibly
slightly differently. A pg does pass through an "active+clean" phase
after repair, which might be new, but more likely I just never
observed it at the right time before.
I now see messages like "attr name mismatch 'hinfo_key'" in the logs -
perhaps this casts more light on the cause:
2018-03-02 18:55:11.583850 osd.386 osd.386 10.31.0.72:6817/4057280
401 : cluster [ERR] 70.3dbs0 : soid
70:dbc6ed68:::default.325674.85_bellplants_images%2f1055211.jpg:head
attr name mismatch 'hinfo_key'
2018-03-02 19:00:18.031929 osd.386 osd.386 10.31.0.72:6817/4057280
428 : cluster [ERR] 70.3dbs0 : soid
70:dbc97561:::default.325674.85_bellplants_images%2f1017818.jpg:head
attr name mismatch 'hinfo_key'
2018-03-02 19:04:50.058477 osd.386 osd.386 10.31.0.72:6817/4057280
452 : cluster [ERR] 70.3dbs0 : soid
70:dbcbcb34:::default.325674.85_bellplants_images%2f1049756.jpg:head
attr name mismatch 'hinfo_key'
2018-03-02 19:13:05.689136 osd.386 osd.386 10.31.0.72:6817/4057280
494 : cluster [ERR] 70.3dbs0 : soid
70:dbcfc7c9:::default.325674.85_bellplants_images%2f1021177.jpg:head
attr name mismatch 'hinfo_key'
2018-03-02 19:13:30.883100 osd.386 osd.386 10.31.0.72:6817/4057280
495 : cluster [ERR] 70.3dbs0 repair 0 missing, 161 inconsistent objects
2018-03-02 19:13:30.888259 osd.386 osd.386 10.31.0.72:6817/4057280
496 : cluster [ERR] 70.3db repair 161 errors, 161 fixed
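For what it's worth, the affected objects can be pulled out of log
excerpts like the above with a few lines of Python (a sketch; the two
sample lines are copied from the log, rejoined onto single lines):

```python
import re

# Two of the scrub error lines from above, rejoined onto single lines.
log = (
    "2018-03-02 18:55:11.583850 osd.386 osd.386 10.31.0.72:6817/4057280 "
    "401 : cluster [ERR] 70.3dbs0 : soid "
    "70:dbc6ed68:::default.325674.85_bellplants_images%2f1055211.jpg:head "
    "attr name mismatch 'hinfo_key'\n"
    "2018-03-02 19:00:18.031929 osd.386 osd.386 10.31.0.72:6817/4057280 "
    "428 : cluster [ERR] 70.3dbs0 : soid "
    "70:dbc97561:::default.325674.85_bellplants_images%2f1017818.jpg:head "
    "attr name mismatch 'hinfo_key'\n"
)

# Collect every soid flagged with the hinfo_key mismatch.
soids = re.findall(r"soid (\S+) attr name mismatch 'hinfo_key'", log)
for s in soids:
    print(s)
```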
The only similar-sounding issue I could find is
http://tracker.ceph.com/issues/20089
When I look at src/osd/PGBackend.cc be_compare_scrubmaps in luminous,
I don't see the changes in the commit here:
https://github.com/ceph/ceph/pull/15368/files
of course a lot of other things have changed, but is it possible this
fix never made it into luminous?
Graham
On 02/17/2018 12:48 PM, David Zafman wrote:
The commits below came after v12.2.2 and may impact this issue. When
a pg is active+clean+inconsistent, it means that scrub has detected
issues with one or more replicas of one or more objects. An unfound
object is a potentially temporary state in which the current set of
available OSDs doesn't allow an object to be
recovered/backfilled/repaired. When the primary OSD restarts, any
unfound objects (an in-memory structure) are reset so that the new
set of peered OSDs can determine again which objects are unfound.
I'm not clear in this scenario whether recovery failed to start,
recovery hung due to a bug, or recovery stopped (as designed) because
of the unfound object. The new recovery_unfound and backfill_unfound
states indicate that recovery has stopped due to unfound objects.
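With the new states, pgs stuck on unfound objects advertise it directly
in the state string, so they can be picked out of `ceph pg dump` with a
simple filter. A sketch, assuming JSON output of roughly this shape
(the field names are my assumption, not verbatim):

```python
import json

# Hypothetical excerpt in the shape of `ceph pg dump pgs_brief -f json`
# output -- the pgid/state field names are an assumption.
pgs = json.loads("""
[
  {"pgid": "70.3db", "state": "active+recovery_unfound+degraded"},
  {"pgid": "70.438", "state": "active+clean+inconsistent"},
  {"pgid": "70.1c2", "state": "active+clean"}
]
""")

# Any pg whose state contains recovery_unfound or backfill_unfound has
# stopped recovery because of unfound objects.
stuck = [p["pgid"] for p in pgs if "_unfound" in p["state"]]
print(stuck)
```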
commit 64047e1bac2e775a06423a03cfab69b88462538c
Author: David Zafman <dzafman@xxxxxxxxxx>
Date: Wed Jan 10 13:30:41 2018 -0800
osd: Don't start recovery for missing until active pg state set
I was seeing recovery hang when it is started before
_activate_committed(). The state machine passes into "Active" but this
transitions to activating pg state and, after committed, into "active"
pg state.
Signed-off-by: David Zafman <dzafman@xxxxxxxxxx>
commit 7f8b0ce9e681f727d8217e3ed74a1a3355f364f3
Author: David Zafman <dzafman@xxxxxxxxxx>
Date: Mon Oct 9 08:19:21 2017 -0700
osd, mon: Add new pg states recovery_unfound and backfill_unfound
Signed-off-by: David Zafman <dzafman@xxxxxxxxxx>
On 2/16/18 1:40 PM, Gregory Farnum wrote:
On Fri, Feb 16, 2018 at 12:17 PM Graham Allan <gta@xxxxxxx> wrote:
On 02/16/2018 12:31 PM, Graham Allan wrote:
If I set debug rgw=1 and debug ms=1 before running the "object stat"
command, it seems to stall in a loop of trying to communicate with
osds for pool 96, which is .rgw.control
10.32.16.93:0/2689814946 --> 10.31.0.68:6818/8969 --
osd_op(unknown.0.0:541 96.e 96:7759931f:::notify.3:head [watch ping
cookie 139709246356176] snapc 0=[] ondisk+write+known_if_redirected
e507695) v8 -- 0x7f10ac033610 con 0
10.32.16.93:0/2689814946 <== osd.38 10.31.0.68:6818/8969 59 ====
osd_op_reply(541 notify.3 [watch ping cookie 139709246356176] v0'0
uv3933745 ondisk = 0) v8 ==== 152+0+0 (2536111836 0 0) 0x7f1158003e20
con 0x7f117afd8390
Prior to that, probably more relevant, this was the only
communication
logged with the primary osd of the pg:
10.32.16.93:0/1552085932 --> 10.31.0.71:6838/66301 --
osd_op(unknown.0.0:96 70.438s0
70:1c20c157:::default.325674.85_bellplants_images%2f1042066.jpg:head
[getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e507695)
v8 -- 0x7fab79889fa0 con 0
10.32.16.93:0/1552085932 <== osd.175 10.31.0.71:6838/66301 1 ====
osd_backoff(70.438s0 block id 1
[70:1c20c157:::default.325674.85_bellplants_images%2f1042066.jpg:head,70:1c20c157:::default.325674.85_bellplants_images%2f1042066.jpg:head)
e507695) v1 ==== 209+0+0 (1958971312 0 0) 0x7fab5003d3c0 con
0x7fab79885980
10.32.16.93:0/1552085932 --> 10.31.0.71:6838/66301 --
osd_backoff(70.438s0 ack-block id 1
[70:1c20c157:::default.325674.85_bellplants_images%2f1042066.jpg:head,70:1c20c157:::default.325674.85_bellplants_images%2f1042066.jpg:head)
e507695) v1 -- 0x7fab48065420 con 0
so I guess the backoff message above is saying the object is
unavailable. OK, that certainly makes sense. Not sure that it
helps me
understand how to fix the inconsistencies....
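Rejoining that wrapped backoff message onto one line, its fields come
apart like this (a Python sketch; the message text is copied from the
log above):

```python
import re

# The osd_backoff message from the log above, rejoined onto one line.
msg = ("osd_backoff(70.438s0 block id 1 "
       "[70:1c20c157:::default.325674.85_bellplants_images%2f1042066.jpg:head,"
       "70:1c20c157:::default.325674.85_bellplants_images%2f1042066.jpg:head) "
       "e507695)")

# The message names the pg/shard, the backoff kind, an id, and an
# object range; here the begin and end of the range are the same soid,
# so the OSD is blocking exactly the one object being fetched.
m = re.match(r"osd_backoff\((\S+) (\w+) id (\d+) \[([^,]+),([^)]+)\)", msg)
pgid, kind, backoff_id, begin, end = m.groups()
print(pgid, kind, backoff_id)
print("single-object backoff:", begin == end)
```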
If I restart the primary osd for the pg, that makes it forget its
state
and return to active+clean+inconsistent. I can then download the
previously-unfound objects again, as well as run "radosgw-admin object
stat".
So the interesting bit is probably figuring out why it decides these
objects are unfound, when clearly they aren't.
What would be the best place to enable additional logging to
understand
this - perhaps the primary osd?
David, this sounds like one of the bugs where an OSD can mark
objects as inconsistent locally but then doesn't actually trigger
recovery on them. Or it doesn't like any copy but doesn't persist
that decision.
Do any known issues around that apply to 12.2.2?
-Greg
Thanks for all your help,
Graham
--
Graham Allan
Minnesota Supercomputing Institute - gta@xxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com