Re: Understanding/correcting sudden onslaught of unfound objects


 



On Thu, Feb 15, 2018 at 3:10 PM Graham Allan <gta@xxxxxxx> wrote:
On 02/15/2018 11:58 AM, Gregory Farnum wrote:
>
> Well, if the objects were uploaded using multi-part upload I believe the
> objects you’re looking at here will only contain omap (or xattr?)
> entries pointing to the part files, so the empty file data is to be
> expected. This might also make slightly more sense in terms of the scrub
> inconsistencies popping up, although I didn’t think any omap issues I
> remember should have impacted rgw.
>
> Other than that, I’m not sure how it would be turning 0 bytes of data
> into the correct results.

That makes a lot of sense! So the 0-byte file here is effectively just a
holder for xattr data.
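
Presumably I could confirm that by looking at the head object directly,
something along these lines (the pool and object names below are just my
guesses based on the bucket marker above, so treat them as placeholders):

  # stat should show a 0-byte object that still carries rgw xattrs
  rados -p .rgw.buckets stat default.325674.85_bellplants_images/1089213.jpg
  rados -p .rgw.buckets listxattr default.325674.85_bellplants_images/1089213.jpg
  # for a multipart upload I'd expect little or nothing in omap here
  rados -p .rgw.buckets listomapkeys default.325674.85_bellplants_images/1089213.jpg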

Now I'm trying to figure out how I can associate it with the files which
might contain the data.

I don't see a lot of data in omap (assuming I'm looking at the right
thing here):

> root@cephmon1:~# rados -p .rgw.buckets.index getomapval .dir.default.325674.85 bellplants_images/1089213.jpg
> value (193 bytes) :
> 00000000  05 03 bb 00 00 00 1d 00  00 00 62 65 6c 6c 70 6c  |..........bellpl|
> 00000010  61 6e 74 73 5f 69 6d 61  67 65 73 2f 31 30 38 39  |ants_images/1089|
> 00000020  32 31 33 2e 6a 70 67 71  70 1f 00 00 00 00 00 01  |213.jpgqp.......|
> 00000030  03 03 3a 00 00 00 01 00  a4 92 00 00 00 00 00 10  |..:.............|
> 00000040  40 c5 55 00 00 00 00 00  00 00 00 08 00 00 00 74  |@.U............t|
> 00000050  70 72 61 74 68 65 72 11  00 00 00 54 68 6f 6d 61  |prather....Thoma|
> 00000060  73 20 46 2e 20 50 72 61  74 68 65 72 00 00 00 00  |s F. Prather....|
> 00000070  00 00 00 00 1d 00 00 00  62 65 6c 6c 70 6c 61 6e  |........bellplan|
> 00000080  74 73 5f 69 6d 61 67 65  73 2f 31 30 38 39 32 31  |ts_images/108921|
> 00000090  33 2e 6a 70 67 01 01 06  00 00 00 46 84 71 70 1f  |3.jpg......F.qp.|
> 000000a0  00 84 14 ee 09 00 17 00  00 00 64 65 66 61 75 6c  |..........defaul|
> 000000b0  74 2e 33 32 35 36 37 34  2e 31 31 35 39 33 36 33  |t.325674.1159363|
> 000000c0  37                                                |7|
> 000000c1

A lot more in xattrs which I won't paste, though the keys are:

> root@cephmon1:~# ssh ceph03 find /var/lib/ceph/osd/ceph-295/current/70.3d6s0_head -name '*1089213*' -exec xattr {} +
> user.ceph._user.rgw.idtag
> user.cephos.spill_out
> user.ceph._
> user.ceph.snapset
> user.ceph._user.rgw.manifest
> user.ceph._@1
> user.ceph.hinfo_key
> user.ceph._user.rgw.manifest@1
> user.ceph._user.rgw.manifest@2
> user.ceph._user.rgw.acl
> user.ceph._user.rgw.x-amz-acl
> user.ceph._user.rgw.etag
> user.ceph._user.rgw.x-amz-date
> user.ceph._user.rgw.content_type

Not sure which among these would contain pointers to part files.

I believe it’s the manifest xattrs there.
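
If you want to see what the manifest points at, something like this should
work as a rough sketch (over librados the attr name is just
user.rgw.manifest; the user.ceph._ prefix you see on the filesystem is only
filestore's on-disk encoding, your ceph-dencoder needs to be a build that
knows the rgw types, and the pool/object names are placeholders):

  # pull the raw manifest xattr off the head object
  rados -p .rgw.buckets getxattr default.325674.85_bellplants_images/1089213.jpg user.rgw.manifest > manifest.bin
  # decode it; the dump should list the part/shadow objects holding the data
  ceph-dencoder type RGWObjManifest import manifest.bin decode dump_json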




While I'm less pessimistic about data being lost, I'm still really
uncertain that the cluster is making progress towards a clean state.

One example: yesterday pg 70.438 showed "has 169 objects unfound and
apparently lost". At that time the state was
active+recovery_wait+inconsistent. Today it's showing no unfound objects
but is active+clean+inconsistent, and objects which were inaccessible
via radosgw yesterday can now be downloaded. I'm not sure what changed. I
have asked ceph to perform another deep scrub and repair on the pg, but
it has yet to start. I'm really curious to see if it becomes consistent,
or discovers unfound objects again.
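
(For reference, these are the commands I'm using to poke at that pg;
list-inconsistent-obj should show what the last deep scrub actually
flagged, assuming its results are still available:)

  # re-trigger a deep scrub, then a repair, of the pg mentioned above
  ceph pg deep-scrub 70.438
  ceph pg repair 70.438
  # list the objects the most recent deep scrub marked inconsistent
  rados list-inconsistent-obj 70.438 --format=json-pretty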

Actually now I notice that a pg reported as
active+recovery_wait+inconsistent by "ceph health detail" is shown as
active+recovering+inconsistent by "ceph pg list". That makes more sense
to me - "recovery_wait" implied to me that it was waiting for recovery
to start, while "recovering" explains why the problem might clear itself.

Right, “recovery_wait” means that the pg needs to do log-based recovery but (at least) one of the participating OSDs doesn’t have a slot available; that will resolve itself eventually.
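
If you want to confirm that, querying the pg should show its current recovery state and what it's waiting on, e.g. (pg id taken from your earlier message; jq is optional and only there for readability):

  ceph pg 70.438 query | jq '.recovery_state'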

It sounds like the scrubbing has detected some inconsistencies, but the reason you weren’t getting data is just that the request hit an object which needed recovery and was blocked waiting on it.
-Greg




Graham
--
Graham Allan
Minnesota Supercomputing Institute - gta@xxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

