Re: Understanding/correcting sudden onslaught of unfound objects


On 02/15/2018 11:58 AM, Gregory Farnum wrote:

Well, if the objects were uploaded using multi-part upload I believe the objects you’re looking at here will only contain omap (or xattr?) entries pointing to the part files, so the empty file data is to be expected. This might also make slightly more sense in terms of the scrub inconsistencies popping up, although I didn’t think any omap issues I remember should have impacted rgw.

Other than that, I’m not sure how it would be turning 0 bytes of data into the correct results.

That makes a lot of sense! So the 0-byte file here is effectively just a holder for xattr data.

Now I'm trying to figure out how to associate it with the files that might contain the actual data.

I don't see a lot of data in omap (assuming I'm looking at the right thing here):

root@cephmon1:~# rados -p .rgw.buckets.index getomapval .dir.default.325674.85 bellplants_images/1089213.jpg
value (193 bytes) :
00000000  05 03 bb 00 00 00 1d 00  00 00 62 65 6c 6c 70 6c  |..........bellpl|
00000010  61 6e 74 73 5f 69 6d 61  67 65 73 2f 31 30 38 39  |ants_images/1089|
00000020  32 31 33 2e 6a 70 67 71  70 1f 00 00 00 00 00 01  |213.jpgqp.......|
00000030  03 03 3a 00 00 00 01 00  a4 92 00 00 00 00 00 10  |..:.............|
00000040  40 c5 55 00 00 00 00 00  00 00 00 08 00 00 00 74  |@.U............t|
00000050  70 72 61 74 68 65 72 11  00 00 00 54 68 6f 6d 61  |prather....Thoma|
00000060  73 20 46 2e 20 50 72 61  74 68 65 72 00 00 00 00  |s F. Prather....|
00000070  00 00 00 00 1d 00 00 00  62 65 6c 6c 70 6c 61 6e  |........bellplan|
00000080  74 73 5f 69 6d 61 67 65  73 2f 31 30 38 39 32 31  |ts_images/108921|
00000090  33 2e 6a 70 67 01 01 06  00 00 00 46 84 71 70 1f  |3.jpg......F.qp.|
000000a0  00 84 14 ee 09 00 17 00  00 00 64 65 66 61 75 6c  |..........defaul|
000000b0  74 2e 33 32 35 36 37 34  2e 31 31 35 39 33 36 33  |t.325674.1159363|
000000c0  37                                                |7|
000000c1
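The layout of that omap value looks like Ceph's standard encoding: a version byte, a compat byte, a u32 little-endian payload length (0xbb = 187 = 193 - 6 header bytes), then length-prefixed strings such as the entry key. A minimal sketch of decoding that convention, using only the bytes visible in the dump above (the field interpretation is my assumption from the hexdump, not something confirmed in this thread):

```python
import struct

def read_prefixed_string(buf, off):
    """Read a little-endian u32 length followed by that many bytes."""
    (n,) = struct.unpack_from("<I", buf, off)
    return buf[off + 4 : off + 4 + n].decode(), off + 4 + n

# First bytes of the omap value above: version/compat bytes (05 03),
# a u32 payload length (bb 00 00 00), then the length-prefixed key
# (1d 00 00 00 = 29 bytes).
value = bytes.fromhex(
    "0503bb000000" "1d000000" + b"bellplants_images/1089213.jpg".hex()
)
name, _ = read_prefixed_string(value, 6)
print(name)  # bellplants_images/1089213.jpg
```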

There's a lot more in the xattrs, which I won't paste, though the keys are:

root@cephmon1:~# ssh ceph03 find /var/lib/ceph/osd/ceph-295/current/70.3d6s0_head -name '*1089213*' -exec xattr {} +
user.ceph._user.rgw.idtag
user.cephos.spill_out
user.ceph._
user.ceph.snapset
user.ceph._user.rgw.manifest
user.ceph._@1
user.ceph.hinfo_key
user.ceph._user.rgw.manifest@1
user.ceph._user.rgw.manifest@2
user.ceph._user.rgw.acl
user.ceph._user.rgw.x-amz-acl
user.ceph._user.rgw.etag
user.ceph._user.rgw.x-amz-date
user.ceph._user.rgw.content_type

Not sure which among these would contain pointers to part files.
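If I understand the RGW layout correctly, `user.ceph._user.rgw.manifest` (plus its `@1`/`@2` continuation chunks) is the one that maps the head object to its part/shard objects in the data pool, and it can be decoded with `ceph-dencoder`. A sketch of what that might look like; the pool name (`.rgw.buckets`) and the head-object naming (`<bucket_marker>_<key>`) are my assumptions, and note that through librados the xattr is addressed without the on-disk `user.ceph._` prefix:

```shell
# Hypothetical sketch: pull the manifest xattr off the head object and
# dump it as JSON. Adjust pool and object names for your cluster.
rados -p .rgw.buckets getxattr \
    "default.325674.85_bellplants_images/1089213.jpg" \
    user.rgw.manifest > /tmp/manifest.bin

# Decode the binary manifest; the JSON should list the tail objects
# that hold the actual part data.
ceph-dencoder type RGWObjManifest import /tmp/manifest.bin decode dump_json
```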

While I'm less pessimistic about data being lost, I'm still really uncertain whether the cluster is making progress towards a clean state.

One example: yesterday pg 70.438 showed "has 169 objects unfound and apparently lost". At that time its state was active+recovery_wait+inconsistent. Today it shows no unfound objects but is active+clean+inconsistent, and objects which were inaccessible via radosgw yesterday can now be downloaded. I'm not sure what changed. I have asked ceph to perform another deep scrub and repair on the pg, but it has yet to start. I'm really curious to see whether it becomes consistent, or discovers unfound objects again.
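For reference, the commands I'm using to re-check that pg (repair only runs once the deep scrub gets scheduled, so it can sit in the queue for a while on a busy cluster):

```shell
# Queue another deep scrub and repair for the pg in question.
ceph pg deep-scrub 70.438
ceph pg repair 70.438

# Watch its state transitions while waiting.
ceph pg 70.438 query | grep '"state"'
```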

Actually, now I notice that a pg reported as active+recovery_wait+inconsistent by "ceph health detail" is shown as active+recovering+inconsistent by "ceph pg ls". That makes more sense to me - "recovery_wait" implied to me that it was waiting for recovery to start, while "recovering" explains why the problem might clear itself.

Graham
--
Graham Allan
Minnesota Supercomputing Institute - gta@xxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com