On Thu, Feb 15, 2018 at 3:10 PM Graham Allan <gta@xxxxxxx> wrote:
On 02/15/2018 11:58 AM, Gregory Farnum wrote:
>
> Well, if the objects were uploaded using multi-part upload I believe the
> objects you’re looking at here will only contain omap (or xattr?)
> entries pointing to the part files, so the empty file data is to be
> expected. This might also make slightly more sense in terms of the scrub
> inconsistencies popping up, although I didn’t think any omap issues I
> remember should have impacted rgw.
>
> Other than that, I’m not sure how it would be turning 0 bytes of data
> into the correct results.
That makes a lot of sense! So the 0-byte file here is effectively just a
holder for xattr data.
Now I'm trying to figure out how I can associate it to files which might
contain the data.
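(Presumably the manifest is the place to look; if I have the tooling right, something like the following should dump it through the gateway. The bucket name below is just a placeholder, and the output format varies by version:

  radosgw-admin object stat --bucket=<bucket-name> --object='bellplants_images/1089213.jpg'

The "manifest" section of that output should show the prefix that the __multipart_/__shadow_ rados objects carrying the actual data are named with.)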
I don't see a lot of data in omap (assuming I'm looking at the right
thing here):
> root@cephmon1:~# rados -p .rgw.buckets.index getomapval .dir.default.325674.85 bellplants_images/1089213.jpg
> value (193 bytes) :
> 00000000 05 03 bb 00 00 00 1d 00 00 00 62 65 6c 6c 70 6c |..........bellpl|
> 00000010 61 6e 74 73 5f 69 6d 61 67 65 73 2f 31 30 38 39 |ants_images/1089|
> 00000020 32 31 33 2e 6a 70 67 71 70 1f 00 00 00 00 00 01 |213.jpgqp.......|
> 00000030 03 03 3a 00 00 00 01 00 a4 92 00 00 00 00 00 10 |..:.............|
> 00000040 40 c5 55 00 00 00 00 00 00 00 00 08 00 00 00 74 |@.U............t|
> 00000050 70 72 61 74 68 65 72 11 00 00 00 54 68 6f 6d 61 |prather....Thoma|
> 00000060 73 20 46 2e 20 50 72 61 74 68 65 72 00 00 00 00 |s F. Prather....|
> 00000070 00 00 00 00 1d 00 00 00 62 65 6c 6c 70 6c 61 6e |........bellplan|
> 00000080 74 73 5f 69 6d 61 67 65 73 2f 31 30 38 39 32 31 |ts_images/108921|
> 00000090 33 2e 6a 70 67 01 01 06 00 00 00 46 84 71 70 1f |3.jpg......F.qp.|
> 000000a0 00 84 14 ee 09 00 17 00 00 00 64 65 66 61 75 6c |..........defaul|
> 000000b0 74 2e 33 32 35 36 37 34 2e 31 31 35 39 33 36 33 |t.325674.1159363|
> 000000c0 37 |7|
> 000000c1
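(If it helps, the pool and object name suggest this blob is the bucket-index entry for the key rather than a list of parts; assuming the rados binary can write the value raw to a file and ceph-dencoder was built with rgw support, it should decode with something like:

  rados -p .rgw.buckets.index getomapval .dir.default.325674.85 bellplants_images/1089213.jpg /tmp/entry.bin
  ceph-dencoder type rgw_bucket_dir_entry import /tmp/entry.bin decode dump_json

The exact type name may differ between releases.)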
There's a lot more in the xattrs, which I won't paste, though the keys are:
> root@cephmon1:~# ssh ceph03 find /var/lib/ceph/osd/ceph-295/current/70.3d6s0_head -name '*1089213*' -exec xattr {} +
> user.ceph._user.rgw.idtag
> user.cephos.spill_out
> user.ceph._
> user.ceph.snapset
> user.ceph._user.rgw.manifest
> user.ceph._@1
> user.ceph.hinfo_key
> user.ceph._user.rgw.manifest@1
> user.ceph._user.rgw.manifest@2
> user.ceph._user.rgw.acl
> user.ceph._user.rgw.x-amz-acl
> user.ceph._user.rgw.etag
> user.ceph._user.rgw.x-amz-date
> user.ceph._user.rgw.content_type
Not sure which among these would contain pointers to part files.
I believe it’s the manifest xattrs there.
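Roughly, and this is a sketch from memory so the details may be off: concatenate the manifest attr with its @1/@2 spill-over chunks and it should decode as an RGWObjManifest, e.g.

  cd /var/lib/ceph/osd/ceph-295/current/70.3d6s0_head
  f=<path-to-the-0-byte-head-object>        # placeholder
  getfattr --only-values -n user.ceph._user.rgw.manifest "$f"    > /tmp/manifest.bin
  getfattr --only-values -n user.ceph._user.rgw.manifest@1 "$f" >> /tmp/manifest.bin
  getfattr --only-values -n user.ceph._user.rgw.manifest@2 "$f" >> /tmp/manifest.bin
  ceph-dencoder type RGWObjManifest import /tmp/manifest.bin decode dump_json

(assuming ceph-dencoder was built with rgw support). The decoded manifest should show the prefix the part/shadow objects in the data pool are named with.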
While I'm less pessimistic about data being lost, I'm still really
uncertain whether the cluster is making progress towards a clean state.
As one example: yesterday pg 70.438 showed "has 169 objects unfound and
apparently lost", and at that time its state was
active+recovery_wait+inconsistent. Today it shows no unfound objects
and is active+clean+inconsistent, and objects which were inaccessible
via radosgw yesterday can now be downloaded. I'm not sure what changed. I
have asked ceph to perform another deep scrub and repair on the pg, but
it has yet to start. I'm really curious to see whether it becomes
consistent, or discovers unfound objects again.
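(In the meantime, I gather the findings of the last scrub can be inspected directly, if I have the commands right, with something like:

  rados list-inconsistent-obj 70.438 --format=json-pretty
  ceph pg 70.438 query

so at least I can see what the inconsistency actually is.)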
Actually, I now notice that a pg reported as
active+recovery_wait+inconsistent by "ceph health detail" is shown as
active+recovering+inconsistent by "ceph pg list". That makes more sense
to me: "recovery_wait" implied to me that it was simply waiting for recovery
to start, while "recovering" explains why the problem might clear itself.
Right, “recovery_wait” means that the pg needs to do log-based recovery but (at least) one of the participating OSDs doesn’t have a slot available; that will resolve itself eventually.
It sounds like the scrubbing has detected some inconsistencies, but the reason you weren’t getting data is just that the request hit an object which needed recovery and was blocked waiting on it.
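If you want to nudge that along, the recovery reservation limits are the usual knobs; something like the following should show and temporarily raise them (numbers are only examples, and the right values depend on how much client impact you can tolerate):

  ceph daemon osd.295 config get osd_recovery_max_active     # run on the OSD's host
  ceph daemon osd.295 config get osd_max_backfills
  ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4 --osd-max-backfills 2'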
-Greg