That's another interesting issue. Note that for part 12_80 the manifest specifies (I assume, judging by the messenger log) this part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80

(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14'), whereas it seems that you do have the original part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80

(note the '2/...'). The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like:

- client uploads the part; the upload finishes, but the client does not get an ack for it
- client retries (second upload)
- client gets the ack for the first upload and gives up on the second one
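Just to illustrate the kind of client-side retry I mean: this is only a rough boto 2 style sketch, not your actual uploader; the bucket/key names, the exceptions caught and the retry count are all made up for the example.

    # Illustrative sketch only (boto 2 style). A part upload whose ack never
    # arrives gets re-sent with the same part number; if the first attempt did
    # complete on the gateway, the same part has now been uploaded twice,
    # which is the sequence suspected above.
    import socket
    import boto

    conn = boto.connect_s3()            # endpoint/credentials from ~/.boto
    bucket = conn.get_bucket('some-bucket')              # placeholder name
    mp = bucket.initiate_multipart_upload('some-key')    # placeholder name

    def upload_part_with_retry(mp, fp, part_num, retries=3):
        """Re-send a part when no response comes back in time."""
        for attempt in range(retries):
            fp.seek(0)
            try:
                return mp.upload_part_from_file(fp, part_num)
            except (socket.timeout, socket.error):
                # The data may already be stored server-side even though the
                # ack was lost, so this retry re-uploads the same part.
                continue
        raise RuntimeError('part %d failed after %d attempts' % (part_num, retries))

    # ... call upload_part_with_retry() for each part, then mp.complete_upload()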
But I'm not sure that it would explain the manifest; I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload?

Yehuda

----- Original Message -----
> From: "Sean Sullivan" <seapasulli@xxxxxxxxxxxx>
> To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, May 13, 2015 2:07:22 PM
> Subject: Re: RGW - Can't download complete object
>
> Sorry for the delay. It took me a while to figure out how to do a range
> request and append the data to a single file. The good news is that the
> resulting file seems to be 14G in size, which matches the file's manifest
> size. The bad news is that the file is completely corrupt and the radosgw
> log has errors. I am using the following code to perform the download::
>
> https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
>
> Here is a clip of the log file::
>
> 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.11 10.64.64.101:6809/942707 5 ==== osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 ==== 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
> 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004
> 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.45 10.64.64.101:6845/944590 2 ==== osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
> 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304
>
> 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2
>
> 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.21 10.64.64.102:6856/1133473 16 ==== osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 ==== 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
> 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316
> 2015-05-11 15:28:52.504072 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.82 10.64.64.103:6857/88524 2 ==== osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 ==== 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
> 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304
>
> I couldn't really find any good documentation on how fragments/files are
> laid out on the object file system, so I am not sure where the file would
> be. How could the 4 MB object have issues while the cluster reports
> HEALTH_OK? I did do a rados stat of each object inside ceph and they all
> appear to be there::
>
> http://paste.ubuntu.com/11118561/
>
> The sum of all of the objects:: 14584887282
> The stat of the object inside ceph:: 14577056082
>
> So for some reason I have more data in objects than the key manifest says.
> We easily identified this object via the same method as in the other
> thread I have::
>
> for key in keys:
>     if key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam':
>         implicit = key.size
>         explicit = conn.get_bucket(bucket).get_key(key.name).size
>         absolute = abs(implicit - explicit)
>         print key.name
>         print implicit
>         print explicit
>
> b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam
> 14578628946
> 14577056082
>
> So it looks like I have 3 different sizes. I figure this may be the network
> issue that was mentioned in the other thread, but since this is not the
> first 512k, the overall size still matches, and I am seeing these errors in
> the gateway, I feel that this may be a bigger issue.
>
> Has anyone seen this before? The only mention of "got unexpected error
> when trying to read object" I could find is here
> (http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-May/021688.html),
> but my google skills are pretty poor.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
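For reference, a rough way to repeat the per-object size check described in the quoted message above is to list the rados objects behind the key and sum their sizes. The pool name (.rgw.buckets was the default RGW data pool at the time), the name prefix, and the assumption that "rados stat" ends its output with "size <bytes>" should all be checked against the actual deployment.

    # Sum the sizes of the rados objects whose names contain the key's hash.
    # Pool name and prefix are assumptions, not values confirmed for this cluster.
    import subprocess

    POOL = '.rgw.buckets'
    PREFIX = '28357709e44fff211de63b1d2c437159.bam'

    total = 0
    for obj in subprocess.check_output(['rados', '-p', POOL, 'ls']).splitlines():
        if PREFIX not in obj:
            continue
        # "rados stat" output is assumed to end with "... size <bytes>"
        stat = subprocess.check_output(['rados', '-p', POOL, 'stat', obj])
        size = int(stat.split()[-1])
        total += size
        print size, obj

    print 'sum of rados object sizes:', total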