Re: RGW - Can't download complete object

Sean Sullivan <seapasulli@xxxxxxxxxxxx> · Wed, 13 May 2015 21:07:22 +0000

Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. The good news is that the resulting file is 14G in size, which matches the file's manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. I am using the following script to perform the download::

https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
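
For anyone following along, the range-request-and-append approach boils down to splitting the object into byte ranges and fetching each one with an HTTP Range header. The sketch below is not the s3-mp-download.py script itself; `byte_ranges` and `range_header` are hypothetical helper names, and the 64 MiB chunk size is an arbitrary assumption.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_size:
        # The last range is clipped so it never runs past the object.
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    """Format an HTTP Range header value; both offsets are inclusive."""
    return "bytes=%d-%d" % (start, end)


if __name__ == "__main__":
    size = 14577056082          # size reported for the key in this thread
    chunk = 64 * 1024 * 1024    # assumed chunk size (64 MiB)
    ranges = list(byte_ranges(size, chunk))
    print(len(ranges), range_header(*ranges[0]), range_header(*ranges[-1]))
```

Each part would be fetched with its own Range header and written at offset `start` in the output file, so the parts can arrive in any order.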

Here is a clip of the log file::
--
2015-05-11 15:28:52.313742 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.11 10.64.64.101:6809/942707 5 ==== osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 ==== 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004
2015-05-11 15:28:52.372453 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.45 10.64.64.101:6845/944590 2 ==== osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304

2015-05-11 15:28:52.372501 7f57067fc700  0 ERROR: got unexpected error when trying to read object: -2

2015-05-11 15:28:52.426079 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.21 10.64.64.102:6856/1133473 16 ==== osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 ==== 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316
2015-05-11 15:28:52.504072 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.82 10.64.64.103:6857/88524 2 ==== osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 ==== 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304
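
To pick out which shadow objects fail, the osd_op_reply lines can be filtered on their return code. This is a rough sketch that assumes every reply line looks like the ones in the excerpt above; `REPLY_RE` and `failed_reads` are hypothetical names, and the pattern has only been matched against these sample lines.

```python
import re

# Pulls the OSD, object name, read extent and return code out of an
# osd_op_reply line shaped like the ones in the log excerpt.
REPLY_RE = re.compile(
    r"<== (?P<osd>osd\.\d+) .*?osd_op_reply\(\d+ (?P<obj>\S+) "
    r"\[read (?P<off>\d+)~(?P<len>\d+)\].*? (?:ack|ondisk) = (?P<rc>-?\d+)"
)


def failed_reads(lines):
    """Return (osd, object_name, return_code) for every non-zero read reply."""
    failures = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match and int(match.group("rc")) != 0:
            failures.append(
                (match.group("osd"), match.group("obj"), int(match.group("rc")))
            )
    return failures
```

Run over the excerpt above, the only reply with a non-zero code is the one from osd.45 for the object ending in `.12_80`, which returns -2 (ENOENT).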

I couldn't really find any good documentation on how fragments/files are laid out on the object file system, so I am not sure where the file will be. How could the 4 MB object have issues while the cluster reports HEALTH_OK? I did do a rados stat of each object inside ceph and they all appear to be there::

http://paste.ubuntu.com/11118561/

The sum of all of the objects :: 14584887282
The stat of the object inside ceph:: 14577056082

So for some reason I have more data in objects than the key's manifest reports. We easily identified this object via the same method as in the other thread I have::

# conn, bucket and keys come from the existing boto session (not shown here)
for key in keys:
    if key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam':
        implicit = key.size  # size as reported by the bucket listing
        explicit = conn.get_bucket(bucket).get_key(key.name).size  # size from a direct lookup of the key
        absolute = abs(implicit - explicit)
        print(key.name)
        print(implicit)
        print(explicit)

b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam
14578628946
14577056082

So it looks like I have three different sizes. I figure this may be the network issue that was mentioned in the other thread, but since this is not the first 512k, the overall size still matches, and I am seeing errors in the gateway, I feel that this may be a bigger issue.
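
To make the three sizes easier to compare, here is the arithmetic spelled out. The constants are the figures quoted above; `size_gaps` and the dictionary keys are hypothetical names chosen for this sketch.

```python
SHADOW_SUM = 14584887282    # sum of all the rados objects
LISTING_SIZE = 14578628946  # key.size from the bucket listing ("implicit")
HEAD_SIZE = 14577056082     # size from get_key ("explicit"), same as the stat figure


def size_gaps(shadow_sum, listing_size, head_size):
    """Return the pairwise differences between the three reported sizes."""
    return {
        "objects_minus_listing": shadow_sum - listing_size,
        "objects_minus_head": shadow_sum - head_size,
        "listing_minus_head": listing_size - head_size,
    }


gaps = size_gaps(SHADOW_SUM, LISTING_SIZE, HEAD_SIZE)
for name, gap in sorted(gaps.items()):
    print("%s: %d bytes (%.2f MiB)" % (name, gap, gap / (1024.0 * 1024.0)))
```

The objects exceed the get_key size by 7831200 bytes and the listing size by 6258336 bytes. The gap between the listing and get_key sizes is 1572864 bytes, which is exactly 3 x 512 KiB; the numbers alone do not say whether that is significant.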

Has anyone seen this before? The only mention I could find of "got unexpected error when trying to read object" is here (http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-May/021688.html), but my Google skills are pretty poor.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com