That's another interesting issue. Note that for part 12_80 the manifest specifies (I assume, judging by the messenger log) this part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80

(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14'), whereas it seems that you do have the original part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80

(note the '2/...'). The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like:

- client uploads the part; the upload finishes, but the client does not get an ack for it
- client retries (second upload)
- client gets the ack for the first upload and gives up on the second one
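Just to illustrate the kind of client-side retry I mean: this is only a rough boto 2 style sketch, not your actual uploader; the bucket/key names, the exceptions caught and the retry count are all made up for the example.

    # Illustrative sketch only (boto 2 style). A part upload whose ack never
    # arrives gets re-sent with the same part number; if the first attempt did
    # complete on the gateway, the same part has now been uploaded twice,
    # which is the sequence suspected above.
    import socket
    import boto

    conn = boto.connect_s3()            # endpoint/credentials from ~/.boto
    bucket = conn.get_bucket('some-bucket')              # placeholder name
    mp = bucket.initiate_multipart_upload('some-key')    # placeholder name

    def upload_part_with_retry(mp, fp, part_num, retries=3):
        """Re-send a part when no response comes back in time."""
        for attempt in range(retries):
            fp.seek(0)
            try:
                return mp.upload_part_from_file(fp, part_num)
            except (socket.timeout, socket.error):
                # The data may already be stored server-side even though the
                # ack was lost, so this retry re-uploads the same part.
                continue
        raise RuntimeError('part %d failed after %d attempts' % (part_num, retries))

    # ... call upload_part_with_retry() for each part, then mp.complete_upload()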
But I'm not sure that it would explain the manifest; I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload?

Yehuda

----- Original Message -----
> From: "Sean Sullivan" <seapasulli@xxxxxxxxxxxx>
> To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, May 13, 2015 2:07:22 PM
> Subject: Re: RGW - Can't download complete object
>
> Sorry for the delay. It took me a while to figure out how to do a range
> request and append the data to a single file. The good news is that the
> resulting file seems to be 14G in size, which matches the file's manifest
> size. The bad news is that the file is completely corrupt and the radosgw
> log has errors. I am using the following code to perform the download::
>
> https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
>
> Here is a clip of the log file::
>
> 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.11 10.64.64.101:6809/942707 5 ==== osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 ==== 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
> 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004
> 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.45 10.64.64.101:6845/944590 2 ==== osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
> 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304
>
> 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2
>
> 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.21 10.64.64.102:6856/1133473 16 ==== osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 ==== 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
> 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316
> 2015-05-11 15:28:52.504072 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.82 10.64.64.103:6857/88524 2 ==== osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 ==== 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
> 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304
>
> I couldn't really find any good documentation on how fragments/files are
> laid out on the object file system, so I am not sure where the file would
> be. How could the 4 MB object have issues while the cluster reports
> HEALTH_OK? I did do a rados stat of each object inside ceph and they all
> appear to be there::
>
> http://paste.ubuntu.com/11118561/
>
> The sum of all of the objects:: 14584887282
> The stat of the object inside ceph:: 14577056082
>
> So for some reason I have more data in objects than the key manifest says.
> We easily identified this object via the same method as in the other
> thread I have::
>
> for key in keys:
>     if key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam':
>         implicit = key.size
>         explicit = conn.get_bucket(bucket).get_key(key.name).size
>         absolute = abs(implicit - explicit)
>         print key.name
>         print implicit
>         print explicit
>
> b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam
> 14578628946
> 14577056082
>
> So it looks like I have 3 different sizes. I figure this may be the network
> issue that was mentioned in the other thread, but since this is not the
> first 512k, the overall size still matches, and I am seeing these errors in
> the gateway, I feel that this may be a bigger issue.
>
> Has anyone seen this before? The only mention of "got unexpected error
> when trying to read object" I could find is here
> (http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-May/021688.html),
> but my google skills are pretty poor.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
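For reference, a rough way to repeat the per-object size check described in the quoted message above is to list the rados objects behind the key and sum their sizes. The pool name (.rgw.buckets was the default RGW data pool at the time), the name prefix, and the assumption that "rados stat" ends its output with "size <bytes>" should all be checked against the actual deployment.

    # Sum the sizes of the rados objects whose names contain the key's hash.
    # Pool name and prefix are assumptions, not values confirmed for this cluster.
    import subprocess

    POOL = '.rgw.buckets'
    PREFIX = '28357709e44fff211de63b1d2c437159.bam'

    total = 0
    for obj in subprocess.check_output(['rados', '-p', POOL, 'ls']).splitlines():
        if PREFIX not in obj:
            continue
        # "rados stat" output is assumed to end with "... size <bytes>"
        stat = subprocess.check_output(['rados', '-p', POOL, 'stat', obj])
        size = int(stat.split()[-1])
        total += size
        print size, obj

    print 'sum of rados object sizes:', total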