Ok, I dug a bit more, and it seems to me that the problem is with the manifest that was
created. I was able to reproduce a similar issue (opened ceph bug #11622), for which I
also have a fix. I created new tests to cover this issue, and we'll get those fixes in
as soon as we can, after we test for any regressions.

Thanks,
Yehuda

----- Original Message -----
> From: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
> To: "Sean Sullivan" <seapasulli@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, May 13, 2015 2:33:07 PM
> Subject: Re: RGW - Can't download complete object
>
> That's another interesting issue. Note that for part 12_80 the manifest
> specifies (I assume, judging by the messenger log) this part:
>
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
> (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')
>
> whereas it seems that you do have the original part:
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
> (note the '2/...')
>
> The part that the manifest specifies does not exist, which makes me think
> that there is some weird upload sequence, something like:
>
> - client uploads a part; the upload finishes, but the client does not get the ack for it
> - client retries (second upload)
> - client gets the ack for the first upload and gives up on the second one
>
> But I'm not sure if that would explain the manifest; I'll need to take a look
> at the code. Could such a sequence happen with the client that you're using
> to upload?
>
> Yehuda
>
> ----- Original Message -----
> > From: "Sean Sullivan" <seapasulli@xxxxxxxxxxxx>
> > To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Sent: Wednesday, May 13, 2015 2:07:22 PM
> > Subject: Re: RGW - Can't download complete object
> >
> > Sorry for the delay. It took me a while to figure out how to do a range
> > request and append the data to a single file. The good news is that the
> > resulting file seems to be 14G in size, which matches the file's manifest size.
> > The bad news is that the file is completely corrupt and the radosgw log has errors.
> > I am using the following code to perform the download::
> >
> > https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
> >
> > Here is a clip of the log file::
> > --
> > 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.11 10.64.64.101:6809/942707 5 ==== osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 ==== 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
> > 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004
> > 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.45 10.64.64.101:6845/944590 2 ==== osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
> > 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304
> >
> > 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2
> >
> > 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.21 10.64.64.102:6856/1133473 16 ==== osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 ==== 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
> > 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316
> > 2015-05-11 15:28:52.504072 7f570db7d700 1 -- 10.64.64.126:0/1033338 <== osd.82 10.64.64.103:6857/88524 2 ==== osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 ==== 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
> > 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304
> >
> > I couldn't really find any good documentation on how fragments/files are
> > laid out on the object file system, so I am not sure where the file will
> > be. How could the 4MB object have issues but the cluster be completely
> > healthy? I did do a rados stat of each object inside ceph, and they all
> > appear to be there::
> >
> > http://paste.ubuntu.com/11118561/
> >
> > The sum of all of the objects:: 14584887282
> > The stat of the object inside ceph:: 14577056082
> >
> > So for some reason I have more data in objects than the key manifest.
> > We easily identified this object via the same method as in the other thread I
> > have open::
> >
> > for key in keys:
> >     if key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam':
> >         implicit = key.size
> >         explicit = conn.get_bucket(bucket).get_key(key.name).size
> >         absolute = abs(implicit - explicit)
> >         print key.name
> >         print implicit
> >         print explicit
> >
> > b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam
> > 14578628946
> > 14577056082
> >
> > So it looks like I have 3 different sizes. I figure this may be the network
> > issue that was mentioned in the other thread, but seeing as this is not the
> > first 512k, the overall size still matches, and given the errors I am seeing
> > in the gateway, I feel that this may be a bigger issue.
> >
> > Has anyone seen this before? The only mention of "got unexpected error
> > when trying to read object" is here
> > (http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-May/021688.html),
> > but my google skills are pretty poor.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
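
For anyone chasing a similar manifest mismatch, here is a minimal sketch of the check
implied above: stat both candidate shadow objects for part 12_80 directly in RADOS and
see which one actually exists. It assumes the default '.rgw.buckets' data pool and a
readable /etc/ceph/ceph.conf (adjust both for your cluster); the two object names are
copied from the messenger log quoted earlier, and the librados Python bindings are used
so the same loop can be extended to every part of the object if needed.

# Minimal sketch: check which of the two 12_80 shadow objects actually exists in RADOS.
# The pool name and ceph.conf path below are assumptions for a default setup.
import rados

# Object names copied from the osd_op_reply lines in the log above: the first is the
# part the manifest references (the gateway got ENOENT for it), the second is the part
# that is actually present.
candidates = [
    "default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/"
    "28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80",
    "default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/"
    "28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80",
]

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(".rgw.buckets")  # default RGW data pool
    try:
        for name in candidates:
            try:
                size, mtime = ioctx.stat(name)
                print("%s: exists, size=%d" % (name, size))
            except rados.ObjectNotFound:
                print("%s: MISSING (the -2 / ENOENT the gateway logged)" % name)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

The same check works from the shell with 'rados -p .rgw.buckets stat <object>', and
'radosgw-admin object stat --bucket=... --object=...' should dump the manifest so you
can see which part prefixes it actually references.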