We'll need to confirm that this is the actual issue, and then have it fixed. It would be nice to have some kind of a unitest that reproduces it. Yehuda On Wed, Jan 20, 2016 at 1:34 PM, seapasulli@xxxxxxxxxxxx <seapasulli@xxxxxxxxxxxx> wrote: > So is there any way to prevent this from happening going forward? I mean > ideally this should never be possible, right? Even with a complete object > that is 0 bytes it should be downloaded as 0 bytes and have a different > md5sum and not report as 7mb? > > > > On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote: >> >> On Wed, Jan 20, 2016 at 10:43 AM, seapasulli@xxxxxxxxxxxx >> <seapasulli@xxxxxxxxxxxx> wrote: >>> >>> >>> On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> On Fri, Jan 15, 2016 at 5:04 PM, seapasulli@xxxxxxxxxxxx >>>> <seapasulli@xxxxxxxxxxxx> wrote: >>>>> >>>>> I have looked all over and I do not see any explicit mention of >>>>> "NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959" in the logs nor >>>>> do >>>>> I >>>>> see a timestamp from November 4th although I do see log rotations >>>>> dating >>>>> back to october 15th. I don't think it's possible it wasn't logged so I >>>>> am >>>>> going through the bucket logs from the 'radosgw-admin log show >>>>> --object' >>>>> side and I found the following:: >>>>> >>>>> 4604932 { >>>>> 4604933 "bucket": "noaa-nexrad-l2", >>>>> 4604934 "time": "2015-11-04 21:29:27.346509Z", >>>>> 4604935 "time_local": "2015-11-04 15:29:27.346509", >>>>> 4604936 "remote_addr": "", >>>>> 4604937 "object_owner": "b05f707271774dbd89674a0736c9406e", >>>>> 4604938 "user": "b05f707271774dbd89674a0736c9406e", >>>>> 4604939 "operation": "PUT", >>>> >>>> I'd expect a multipart upload completion to be done with a POST, not a >>>> PUT. >>> >>> Indeed it seems really weird. >>>> >>>> >>>>> 4604940 "uri": >>>>> >>>>> >>>>> "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959.tar", >>>>> 4604941 "http_status": "200", >>>>> 4604942 "error_code": "", >>>>> 4604943 "bytes_sent": 19, >>>>> 4604944 "bytes_received": 0, >>>>> 4604945 "object_size": 0, >>>> >>>> Do you see a zero object_size for other multipart uploads? >>> >>> I think so. I still don't know how to tell for certain if a radosgw >>> object >>> is a multipart object or not. I think all of the objects in >>> noaa-nexrad-l2 >>> bucket are multipart:: >>> >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- { >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": >>> "noaa-nexrad-l2", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time": "2015-10-16 >>> 19:49:30.579738Z", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time_local": >>> "2015-10-16 14:49:30.579738", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "remote_addr": "", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user": >>> "b05f707271774dbd89674a0736c9406e", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out: "operation": "POST", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "uri": >>> >>> "\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "http_status": >>> "200", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "error_code": "", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_sent": 331, >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_received": >>> 152, >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "object_size": 0, >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "total_time": 0, >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user_agent": >>> "Boto\/2.38.0 Python\/2.7.7 Linux\/2.6.32-573.7.1.el6.x86_64", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "referrer": "" >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- } >>> >>> The objects above >>> (NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar) >>> pulls down without an issue though. Below is a paste for object >>> "NWS_NEXRAD_NXL2DP_KVBX_20150225160000_20150225165959.tar" which 404's:: >>> http://pastebin.com/Jtw8z7G4 >> >> Sadly the log doesn't provide all the input, but I can guess what the >> operations were: >> >> - POST (init multipart upload) >> - PUT (upload part) >> - GET (list parts) >> - POST (complete multipart) <-- took > 57 seconds to process >> - POST (complete multipart) >> - HEAD (stat object) >> >> For some reason the complete multipart operation took too long, which >> I think triggered a client retry (either that, or an abort). Then >> there were two completions racing (or a complete and abort), which >> might have caused the issue we're seeing for some reason. E.g., two >> completions might have ended up with the second completion noticing >> that it's overwriting an existing object (that we just created), >> sending the 'old' object to be garbage collected, when that object's >> tail is actually its own tail. >> >> >>> I see two posts one recorded a minute before for this object both with 0 >>> size though. Does this help at all? >> >> Yes, very much >> >> Thanks, >> Yehuda > > _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com