Keep in mind that if the problem is that the tail is being sent to garbage collection, you'll only see the 404 after a few hours. A shorter way to check would be to list the gc entries (with --include-all).

Yehuda

On Wed, Jan 20, 2016 at 1:52 PM, seapasulli@xxxxxxxxxxxx <seapasulli@xxxxxxxxxxxx> wrote:
> I'm working on getting the code they used and trying different timeouts in
> my multipart upload code. Right now I have not created any new 404 keys
> though :-(
>
> On 1/20/16 3:44 PM, Yehuda Sadeh-Weinraub wrote:
>>
>> We'll need to confirm that this is the actual issue, and then have it
>> fixed. It would be nice to have some kind of a unit test that reproduces
>> it.
>>
>> Yehuda
>>
>> On Wed, Jan 20, 2016 at 1:34 PM, seapasulli@xxxxxxxxxxxx
>> <seapasulli@xxxxxxxxxxxx> wrote:
>>>
>>> So is there any way to prevent this from happening going forward? I mean
>>> ideally this should never be possible, right? Even with a complete object
>>> that is 0 bytes, it should be downloaded as 0 bytes, have a different
>>> md5sum, and not report as 7 MB?
>>>
>>> On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote:
>>>>
>>>> On Wed, Jan 20, 2016 at 10:43 AM, seapasulli@xxxxxxxxxxxx
>>>> <seapasulli@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote:
>>>>>>
>>>>>> On Fri, Jan 15, 2016 at 5:04 PM, seapasulli@xxxxxxxxxxxx
>>>>>> <seapasulli@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> I have looked all over and I do not see any explicit mention of
>>>>>>> "NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959" in the logs,
>>>>>>> nor do I see a timestamp from November 4th, although I do see log
>>>>>>> rotations dating back to October 15th.
>>>>>>> I don't think it's possible that it wasn't logged, so I am going
>>>>>>> through the bucket logs from the 'radosgw-admin log show --object'
>>>>>>> side, and I found the following::
>>>>>>>
>>>>>>> 4604932 {
>>>>>>> 4604933     "bucket": "noaa-nexrad-l2",
>>>>>>> 4604934     "time": "2015-11-04 21:29:27.346509Z",
>>>>>>> 4604935     "time_local": "2015-11-04 15:29:27.346509",
>>>>>>> 4604936     "remote_addr": "",
>>>>>>> 4604937     "object_owner": "b05f707271774dbd89674a0736c9406e",
>>>>>>> 4604938     "user": "b05f707271774dbd89674a0736c9406e",
>>>>>>> 4604939     "operation": "PUT",
>>>>>>
>>>>>> I'd expect a multipart upload completion to be done with a POST, not
>>>>>> a PUT.
>>>>>
>>>>> Indeed, it seems really weird.
>>>>>
>>>>>>> 4604940     "uri": "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959.tar",
>>>>>>> 4604941     "http_status": "200",
>>>>>>> 4604942     "error_code": "",
>>>>>>> 4604943     "bytes_sent": 19,
>>>>>>> 4604944     "bytes_received": 0,
>>>>>>> 4604945     "object_size": 0,
>>>>>>
>>>>>> Do you see a zero object_size for other multipart uploads?
>>>>>
>>>>> I think so. I still don't know how to tell for certain whether a
>>>>> radosgw object is a multipart object or not.
>>>>> I think all of the objects in the noaa-nexrad-l2 bucket are
>>>>> multipart::
>>>>>
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- {
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "bucket": "noaa-nexrad-l2",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "time": "2015-10-16 19:49:30.579738Z",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "time_local": "2015-10-16 14:49:30.579738",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "remote_addr": "",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "user": "b05f707271774dbd89674a0736c9406e",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out:     "operation": "POST",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "uri": "\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "http_status": "200",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "error_code": "",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "bytes_sent": 331,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "bytes_received": 152,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "object_size": 0,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "total_time": 0,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "user_agent": "Boto\/2.38.0 Python\/2.7.7 Linux\/2.6.32-573.7.1.el6.x86_64",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-     "referrer": ""
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- }
>>>>>
>>>>> The object above
>>>>> (NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar) pulls down
>>>>> without an issue, though.
>>>>> Below is a paste for object
>>>>> "NWS_NEXRAD_NXL2DP_KVBX_20150225160000_20150225165959.tar", which
>>>>> 404's:: http://pastebin.com/Jtw8z7G4
>>>>
>>>> Sadly the log doesn't provide all the input, but I can guess what the
>>>> operations were:
>>>>
>>>> - POST (init multipart upload)
>>>> - PUT (upload part)
>>>> - GET (list parts)
>>>> - POST (complete multipart) <-- took > 57 seconds to process
>>>> - POST (complete multipart)
>>>> - HEAD (stat object)
>>>>
>>>> For some reason the complete multipart operation took too long, which I
>>>> think triggered a client retry (either that, or an abort). Then there
>>>> were two completions racing (or a complete and an abort), which might
>>>> have caused the issue we're seeing. E.g., the second completion might
>>>> have noticed that it was overwriting an existing object (the one we had
>>>> just created) and sent the 'old' object to be garbage collected, when
>>>> that object's tail is actually its own tail.
>>>>
>>>>> I see two POSTs for this object, one recorded a minute earlier, both
>>>>> with 0 size though. Does this help at all?
>>>>
>>>> Yes, very much.
>>>>
>>>> Thanks,
>>>> Yehuda

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
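[Editor's note: the racing-completion scenario described above could be driven from a client roughly as follows. This is only a sketch of the suspected trigger, not a confirmed reproducer: the `complete_twice` helper is hypothetical, it expects a boto3-style S3 client pointed at a test radosgw endpoint, and real credentials, bucket, key, upload id, and parts list would have to be supplied by the caller.]

```python
import threading


def complete_twice(s3, bucket, key, upload_id, parts):
    """Fire two concurrent CompleteMultipartUpload requests for the same
    upload id, mimicking a client that retries after a slow (>57 s)
    completion response.

    `s3` is expected to be a boto3-style S3 client, e.g. one created with
    boto3.client("s3", endpoint_url=...) aimed at a *test* radosgw.
    """
    results = []

    def complete():
        try:
            resp = s3.complete_multipart_upload(
                Bucket=bucket,
                Key=key,
                UploadId=upload_id,
                MultipartUpload=parts,
            )
            results.append(("ok", resp))
        except Exception as exc:  # the loser of the race may get an error
            results.append(("error", exc))

    # Start both completions as close together as possible to maximize
    # the chance that they overlap inside the gateway.
    threads = [threading.Thread(target=complete) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If the hypothesis is right, running this against a test bucket and then checking `radosgw-admin gc list --include-all` immediately afterwards should reveal whether the surviving object's tail was queued for deletion, without waiting the few hours for gc to actually reclaim it and the GET to start returning 404.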