Re: RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.

Keep in mind that if the problem is that the tail is being sent to
garbage collection, you'll only see the 404 after a few hours. A
shorter way to check it would be by listing the gc entries (with
--include-all).
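
A minimal sketch of that check in Python, assuming `radosgw-admin gc list --include-all` prints a JSON array of entries that each carry a "tag", a "time" and an "objs" list with an "oid" per object, and assuming the tail objects' RADOS names contain the key name. The helper names are made up for illustration:

```python
import json
import subprocess


def list_gc_entries():
    """Run `radosgw-admin gc list --include-all` and parse its JSON output.

    Assumption: the command prints a JSON array of gc entries.
    """
    out = subprocess.check_output(
        ["radosgw-admin", "gc", "list", "--include-all"]
    )
    return json.loads(out)


def gc_entries_matching(gc_entries, needle):
    """Return (tag, time, oid) for each pending gc object whose oid contains needle.

    Assumption: every entry has an "objs" list and every object an "oid".
    """
    hits = []
    for entry in gc_entries:
        for obj in entry.get("objs", []):
            if needle in obj.get("oid", ""):
                hits.append((entry.get("tag"), entry.get("time"), obj["oid"]))
    return hits
```

Calling gc_entries_matching(list_gc_entries(), "NWS_NEXRAD_NXL2DP_KVBX_20150225160000_20150225165959") right after a suspect upload would then show whether any of that object's parts are already queued for collection, without waiting for the 404.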

Yehuda

On Wed, Jan 20, 2016 at 1:52 PM, seapasulli@xxxxxxxxxxxx
<seapasulli@xxxxxxxxxxxx> wrote:
> I'm working on getting the code they used and trying different timeouts in
> my multipart upload code. Right now I have not created any new 404 keys
> though :-(
>
>
> On 1/20/16 3:44 PM, Yehuda Sadeh-Weinraub wrote:
>>
>> We'll need to confirm that this is the actual issue, and then have it
>> fixed. It would be nice to have some kind of unit test that reproduces
>> it.
>>
>> Yehuda
>>
>> On Wed, Jan 20, 2016 at 1:34 PM, seapasulli@xxxxxxxxxxxx
>> <seapasulli@xxxxxxxxxxxx> wrote:
>>>
>>> So is there any way to prevent this from happening going forward? I mean
>>> ideally this should never be possible, right? Even with a complete object
>>> that is 0 bytes it should be downloaded as 0 bytes and have a different
>>> md5sum and not report as 7 MB?
>>>
>>>
>>>
>>> On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote:
>>>>
>>>> On Wed, Jan 20, 2016 at 10:43 AM, seapasulli@xxxxxxxxxxxx
>>>> <seapasulli@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote:
>>>>>>
>>>>>> On Fri, Jan 15, 2016 at 5:04 PM, seapasulli@xxxxxxxxxxxx
>>>>>> <seapasulli@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> I have looked all over and I do not see any explicit mention of
>>>>>>> "NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959" in the logs,
>>>>>>> nor do I see a timestamp from November 4th, although I do see log
>>>>>>> rotations dating back to October 15th. I don't think it's possible
>>>>>>> it wasn't logged, so I am going through the bucket logs from the
>>>>>>> 'radosgw-admin log show --object' side and I found the following::
>>>>>>>
>>>>>>>         {
>>>>>>>             "bucket": "noaa-nexrad-l2",
>>>>>>>             "time": "2015-11-04 21:29:27.346509Z",
>>>>>>>             "time_local": "2015-11-04 15:29:27.346509",
>>>>>>>             "remote_addr": "",
>>>>>>>             "object_owner": "b05f707271774dbd89674a0736c9406e",
>>>>>>>             "user": "b05f707271774dbd89674a0736c9406e",
>>>>>>>             "operation": "PUT",
>>>>>>
>>>>>> I'd expect a multipart upload completion to be done with a POST, not a
>>>>>> PUT.
>>>>>
>>>>> Indeed it seems really weird.
>>>>>>
>>>>>>
>>>>>>>             "uri": "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959.tar",
>>>>>>>             "http_status": "200",
>>>>>>>             "error_code": "",
>>>>>>>             "bytes_sent": 19,
>>>>>>>             "bytes_received": 0,
>>>>>>>             "object_size": 0,
>>>>>>
>>>>>> Do you see a zero object_size for other multipart uploads?
>>>>>
>>>>> I think so. I still don't know how to tell for certain whether a
>>>>> radosgw object is a multipart object or not. I think all of the
>>>>> objects in the noaa-nexrad-l2 bucket are multipart::
>>>>>
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-        {
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": "noaa-nexrad-l2",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time": "2015-10-16 19:49:30.579738Z",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time_local": "2015-10-16 14:49:30.579738",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "remote_addr": "",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user": "b05f707271774dbd89674a0736c9406e",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out: "operation": "POST",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "uri": "\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "http_status": "200",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "error_code": "",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_sent": 331,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_received": 152,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "object_size": 0,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "total_time": 0,
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user_agent": "Boto\/2.38.0 Python\/2.7.7 Linux\/2.6.32-573.7.1.el6.x86_64",
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "referrer": ""
>>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-        }
>>>>>
>>>>> The object above
>>>>> (NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar)
>>>>> pulls down without an issue though. Below is a paste for object
>>>>> "NWS_NEXRAD_NXL2DP_KVBX_20150225160000_20150225165959.tar", which
>>>>> 404s::
>>>>> http://pastebin.com/Jtw8z7G4
>>>>
>>>> Sadly the log doesn't provide all the input, but I can guess what the
>>>> operations were:
>>>>
>>>>    - POST (init multipart upload)
>>>>    - PUT (upload part)
>>>>    - GET (list parts)
>>>>    - POST (complete multipart) <-- took > 57 seconds to process
>>>>    - POST (complete multipart)
>>>>    - HEAD (stat object)
>>>>
>>>> For some reason the complete multipart operation took too long, which
>>>> I think triggered a client retry (either that, or an abort). Then
>>>> there were two completions racing (or a complete and abort), which
>>>> might have caused the issue we're seeing for some reason. E.g., two
>>>> completions might have ended up with the second completion noticing
>>>> that it's overwriting an existing object (that we just created),
>>>> sending the 'old' object to be garbage collected, when that object's
>>>> tail is actually its own tail.
>>>>
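
One rough way to look for other keys that went through the same sequence is to count POSTs per URI in the ops log. This is only a sketch: it assumes the log records have already been loaded as a list of dicts with the "operation", "uri" and "time" fields shown in the excerpts in this thread, and that a clean multipart upload logs exactly two POSTs (init and complete), so three or more hints at a repeated completion. The function names are hypothetical, and since the log does not record which kind of POST each request was, a hit is a candidate to inspect, not proof of a race.

```python
from collections import defaultdict
from datetime import datetime


def parse_time(value):
    """Parse a log timestamp such as "2015-11-04 21:29:27.346509Z".

    Assumption: every timestamp carries a fractional-seconds part.
    """
    return datetime.strptime(value.rstrip("Z"), "%Y-%m-%d %H:%M:%S.%f")


def uris_with_repeated_posts(entries, min_posts=3):
    """Map uri -> sorted POST times for uris that saw at least min_posts POSTs."""
    posts = defaultdict(list)
    for entry in entries:
        if entry.get("operation") == "POST":
            posts[entry["uri"]].append(parse_time(entry["time"]))
    return {
        uri: sorted(times)
        for uri, times in posts.items()
        if len(times) >= min_posts
    }
```

The gap between the last two timestamps for a flagged URI gives a rough idea of how long the first completion was outstanding before the second one arrived.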
>>>>
>>>>> I see two POSTs for this object, one recorded a minute before the
>>>>> other, both with 0 size though. Does this help at all?
>>>>
>>>> Yes, very much
>>>>
>>>> Thanks,
>>>> Yehuda
>>>
>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com