Re: RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I'm working on getting the code they used and trying different timeouts in my multipart upload code. Right now I have not created any new 404 keys though :-(

On 1/20/16 3:44 PM, Yehuda Sadeh-Weinraub wrote:
We'll need to confirm that this is the actual issue, and then have it
fixed. It would be nice to have some kind of a unitest that reproduces
it.

Yehuda

On Wed, Jan 20, 2016 at 1:34 PM, seapasulli@xxxxxxxxxxxx
<seapasulli@xxxxxxxxxxxx> wrote:
So is there any way to prevent this from happening going forward? I mean
ideally this should never be possible, right? Even with a complete object
that is 0 bytes it should be downloaded as 0 bytes and have a different
md5sum and not report as 7mb?



On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote:
On Wed, Jan 20, 2016 at 10:43 AM, seapasulli@xxxxxxxxxxxx
<seapasulli@xxxxxxxxxxxx> wrote:

On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote:
On Fri, Jan 15, 2016 at 5:04 PM, seapasulli@xxxxxxxxxxxx
<seapasulli@xxxxxxxxxxxx> wrote:
I have looked all over and I do not see any explicit mention of
"NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959" in the logs nor
do
I
see a timestamp from November 4th although I do see log rotations
dating
back to october 15th. I don't think it's possible it wasn't logged so I
am
going through the bucket logs from the 'radosgw-admin log show
--object'
side and I found the following::

4604932         {
4604933             "bucket": "noaa-nexrad-l2",
4604934             "time": "2015-11-04 21:29:27.346509Z",
4604935             "time_local": "2015-11-04 15:29:27.346509",
4604936             "remote_addr": "",
4604937             "object_owner": "b05f707271774dbd89674a0736c9406e",
4604938             "user": "b05f707271774dbd89674a0736c9406e",
4604939             "operation": "PUT",
I'd expect a multipart upload completion to be done with a POST, not a
PUT.
Indeed it seems really weird.

4604940             "uri":


"\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_20150101110000_20150101115959.tar",
4604941             "http_status": "200",
4604942             "error_code": "",
4604943             "bytes_sent": 19,
4604944             "bytes_received": 0,
4604945             "object_size": 0,
Do you see a zero object_size for other multipart uploads?
I think so. I still don't know how to tell for certain if a radosgw
object
is a multipart object or not. I think all of the objects in
noaa-nexrad-l2
bucket are multipart::

./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-        {
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket":
"noaa-nexrad-l2",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time": "2015-10-16
19:49:30.579738Z",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time_local":
"2015-10-16 14:49:30.579738",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "remote_addr": "",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user":
"b05f707271774dbd89674a0736c9406e",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out: "operation": "POST",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "uri":

"\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "http_status":
"200",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "error_code": "",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_sent": 331,
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_received":
152,
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "object_size": 0,
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "total_time": 0,
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user_agent":
"Boto\/2.38.0 Python\/2.7.7 Linux\/2.6.32-573.7.1.el6.x86_64",
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "referrer": ""
./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-        }

The objects above
(NWS_NEXRAD_NXL2DP_KGRK_20150113040000_20150113045959.tar)
pulls down without an issue though. Below is a paste for object
"NWS_NEXRAD_NXL2DP_KVBX_20150225160000_20150225165959.tar" which 404's::
http://pastebin.com/Jtw8z7G4
Sadly the log doesn't provide all the input, but I can guess what the
operations were:

   - POST (init multipart upload)
   - PUT (upload part)
   - GET (list parts)
   - POST (complete multipart) <-- took > 57 seconds to process
   - POST (complete multipart)
   - HEAD (stat object)

For some reason the complete multipart operation took too long, which
I think triggered a client retry (either that, or an abort). Then
there were two completions racing (or a complete and abort), which
might have caused the issue we're seeing for some reason. E.g., two
completions might have ended up with the second completion noticing
that it's overwriting an existing object (that we just created),
sending the 'old' object to be garbage collected, when that object's
tail is actually its own tail.


I see two posts one recorded a minute before for this object both with 0
size though. Does this help at all?
Yes, very much

Thanks,
Yehuda


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux