Re: RGW - Can't download complete object


 



The code has been backported and should be part of the firefly 0.80.10 release and the hammer 0.94.2 release.

Nathan

On 05/14/2015 07:30 AM, Yehuda Sadeh-Weinraub wrote:
The code is in wip-11620, and it's currently on top of the next branch. We'll get it through the tests, then get it into hammer and firefly. I wouldn't recommend installing it in production without proper testing first.

Yehuda

----- Original Message -----
From: "Sean Sullivan" <seapasulli@xxxxxxxxxxxx>
To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Wednesday, May 13, 2015 7:22:10 PM
Subject: Re:  RGW - Can't download complete object

Thank you so much Yehuda! I look forward to testing these. Is there a way
for me to pull this code in? Is it in master?


On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:

Ok, I dug a bit more, and it seems to me that the problem is with the
manifest that was created. I was able to reproduce a similar issue (opened
ceph bug #11622), for which I also have a fix.

I created new tests to cover this issue, and we'll get those recent fixes
as soon as we can, after we test for any regressions.

Thanks,
Yehuda

----- Original Message -----
From: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
To: "Sean Sullivan" <seapasulli@xxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Wednesday, May 13, 2015 2:33:07 PM
Subject: Re:  RGW - Can't download complete object

That's another interesting issue. Note that for part 12_80 the manifest
specifies (I assume, by the messenger log) this part:


default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')

whereas it seems that you do have the original part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
(note the '2/...')
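
The difference between the two names can be checked mechanically. The following is a minimal sketch that assumes a shadow object name ends in `.<upload id>.<part suffix>`, a layout inferred only from the two names quoted above; `split_shadow_name` is a hypothetical helper, not an RGW API.

```python
def split_shadow_name(name):
    """Split a shadow object name into (prefix, upload_id, part_suffix).

    Assumption: the last two dot-separated fields are the upload id and
    the part suffix, as in the two names quoted in this thread.
    """
    prefix, upload_id, part_suffix = name.rsplit(".", 2)
    return prefix, upload_id, part_suffix


PREFIX = ("default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/"
          "28357709e44fff211de63b1d2c437159.bam")
in_manifest = PREFIX + ".tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80"
on_disk = PREFIX + ".2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80"

m = split_shadow_name(in_manifest)
d = split_shadow_name(on_disk)

# Same object and same part suffix, but a different upload id component.
same_part = (m[0], m[2]) == (d[0], d[2])
same_upload = m[1] == d[1]
print(same_part, same_upload)
```

For the two names above, the prefix and the `12_80` suffix agree while the middle component differs, which is the mismatch pointed out in the notes.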

The part that the manifest specifies does not exist, which makes me think
that there is some weird upload sequence, something like:

  - client uploads part, upload finishes but client does not get ack for it
  - client retries (second upload)
  - client gets ack for the first upload and gives up on the second one

But I'm not sure if it would explain the manifest, I'll need to take a look
at the code. Could such a sequence happen with the client that you're using
to upload?

Yehuda

----- Original Message -----
From: "Sean Sullivan" <seapasulli@xxxxxxxxxxxx>
To: "Yehuda Sadeh-Weinraub" <yehuda@xxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Wednesday, May 13, 2015 2:07:22 PM
Subject: Re:  RGW - Can't download complete object

Sorry for the delay. It took me a while to figure out how to do a range
request and append the data to a single file. The good news is that the end
file seems to be 14G in size, which matches the file's manifest size. The bad
news is that the file is completely corrupt and the radosgw log has errors.
I am using the following code to perform the download::


https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
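
For readers unfamiliar with the approach, the general idea of a ranged download can be sketched as follows. This is an illustration only and is not taken from s3-mp-download.py: `byte_ranges`, `range_header` and `write_chunk` are hypothetical helpers, and the network fetch is replaced by slicing an in-memory byte string.

```python
import io


def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets covering total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    """Format an HTTP Range header value for an inclusive byte range."""
    return "bytes=%d-%d" % (start, end)


def write_chunk(fileobj, start, data):
    """Write one downloaded chunk at its own offset in the output file."""
    fileobj.seek(start)
    fileobj.write(data)


# Demo: stand in for the remote object with an in-memory byte string and
# write the chunks out of order, as parallel workers might.
source = b"0123456789"
output = io.BytesIO()
for start, end in reversed(list(byte_ranges(len(source), 4))):
    write_chunk(output, start, source[start:end + 1])
```

Because each chunk is written at its own offset, a chunk that fails to download leaves a hole or stale data in the output without changing the final file size, so a matching size alone does not show the content is intact.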

Here is a clip of the log file::
--
2015-05-11 15:28:52.313742 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.11 10.64.64.101:6809/942707 5 ==== osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 ==== 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004
2015-05-11 15:28:52.372453 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.45 10.64.64.101:6845/944590 2 ==== osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 ==== 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304

2015-05-11 15:28:52.372501 7f57067fc700  0 ERROR: got unexpected error when trying to read object: -2

2015-05-11 15:28:52.426079 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.21 10.64.64.102:6856/1133473 16 ==== osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 ==== 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316
2015-05-11 15:28:52.504072 7f570db7d700  1 -- 10.64.64.126:0/1033338 <== osd.82 10.64.64.103:6857/88524 2 ==== osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 ==== 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304
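
To pick the failing reads out of a longer log, something like the following could be used. It is a rough sketch: the regular expression is derived only from the `osd_op_reply` entries quoted above and may not match other log formats or Ceph versions, and `failed_reads` is a hypothetical helper name.

```python
import re

# Matches: osd_op_reply(<tid> <object> [read <ofs>~<len>] <v> <uv> ack|ondisk = <result>
REPLY_RE = re.compile(
    r"osd_op_reply\((\d+)\s+(\S+)\s+\[read (\d+)~(\d+)\]\s+\S+\s+\S+\s+"
    r"(?:ack|ondisk) = (-?\d+)"
)


def failed_reads(log_text):
    """Return (object_name, offset, length, result) for reads with result < 0."""
    flat = " ".join(log_text.split())  # undo mail/terminal line wrapping
    failures = []
    for match in REPLY_RE.finditer(flat):
        _tid, name, ofs, length, result = match.groups()
        if int(result) < 0:
            failures.append((name, int(ofs), int(length), int(result)))
    return failures


# Two entries copied from the log clip above, still wrapped as in the mail.
SAMPLE = """
osd.11 10.64.64.101:6809/942707 5 ==== osd_op_reply(74566287

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12
[read 0~858004] v0'0 uv41308 ondisk = 0) v6 ==== 304+0+858004
osd.45 10.64.64.101:6845/944590 2 ==== osd_op_reply(74566142

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
[read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6
"""

failures = failed_reads(SAMPLE)
for name, ofs, length, result in failures:
    print(result, ofs, length, name)
```

On the sample, only the read of the `.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80` object is reported, with result -2 (ENOENT).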

I couldn't really find any good documentation on how fragments/files are
laid out on the object file system, so I am not sure where the file will
be. How could the 4 MB object have issues while the cluster is completely
healthy? I did do a rados stat of each object inside ceph and they all
appear to be there::

http://paste.ubuntu.com/11118561/

The sum of all of the objects :: 14584887282
The stat of the object inside ceph:: 14577056082

So for some reason I have more data in objects than the key manifest. We
easily identified this object via the same method as the other thread I
have::

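# keys is assumed to hold the bucket listing from the earlier session;
# conn is the boto S3 connection and bucket is the bucket name.
target = 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam'
for key in keys:
    if key.name == target:
        implicit = key.size  # size reported by the bucket listing
        # size reported by a direct lookup of the key
        explicit = conn.get_bucket(bucket).get_key(key.name).size
        absolute = abs(implicit - explicit)  # computed but not printed below
        print(key.name)
        print(implicit)
        print(explicit)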

b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam
14578628946
14577056082

So it looks like I have 3 different sizes. I figure this may be the network
issue that was mentioned in the other thread, but seeing as this is not the
first 512k and the overall size still matches, as well as the errors I am
seeing in the gateway, I feel that this may be a bigger issue.
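
The gaps between the three reported sizes can be worked out directly. This is a small sketch using only the numbers quoted above; the dictionary labels are descriptive names chosen here, not fields from any tool.

```python
from itertools import combinations

# Sizes in bytes, as reported earlier in this message.
SIZES = {
    "sum of rados objects": 14584887282,
    "bucket listing": 14578628946,
    "get_key": 14577056082,
}


def pairwise_differences(sizes):
    """Return {(label_a, label_b): absolute difference in bytes}."""
    return {(a, b): abs(sizes[a] - sizes[b]) for a, b in combinations(sizes, 2)}


diffs = pairwise_differences(SIZES)
for (a, b), delta in diffs.items():
    print("%s vs %s: %d bytes" % (a, b, delta))
```

The bucket listing and get_key sizes differ by 1572864 bytes, which happens to be exactly 3 x 512 KiB; whether that is related to the 512k issue from the other thread is not established here.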

Has anyone seen this before?  The only mention of the "got unexpected error
when trying to read object" is here
(http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-May/021688.html)
but my google skills are pretty poor.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
