Re: RadosGW problems with copy in s3

On Mon, Mar 5, 2012 at 2:23 AM, Sławomir Skowron
<slawomir.skowron@xxxxxxxxx> wrote:
> 2012/3/1 Sławomir Skowron <slawomir.skowron@xxxxxxxxx>:
>> 2012/2/29 Yehuda Sadeh Weinraub <yehuda.sadeh@xxxxxxxxxxxxx>:
>>> On Wed, Feb 29, 2012 at 5:06 AM, Sławomir Skowron
>>> <slawomir.skowron@xxxxxxxxx> wrote:
>>>>
>>>> Ok, it's intentional.
>>>>
>>>> We check the meta info about the files and then check the md5 of
>>>> the file content. In parallel, we update objects that have changed,
>>>> then archive those objects under another key, and finally delete
>>>> objects that have expired.
>>>>
>>>> This happens over and over, because this site changes many times.
>>>>
>>>> Right now I don't have any idea how to work around this problem
>>>> without shutting down this app :(
>>>
>>> I looked at your osd log again, and there are other things that don't
>>> look right. I'll also need you to turn on 'debug osd = 20' and 'debug
>>> filestore = 20'.
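(For reference, those settings go in the [osd] section of ceph.conf
and take effect when the osd restarts; a minimal sketch:)

[osd]
        debug osd = 20
        debug filestore = 20

(Depending on the version, they can also be injected at runtime
without a restart.)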
>>
>> Attached is almost 10 minutes of the osd.24 log with debug turned
>> up as above.
>>
>>> Other than that, I just pushed a workaround that might improve things.
>>> It's on the wip-rgw-atomic-no-retry branch on github (based on
>>> 0.42.2), so you might want to give it a spin and let us know whether
>>> it actually improves things.
>>
>> Ok, I will try it and let you know soon.
>
> Unfortunately, there is no improvement after upgrading to this version.
>
It looks like an issue with updating the bucket index, but I'm having
trouble confirming it, as the log provided (of osd.24) doesn't contain
any relevant operations. If you could provide a log from the relevant
osd, it would be very helpful.

You can find the relevant osd by picking an operation that took too
long and looking for a request like the following:

2012-02-28 20:20:10.944859 7fb1affb7700 -- 10.177.64.6:0/1020439 -->
10.177.64.4:6839/7954 -- osd_op(client.65007.0:587 .dir.3 [call
rgw.bucket_prepare_op] 7.ccb26a35) v4 -- ?+0 0xf25270 con 0xbcd1c0

It is easiest to look for the reply to that request, as it will
contain the osd id (search for a line that contains osd_op_reply and
the client.65007.0:587 request id).
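For example, something along these lines would pull out the replying
osd (a rough sketch: the log path is hypothetical, and it assumes the
messenger log format shown above, where the reply line names its
source, e.g. "osd.24"):

import re

REQUEST = 'client.65007.0:587'    # the slow request found above
TID = REQUEST.rsplit(':', 1)[1]   # the tid ('587') echoed in the reply
LOG = '/var/log/ceph/radosgw.log' # hypothetical client-side log path

with open(LOG) as f:
    for line in f:
        # reply lines carry the tid and name the replying osd
        if 'osd_op_reply(' + TID + ' ' in line:
            m = re.search(r'osd\.(\d+)', line)
            print(m.group(1) if m else line.rstrip())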

In the meantime, I created issue #2139 for a probable culprit. Having
the relevant logs will allow us to verify whether you're hitting that
or another issue.

Thanks,
Yehuda