Re: RadosGW problems with copy in s3

Sławomir Skowron <szibis@xxxxxxxxx> · Mon, 5 Mar 2012 22:21:44 +0100

On 5 mar 2012, at 19:59, Yehuda Sadeh Weinraub
<yehuda.sadeh@xxxxxxxxxxxxx> wrote:

> On Mon, Mar 5, 2012 at 2:23 AM, Sławomir Skowron
> <slawomir.skowron@xxxxxxxxx> wrote:
>> 2012/3/1 Sławomir Skowron <slawomir.skowron@xxxxxxxxx>:
>>> 2012/2/29 Yehuda Sadeh Weinraub <yehuda.sadeh@xxxxxxxxxxxxx>:
>>>> On Wed, Feb 29, 2012 at 5:06 AM, Sławomir Skowron
>>>> <slawomir.skowron@xxxxxxxxx> wrote:
>>>>>
>>>>> Ok, it's intentional.
>>>>>
>>>>> We check the meta info about the files, then check the md5 of the file
>>>>> content. In parallel, we update objects that have changed, then
>>>>> archive these objects under another key, and finally delete
>>>>> objects that have expired.
>>>>>
>>>>> This happens over and over, because this site changes very often.
>>>>>
>>>>> Right now I have no idea how to work around this problem without
>>>>> shutting down this app :(
>>>>
>>>> I looked at your osd log again, and there are other things that don't
>>>> look right. I'll also need you to turn on 'debug osd = 20' and 'debug
>>>> filestore = 20'.
>>>
>>> Attached is almost 10 minutes of osd.24 log at the debug levels above.
>>>
>>>> Other than that, I just pushed a workaround that might improve things.
>>>> It's on the wip-rgw-atomic-no-retry branch on github (based on
>>>> 0.42.2), so you might want to give it a spin and let us know whether
>>>> it actually improved things.
>>>
>>> OK, I will try it and let you know soon.
>>
>> Unfortunately, no improvement after upgrading to this version.
>>
> It looks like an issue with updating the bucket index, but I'm having
> trouble confirming it, as the log provided (of osd.24) doesn't contain
> any relevant operations. If you could provide a log from the relevant
> osd it may be very helpful.
>
> You can find the relevant osd by looking at an operation that took too
> long, and look for a request like the following:
>
> 2012-02-28 20:20:10.944859 7fb1affb7700 -- 10.177.64.6:0/1020439 -->
> 10.177.64.4:6839/7954 -- osd_op(client.65007.0:587 .dir.3 [call
> rgw.bucket_prepare_op] 7.ccb26a35) v4 -- ?+0 0xf25270 con 0xbcd1c0
>
> It would be easiest to look for the reply to that request, as it will
> contain the osd id (search for a line that contains osd_op_reply and
> the client.65007.0:587 request id).
>
> In the meantime, I created issue #2139 for a probable culprit. Having
> the relevant logs will allow us to verify whether you're hitting that
> or another issue.
>
> Thanks,
> Yehuda

OK, because of the time difference between us, I will try to find this
in the morning at work. If the logs are not verbose enough, I will
start all OSDs in debug mode, as you wrote earlier, and then reproduce
the problem.
I will try to send the logs as soon as possible.
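
The search described above (find the osd_op_reply line that carries the
request id, then read the osd id from it) could be scripted roughly as
below. This is only a sketch: the function name find_reply_osds is
hypothetical, and it assumes the reply line names the OSD as "osd.<N>"
somewhere on the same line, which has not been checked against a real
log here.

```python
import re
from typing import Iterable, List

# Assumption: the OSD is named on the reply line as "osd.<N>".
OSD_ID = re.compile(r"\bosd\.(\d+)\b")


def find_reply_osds(lines: Iterable[str], request_id: str) -> List[int]:
    """Return OSD ids named on osd_op_reply lines that mention request_id."""
    # Refuse a trailing digit so that ...:587 does not also match ...:5870.
    wanted = re.compile(re.escape(request_id) + r"(?!\d)")
    found: List[int] = []
    for line in lines:
        if "osd_op_reply" not in line or not wanted.search(line):
            continue
        for match in OSD_ID.finditer(line):
            osd = int(match.group(1))
            if osd not in found:  # keep first-seen order, no duplicates
                found.append(osd)
    return found
```

It takes any iterable of lines, so an open log file can be passed
directly together with the request id, e.g. "client.65007.0:587".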

Regards
Slawomir Skowron