Thanks Sam! Just took a look at the patch; it should be very helpful for our use case.

Thanks,
Guang

----------------------------------------
> Date: Thu, 5 Feb 2015 13:42:13 -0800
> Subject: Re: Bucket index op - lock contention hang op threads
> From: sam.just@xxxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx; sweil@xxxxxxxxxx
>
> Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was
> actually to allow writes on degraded objects for replicated pools (to
> avoid a 4k rbd write blocking on a 4mb recovery), but I think it
> solves this issue as well.
> -Sam
>
> On Thu, Feb 5, 2015 at 1:39 PM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>> Thanks Sam! Do you mind sharing the pull request / commit ID of the change?
>>
>>> Date: Thu, 5 Feb 2015 11:52:04 -0800
>>> Subject: Re: Bucket index op - lock contention hang op threads
>>> From: sam.just@xxxxxxxxxxx
>>> To: yguang11@xxxxxxxxxxx
>>> CC: ceph-devel@xxxxxxxxxxxxxxx; sweil@xxxxxxxxxx
>>>
>>> Recent changes already merged for hammer should prevent blocking the
>>> thread on the ondisk_read_lock by expanding the ObjectContext::rwstate
>>> lists, mostly as you suggested.
>>> -Sam
>>>
>>> On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>>>> Hi ceph-devel,
>>>> In our Ceph cluster (with rgw), we ran into a problem where all rgw
>>>> processes got stuck (all worker threads were waiting for responses from
>>>> the OSDs and started returning 500 to clients). An objecter_requests dump
>>>> showed that the slow in-flight requests all pointed at one OSD, which had
>>>> 2 PGs backfilling and held 2 bucket index objects.
>>>>
>>>> On the OSD side we configure 8 op threads. It turned out that when this
>>>> problem occurred, several op threads took seconds (even tens of seconds)
>>>> handling a bucket index op, with most of the time spent waiting for the
>>>> ondisk_read_lock. As a result, the throughput of the op threads dropped
>>>> (qlen kept increasing).
>>>>
>>>> I am wondering what options we can pursue to improve the situation. Some
>>>> general ideas on my mind:
>>>> 1> Similar to OpContext::rwstate, instead of making the op thread block,
>>>> put the op on a waiting list and notify it when the lock becomes
>>>> available. I am not sure whether this is worth it or breaks anything.
>>>> 2> Differentiate the service class at the filestore level for such an op -
>>>> somebody is waiting for it to release the lock. Does this break any
>>>> assumption at the filestore layer?
>>>>
>>>> As we are using EC (8+3), the fan-out is larger than with a replicated
>>>> pool, so this kind of slowness on one OSD can cascade to more OSDs more
>>>> easily.
>>>>
>>>> BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739
>>>>
>>>> Looking forward to your suggestions.
>>>>
>>>> Thanks,
>>>> Guang
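
For illustration, here is a minimal sketch of the "queue the waiter instead of blocking the op thread" idea from option 1> above. This is not the actual Ceph code; the type and method names (WaitableDiskState, try_read_or_wait, write_unlock, the requeue callback) are hypothetical, and it ignores details such as read counts and fairness. It only shows the general shape: when the lock is held, the op thread parks a callback and returns to other work, and the callback requeues the op once the writer drops the lock.

```cpp
#include <functional>
#include <list>
#include <mutex>

// Hypothetical stand-in for a per-object on-disk read/write lock that never
// blocks the calling op thread.
struct WaitableDiskState {
  std::mutex m;                               // protects the fields below
  bool write_locked = false;                  // an apply/flush currently holds the lock
  std::list<std::function<void()>> waiters;   // ops parked while the lock is held

  // Try to take the read lock without blocking. If the lock is held, park
  // `on_ready` (e.g. a closure that requeues the op) and return false so the
  // op thread can go service other requests.
  bool try_read_or_wait(std::function<void()> on_ready) {
    std::lock_guard<std::mutex> l(m);
    if (!write_locked)
      return true;                            // caller may proceed with the read
    waiters.push_back(std::move(on_ready));   // parked; no thread sleeps on the lock
    return false;
  }

  // Called when the writer finishes (e.g. the filestore apply completes):
  // wake everything that was parked so those ops get requeued.
  void write_unlock() {
    std::list<std::function<void()>> wake;
    {
      std::lock_guard<std::mutex> l(m);
      write_locked = false;
      wake.swap(waiters);
    }
    for (auto &cb : wake)
      cb();                                   // e.g. requeue the parked op
  }
};
```

At a very high level this is roughly the shape of behaviour the hammer change mentioned above provides by extending the ObjectContext::rwstate wait lists: contended ops are parked and requeued rather than tying up an op thread for the duration of a long recovery or apply.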