Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was actually to
allow writes on degraded objects for replicated pools (to avoid a 4k rbd write
blocking on a 4mb recovery), but I think it solves this issue as well.
-Sam

On Thu, Feb 5, 2015 at 1:39 PM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
> Thanks Sam! Do you mind sharing the pull request / commit ID of the change?
>
>> Date: Thu, 5 Feb 2015 11:52:04 -0800
>> Subject: Re: Bucket index op - lock contention hang op threads
>> From: sam.just@xxxxxxxxxxx
>> To: yguang11@xxxxxxxxxxx
>> CC: ceph-devel@xxxxxxxxxxxxxxx; sweil@xxxxxxxxxx
>>
>> Recent changes already merged for hammer should prevent blocking the
>> thread on the ondisk_read_lock by expanding the ObjectContext::rwstate
>> lists, mostly as you suggested.
>> -Sam
>>
>> On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>> > Hi ceph-devel,
>> > In our ceph cluster (with rgw), we ran into a problem where all rgw
>> > processes were stuck (all worker threads waiting for responses from the
>> > OSD, and starting to return 500s to clients). An objecter_requests dump
>> > showed that the slow in-flight requests were all caused by one OSD,
>> > which had 2 PGs doing backfilling and hosted 2 bucket index objects.
>> >
>> > On the OSD side we configure 8 op threads. When this problem occurred,
>> > several op threads took seconds (even tens of seconds) handling bucket
>> > index ops, with most of the time spent waiting for the ondisk_read_lock.
>> > As a result, the throughput of the op threads dropped (qlen kept
>> > increasing).
>> >
>> > I am wondering what options we could pursue to improve the situation;
>> > some general ideas on my mind:
>> > 1> Similar to OpContext::rwstate, instead of making the op thread
>> > block, put the op on a waiting list and notify it when the lock becomes
>> > available. I am not sure whether this is worth it or would break
>> > anything.
>> > 2> Differentiate the service class at the filestore level for such an
>> > op, since somebody is waiting for it to release the lock. Does this
>> > break any assumption at the filestore layer?
>> >
>> > As we are using EC (8+3), the fan-out is larger than for a replicated
>> > pool, so this kind of slowness on one OSD can cascade to more OSDs
>> > more easily.
>> >
>> > BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739
>> >
>> > Looking forward to your suggestions.
>> >
>> > Thanks,
>> > Guang
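
For anyone skimming the thread, here is a minimal sketch of the waiting-list
idea in option 1> (and, conceptually, of what the ObjectContext::rwstate
change does). All names below (ObjectLockState, Op, start_read_or_queue,
finish_write_and_wake) are hypothetical and not taken from the Ceph tree;
the only point is the pattern: instead of letting an op thread sleep on
ondisk_read_lock, the op is parked on a per-object waiter list and requeued
once the writer finishes, so the thread stays free to serve other PGs.

// Hypothetical sketch, not Ceph code: park ops instead of blocking a thread.
#include <deque>
#include <functional>
#include <mutex>

struct Op {
  std::function<void()> run;  // work to perform once the lock is available
};

class ObjectLockState {
  std::mutex m;
  bool write_locked = false;  // stands in for an in-flight write holding the lock
  std::deque<Op> waiters;     // ops parked while the write is in flight

public:
  // Try to start a read; if a write is in flight, park the op and return
  // immediately instead of blocking the calling op thread.
  void start_read_or_queue(Op op) {
    std::unique_lock<std::mutex> l(m);
    if (write_locked) {
      waiters.push_back(std::move(op));
      return;
    }
    l.unlock();
    op.run();  // lock is free: execute right away on this thread
  }

  void start_write() {
    std::lock_guard<std::mutex> l(m);
    write_locked = true;
  }

  // Called when the write commits: requeue all parked ops.
  void finish_write_and_wake() {
    std::deque<Op> to_run;
    {
      std::lock_guard<std::mutex> l(m);
      write_locked = false;
      to_run.swap(waiters);
    }
    for (auto& op : to_run)
      op.run();  // in a real OSD these would go back onto the op work queue
  }
};

In a real OSD the parked ops would be pushed back onto the op work queue
rather than being run inline here, but avoiding the blocked thread is the
part that matters for the qlen buildup described above.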