Thanks Sam! Just took a look at the patch; it should be very helpful for our use case.

Thanks,
Guang

----------------------------------------
> Date: Thu, 5 Feb 2015 13:42:13 -0800
> Subject: Re: Bucket index op - lock contention hang op threads
> From: sam.just@xxxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx; sweil@xxxxxxxxxx
>
> Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was
> actually to allow writes on degraded objects for replicated pools (to
> avoid a 4k rbd write blocking on a 4mb recovery), but I think it
> solves this issue as well.
> -Sam
>
> On Thu, Feb 5, 2015 at 1:39 PM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>> Thanks Sam! Do you mind sharing the pull request / commit ID of the change?
>>
>>> Date: Thu, 5 Feb 2015 11:52:04 -0800
>>> Subject: Re: Bucket index op - lock contention hang op threads
>>> From: sam.just@xxxxxxxxxxx
>>> To: yguang11@xxxxxxxxxxx
>>> CC: ceph-devel@xxxxxxxxxxxxxxx; sweil@xxxxxxxxxx
>>>
>>> Recent changes already merged for hammer should prevent blocking the
>>> thread on the ondisk_read_lock by expanding the ObjectContext::rwstate
>>> lists, mostly as you suggested.
>>> -Sam
>>>
>>> On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>>>> Hi ceph-devel,
>>>> In our Ceph cluster (with rgw), we ran into a problem where all rgw
>>>> processes got stuck (all worker threads were waiting for responses from
>>>> the OSDs and started returning 500 to clients). An objecter_requests dump
>>>> showed that the slow in-flight requests all pointed at one OSD, which had
>>>> 2 PGs backfilling and held 2 bucket index objects.
>>>>
>>>> On the OSD side we configure 8 op threads. It turned out that when this
>>>> problem occurred, several op threads took seconds (even tens of seconds)
>>>> handling a bucket index op, with most of the time spent waiting for the
>>>> ondisk_read_lock. As a result, the throughput of the op threads dropped
>>>> (qlen kept increasing).
>>>>
>>>> I am wondering what options we can pursue to improve the situation. Some
>>>> general ideas on my mind:
>>>> 1> Similar to OpContext::rwstate, instead of making the op thread block,
>>>> put the op on a waiting list and notify it when the lock becomes
>>>> available. I am not sure whether this is worth it or breaks anything.
>>>> 2> Differentiate the service class at the filestore level for such an op -
>>>> somebody is waiting for it to release the lock. Does this break any
>>>> assumption at the filestore layer?
>>>>
>>>> As we are using EC (8+3), the fan-out is larger than with a replicated
>>>> pool, so this kind of slowness on one OSD can cascade to more OSDs more
>>>> easily.
>>>>
>>>> BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739
>>>>
>>>> Looking forward to your suggestions.
>>>>
>>>> Thanks,
>>>> Guang
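
For illustration, here is a minimal sketch of the "queue the waiter instead of blocking the op thread" idea from option 1> above. This is not the actual Ceph code; the type and method names (WaitableDiskState, try_read_or_wait, write_unlock, the requeue callback) are hypothetical, and it ignores details such as read counts and fairness. It only shows the general shape: when the lock is held, the op thread parks a callback and returns to other work, and the callback requeues the op once the writer drops the lock.

```cpp
#include <functional>
#include <list>
#include <mutex>

// Hypothetical stand-in for a per-object on-disk read/write lock that never
// blocks the calling op thread.
struct WaitableDiskState {
  std::mutex m;                               // protects the fields below
  bool write_locked = false;                  // an apply/flush currently holds the lock
  std::list<std::function<void()>> waiters;   // ops parked while the lock is held

  // Try to take the read lock without blocking. If the lock is held, park
  // `on_ready` (e.g. a closure that requeues the op) and return false so the
  // op thread can go service other requests.
  bool try_read_or_wait(std::function<void()> on_ready) {
    std::lock_guard<std::mutex> l(m);
    if (!write_locked)
      return true;                            // caller may proceed with the read
    waiters.push_back(std::move(on_ready));   // parked; no thread sleeps on the lock
    return false;
  }

  // Called when the writer finishes (e.g. the filestore apply completes):
  // wake everything that was parked so those ops get requeued.
  void write_unlock() {
    std::list<std::function<void()>> wake;
    {
      std::lock_guard<std::mutex> l(m);
      write_locked = false;
      wake.swap(waiters);
    }
    for (auto &cb : wake)
      cb();                                   // e.g. requeue the parked op
  }
};
```

At a very high level this is roughly the shape of behaviour the hammer change mentioned above provides by extending the ObjectContext::rwstate wait lists: contended ops are parked and requeued rather than tying up an op thread for the duration of a long recovery or apply.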