Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was actually to
allow writes on degraded objects for replicated pools (to avoid a 4k rbd write
blocking on a 4mb recovery), but I think it solves this issue as well.
-Sam

On Thu, Feb 5, 2015 at 1:39 PM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
> Thanks Sam! Do you mind sharing the pull request / commit ID of the change?
>
>> Date: Thu, 5 Feb 2015 11:52:04 -0800
>> Subject: Re: Bucket index op - lock contention hang op threads
>> From: sam.just@xxxxxxxxxxx
>> To: yguang11@xxxxxxxxxxx
>> CC: ceph-devel@xxxxxxxxxxxxxxx; sweil@xxxxxxxxxx
>>
>> Recent changes already merged for hammer should prevent blocking the
>> thread on the ondisk_read_lock by expanding the ObjectContext::rwstate
>> lists, mostly as you suggested.
>> -Sam
>>
>> On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>> > Hi ceph-devel,
>> > In our ceph cluster (with rgw), we ran into a problem where all rgw
>> > processes were stuck (all worker threads waiting for responses from the
>> > OSD, and starting to return 500s to clients). An objecter_requests dump
>> > showed that the slow in-flight requests were all caused by one OSD,
>> > which had 2 PGs doing backfilling and hosted 2 bucket index objects.
>> >
>> > On the OSD side we configure 8 op threads. When this problem occurred,
>> > several op threads took seconds (even tens of seconds) handling bucket
>> > index ops, with most of the time spent waiting for the ondisk_read_lock.
>> > As a result, the throughput of the op threads dropped (qlen kept
>> > increasing).
>> >
>> > I am wondering what options we could pursue to improve the situation;
>> > some general ideas on my mind:
>> > 1> Similar to OpContext::rwstate, instead of making the op thread
>> > block, put the op on a waiting list and notify it when the lock becomes
>> > available. I am not sure whether this is worth it or would break
>> > anything.
>> > 2> Differentiate the service class at the filestore level for such an
>> > op, since somebody is waiting for it to release the lock. Does this
>> > break any assumption at the filestore layer?
>> >
>> > As we are using EC (8+3), the fan-out is larger than for a replicated
>> > pool, so this kind of slowness on one OSD can cascade to more OSDs
>> > more easily.
>> >
>> > BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739
>> >
>> > Looking forward to your suggestions.
>> >
>> > Thanks,
>> > Guang
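
For anyone skimming the thread, here is a minimal sketch of the waiting-list
idea in option 1> (and, conceptually, of what the ObjectContext::rwstate
change does). All names below (ObjectLockState, Op, start_read_or_queue,
finish_write_and_wake) are hypothetical and not taken from the Ceph tree;
the only point is the pattern: instead of letting an op thread sleep on
ondisk_read_lock, the op is parked on a per-object waiter list and requeued
once the writer finishes, so the thread stays free to serve other PGs.

// Hypothetical sketch, not Ceph code: park ops instead of blocking a thread.
#include <deque>
#include <functional>
#include <mutex>

struct Op {
  std::function<void()> run;  // work to perform once the lock is available
};

class ObjectLockState {
  std::mutex m;
  bool write_locked = false;  // stands in for an in-flight write holding the lock
  std::deque<Op> waiters;     // ops parked while the write is in flight

public:
  // Try to start a read; if a write is in flight, park the op and return
  // immediately instead of blocking the calling op thread.
  void start_read_or_queue(Op op) {
    std::unique_lock<std::mutex> l(m);
    if (write_locked) {
      waiters.push_back(std::move(op));
      return;
    }
    l.unlock();
    op.run();  // lock is free: execute right away on this thread
  }

  void start_write() {
    std::lock_guard<std::mutex> l(m);
    write_locked = true;
  }

  // Called when the write commits: requeue all parked ops.
  void finish_write_and_wake() {
    std::deque<Op> to_run;
    {
      std::lock_guard<std::mutex> l(m);
      write_locked = false;
      to_run.swap(waiters);
    }
    for (auto& op : to_run)
      op.run();  // in a real OSD these would go back onto the op work queue
  }
};

In a real OSD the parked ops would be pushed back onto the op work queue
rather than being run inline here, but avoiding the blocked thread is the
part that matters for the qlen buildup described above.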