Hi ceph-devel,

In our Ceph cluster (with RGW), we ran into a problem where all RGW processes got stuck: every worker thread was waiting for a response from an OSD, and RGW started returning 500s to clients. An objecter_requests dump showed that the slow in-flight requests all targeted a single OSD, which had two PGs backfilling and hosted two bucket index objects.

On the OSD side we configure 8 op threads. When the problem occurred, several op threads spent seconds (even tens of seconds) handling bucket index ops, with most of that time spent waiting for the ondisk_read_lock. As a result, the throughput of the op threads dropped (qlen kept increasing).

I am wondering what options we can pursue to improve the situation. Some general ideas on my mind:

1> Similar to OpContext::rwstate, instead of leaving the op thread stuck, put the op on a waiting list and notify it when the lock becomes available (rough sketch after my signature). I am not sure whether this is worth it or whether it breaks anything.

2> Differentiate the service class at the filestore level for such an op - somebody is waiting for it to release the lock. Does this break any assumption at the filestore layer?

Since we are using EC (8+3), the fan-out is larger than for a replicated pool, so this kind of slowness on one OSD can cascade to more OSDs more easily.

BTW, I created a tracker for this: http://tracker.ceph.com/issues/10739

Looking forward to your suggestions.

Thanks,
Guang
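
P.S. To make 1> a bit more concrete, here is a rough, self-contained sketch of the idea in plain C++. It is not the actual OSD/filestore code; the names (OndiskReadLockSketch, Requeue) are made up for illustration. The point is that a failed try-lock parks a requeue callback instead of sleeping in the op thread, and the writer requeues the parked ops when it drops the lock:

#include <functional>
#include <list>
#include <mutex>

// Callback that puts the op back on the op work queue so it can retry later.
using Requeue = std::function<void()>;

class OndiskReadLockSketch {
  std::mutex mtx;              // protects the state below
  bool write_locked = false;   // held by the long-running writer
  int readers = 0;
  std::list<Requeue> waiters;  // ops parked instead of blocking a thread

public:
  // Returns true if the read lock was taken; otherwise parks the requeue
  // callback and returns false so the op thread can move on to the next op.
  bool try_read_lock_or_wait(Requeue requeue) {
    std::lock_guard<std::mutex> l(mtx);
    if (!write_locked) {
      ++readers;
      return true;
    }
    waiters.push_back(std::move(requeue));
    return false;
  }

  void read_unlock() {
    std::lock_guard<std::mutex> l(mtx);
    --readers;
  }

  // Writer side: only succeeds when nobody holds the lock.
  bool try_write_lock() {
    std::lock_guard<std::mutex> l(mtx);
    if (write_locked || readers > 0)
      return false;
    write_locked = true;
    return true;
  }

  // On release, wake all parked ops by requeueing them; they retry
  // try_read_lock_or_wait() when they are dequeued again.
  void write_unlock() {
    std::list<Requeue> to_wake;
    {
      std::lock_guard<std::mutex> l(mtx);
      write_locked = false;
      to_wake.swap(waiters);
    }
    for (auto& requeue : to_wake)
      requeue();   // no op thread ever slept on this lock
  }
};

If I understand rwstate correctly, it already requeues waiters rather than blocking, so this would mostly be extending the same pattern to the ondisk read lock path; whether requeueing here preserves the ordering assumptions in that path is exactly the part I am unsure about.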