sorry forget cc ceph-devel. -----Original Message----- From: Ma, Jianpeng Sent: Wednesday, October 17, 2018 11:16 AM To: Sage Weil <sweil@xxxxxxxxxx>; xie.xingguo@xxxxxxxxxx Cc: Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> Subject: RE: ContextQueue vs op_wq race? Hi Sage, xingguo: Yes there is a race, see the following code: ThreadA ThreadB sdata->shard_lock.Lock(); if (sdata->pqueue->empty() && !(is_smallest_thread_index && !sdata->context_queue.empty())) { void queue(list<Context *>& ls) { bool empty = false; { std::scoped_lock l(q_mutex); if (q.empty()) { q.swap(ls); empty = true; } else { q.insert(q.end(), ls.begin(), ls.end()); } } if (empty) { mutex.Lock(); cond.Signal(); mutex.Unlock(); } } sdata->sdata_wait_lock.Lock(); if (!sdata->stop_waiting) { I modified the code as following and think this can fix the above race. sdata->shard_lock.Lock(); if (is_smallest_thread_index && sdata->pqueue->empty()) { sdata->sdata_wait_lock.Lock(); //check context_queue must be locked by sdata_wait. if (sdata->context_queue.empty()) { if (!sdata->stop_watiting) { dout(20) << __func__ << " empty q, waiting" << dendl; osd->cct->get_heartbeat_map()->clear_timeout(hb); sdata->shard_lock.Unlock(); sdata->sdata_cond.Wait(sdata->sdata_wait_lock); sdata->sdata_wait_lock.Unlock(); sdata->shard_lock.Lock(); if (sdata->pqueue->empty() && sdata->context_queue.empty()) { sdata->shard_lock.Unlock(); return; } osd->cct->get_heartbeat_map()->reset_timeout(hb, osd->cct->_conf->threadpool_default_timeout, 0); } else { dout(20) << __func__ << " need return immediately" << dendl; sdata->sdata_wait_lock.Unlock(); sdata->shard_lock.Unlock(); return; } } else { sdata->sdata_wait_lock.Unlock(); } } else if (sdata->pqueue->empty()) { ; } BTY: why not remove sdata_wait_lock ? i think it safe to remove sdata_wait_lock. Or am I missing something? Jianpeng Thanks! -----Original Message----- From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Ma, Jianpeng Sent: Wednesday, October 17, 2018 9:59 AM To: Sage Weil <sweil@xxxxxxxxxx> Cc: ceph-devel@xxxxxxxxxxxxxxx Subject: RE: ContextQueue vs op_wq race? Ok, I'll check this now. Thanks! Jianpeng -----Original Message----- From: Sage Weil [mailto:sweil@xxxxxxxxxx] Sent: Wednesday, October 17, 2018 6:55 AM To: Ma, Jianpeng <jianpeng.ma@xxxxxxxxx> Cc: ceph-devel@xxxxxxxxxxxxxxx Subject: ContextQueue vs op_wq race? Hi Jianpeng, Can you take a look at http://tracker.ceph.com/issues/36473 and see if you have any ideas? Does that possible race make sense? Thanks! sage