RE: ContextQueue vs op_wq race?

On Wed, 17 Oct 2018, Ma, Jianpeng wrote:
> Hi Sage, xingguo:
> 
> Yes, there is a race; see the following code:
> 
> 
> ThreadA:
> 
>   sdata->shard_lock.Lock();
>   if (sdata->pqueue->empty() &&
>       !(is_smallest_thread_index && !sdata->context_queue.empty())) {
> 
> ThreadB (ContextQueue::queue) runs at this point:
> 
>   void queue(list<Context *>& ls) {
>     bool empty = false;
>     {
>       std::scoped_lock l(q_mutex);
>       if (q.empty()) {
>         q.swap(ls);
>         empty = true;
>       } else {
>         q.insert(q.end(), ls.begin(), ls.end());
>       }
>     }
> 
>     if (empty) {
>       mutex.Lock();
>       cond.Signal();
>       mutex.Unlock();
>     }
>   }
> 
> ThreadA continues:
> 
>     sdata->sdata_wait_lock.Lock();
>     if (!sdata->stop_waiting) {
> 
> 
> I modified the code as follows; I think this fixes the above race.
> 
>   sdata->shard_lock.Lock();                                                                                              
>   if (is_smallest_thread_index && sdata->pqueue->empty()) {                                                              
>     sdata->sdata_wait_lock.Lock();   // checking context_queue must be done while holding sdata_wait_lock.
>     if (sdata->context_queue.empty()) {

This looks right to me!

>       if (!sdata->stop_waiting) {
>         dout(20) << __func__ << " empty q, waiting" << dendl;                                                            
>         osd->cct->get_heartbeat_map()->clear_timeout(hb);                                                                
>         sdata->shard_lock.Unlock();                                                                                      
>         sdata->sdata_cond.Wait(sdata->sdata_wait_lock);                                                                  
>         sdata->sdata_wait_lock.Unlock();                                                                                 
>         sdata->shard_lock.Lock();                                                                                        
>         if (sdata->pqueue->empty() && sdata->context_queue.empty()) {                                        
>           sdata->shard_lock.Unlock();                                                                                    
>           return;                                                                                                        
>         }                                                                                                                
>         osd->cct->get_heartbeat_map()->reset_timeout(hb,                                                                 
>             osd->cct->_conf->threadpool_default_timeout, 0);                                                             
>       } else {
>         dout(20) << __func__ << " need return immediately" << dendl;
>         sdata->sdata_wait_lock.Unlock();
>         sdata->shard_lock.Unlock();
>         return;
>       }
>     } else {                                                                                                             
>         sdata->sdata_wait_lock.Unlock();
>     }
> 
>   } else if (sdata->pqueue->empty()) {
>     ;
>   }
> 
> BTW: why not remove sdata_wait_lock?  I think it is safe to remove sdata_wait_lock. Or am I missing something?

The mutex is what closes the race between ContextQueue::queue and the 
worker thread.  It ensures that we can verify the context_queue is empty 
and then wait, without queue() adding an item between those two steps...
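
To make that concrete, here is a minimal sketch of the check-then-wait 
pattern, with simplified, hypothetical names (std::mutex / 
std::condition_variable standing in for sdata_wait_lock / sdata_cond; 
this is not the actual OSD code):

  #include <condition_variable>
  #include <deque>
  #include <mutex>

  struct Context {};            // stand-in for the real Context

  std::mutex m;                 // plays the role of sdata_wait_lock
  std::condition_variable cond; // plays the role of sdata_cond
  std::deque<Context*> q;       // plays the role of context_queue

  // Worker side: the emptiness check and the wait happen under one
  // mutex, so queue() cannot slip an item (and its wakeup) in between.
  void worker_wait()
  {
    std::unique_lock<std::mutex> l(m);
    while (q.empty()) {
      cond.wait(l);
    }
    // ... dequeue and process ...
  }

  // Producer side: publish the item under the mutex, then wake the worker.
  void queue(Context* c)
  {
    {
      std::lock_guard<std::mutex> lg(m);
      q.push_back(c);
    }
    cond.notify_one();  // the worker either saw the item or is parked in wait()
  }

Without the shared mutex, the worker could observe an empty queue, the 
producer could then add an item and signal, and only afterwards would the 
worker block on the condition variable, missing the wakeup entirely.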

sage


> 
> Jianpeng
> Thanks!
> 
> 
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Ma, Jianpeng
> Sent: Wednesday, October 17, 2018 9:59 AM
> To: Sage Weil <sweil@xxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: ContextQueue vs op_wq race?
> 
> Ok, I'll check this now.
> 
> Thanks!
> Jianpeng
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx] 
> Sent: Wednesday, October 17, 2018 6:55 AM
> To: Ma, Jianpeng <jianpeng.ma@xxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: ContextQueue vs op_wq race?
> 
> Hi Jianpeng,
> 
> Can you take a look at http://tracker.ceph.com/issues/36473 and see if you have any ideas?  Does that possible race make sense?
> 
> Thanks!
> sage
> 
> 


