Re: [PATCH 1/2] wait: add wq_has_multiple_sleepers helper

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 11, 2019 at 03:40:06PM +0200, Oleg Nesterov wrote:
> On 07/11, Oleg Nesterov wrote:
> >
> > Jens,
> >
> > I managed to convince myself I understand why 2/2 needs this change...
> > But rq_qos_wait() still looks suspicious to me. Why can't the main loop
> > "break" right after io_schedule()? rq_qos_wake_function() either sets
> > data->got_token = true or it doesn't wakeup the waiter sleeping in
> > io_schedule()
> >
> > This means that data.got_token = F at the 2nd iteration is only possible
> > after a spurious wakeup, right? But in this case we need to set state =
> > TASK_UNINTERRUPTIBLE again to avoid busy-wait looping ?
> 
> Oh. I can be easily wrong, I never read this code before, but it seems to
> me there is another unrelated race.
> 
> rq_qos_wait() can't rely on finish_wait() because it doesn't necessarily
> take wq_head->lock.
> 
> rq_qos_wait() inside the main loop does
> 
> 		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
> 			finish_wait(&rqw->wait, &data.wq);
> 
> 			/*
> 			 * We raced with wbt_wake_function() getting a token,
> 			 * which means we now have two. Put our local token
> 			 * and wake anyone else potentially waiting for one.
> 			 */
> 			if (data.got_token)
> 				cleanup_cb(rqw, private_data);
> 			break;
> 		}
> 
> finish_wait() + "if (data.got_token)" can race with rq_qos_wake_function()
> which does
> 
> 	data->got_token = true;
> 	list_del_init(&curr->entry);
> 

Argh finish_wait() does __set_current_state, well that's shitty.  I guess we
need to do

data->got_token = true;
smp_wmb()
list_del_init(&curr->entry);

and then do

smp_rmb();
if (data.got_token)
	cleanup_cb(rqw, private_data);

to be safe?

> rq_qos_wait() can see these changes out-of-order: finish_wait() can see
> list_empty_careful() == T and avoid wq_head->lock, and in this case the
> code above can see data->got_token = false.
> 
> No?
> 
> and I don't really understand
> 
> 	has_sleeper = false;
> 
> at the end of the main loop. I think it should do "has_sleeper = true",
> we need to execute the code above only once, right after prepare_to_wait().
> But this is harmless.

We want has_sleeper = false because the second time around we just want to grab
the inflight counter.  Yes we should have been worken up by our special thing
and so should already have data.got_token, but that sort of thinking ends in
hung boxes and me having to try to mitigate thousands of boxes suddenly hitting
a case we didn't think was possible.  Thanks,

Josef



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux