On Fri, Oct 21, 2022 at 02:33:21PM -0400, David Jeffery wrote:
> On Fri, Oct 21, 2022 at 11:22 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > On Fri, Oct 21, 2022 at 08:32:31AM -0600, Keith Busch wrote:
> > >
> > > I agree with your idea that this is a lower level driver responsibility:
> > > it should reclaim all started requests before allowing new queueing.
> > > Perhaps the block layer should also raise a clear warning if it's
> > > queueing a request that's already started.
> >
> > The thing is that it is a generic issue: lots of VM drivers could be
> > affected, and it may not be easy for drivers to handle the race either.
> >
>
> While virtual systems are a common source of the problem, fully
> preemptible kernels (with or without the real-time patches) can also
> trigger this condition rather easily with a poorly behaved real-time
> task. Involuntary preemption means the queue_rq call can be stopped to
> let another task run. A misbehaving task that claims the CPU for longer
> than the request timeout while preempting a task inside a queue_rq
> function could cause the condition on real or virtual hardware. So it's
> not just VM-related drivers that are affected by the race.

In theory, yes. But ->queue_rq() runs inside an RCU read-side critical
section, and CONFIG_RCU_BOOST is usually enabled to cover exactly this
kind of problem, since otherwise an OOM could be triggered easily as
well. I suspect it is hard to trigger on real hardware with a preempt
kernel.

Thanks,
Ming
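
As a rough illustration of the check Keith suggests above (warning when
the block layer is about to hand an already-started request back to
->queue_rq()), here is a minimal sketch. It is not a real patch to the
blk-mq dispatch path: example_queue_rq_wrapper() is a made-up name, and
only blk_mq_request_started(), WARN_ON_ONCE(), and the standard
blk_mq_ops->queue_rq() plumbing are actual kernel interfaces.

/*
 * Hypothetical illustration only, not the real blk-mq dispatch code.
 * Shows where a sanity check like the one discussed above could sit:
 * a request handed to ->queue_rq() should not already be in flight,
 * so an already-started request here means a timeout-triggered requeue
 * has raced with an in-progress (e.g. preempted) ->queue_rq() call.
 */
#include <linux/blk-mq.h>

static blk_status_t example_queue_rq_wrapper(struct blk_mq_hw_ctx *hctx,
					     const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;

	/* Warn once if the request was already started by a driver. */
	WARN_ON_ONCE(blk_mq_request_started(rq));

	/* Forward to the driver's real ->queue_rq() implementation. */
	return hctx->queue->mq_ops->queue_rq(hctx, bd);
}

The actual dispatch logic lives in block/blk-mq.c; the wrapper above is
only meant to show where such a WARN_ON could be placed, not how it
would be wired into the block layer.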