On Fri, Oct 21, 2022 at 02:33:21PM -0400, David Jeffery wrote:
> On Fri, Oct 21, 2022 at 11:22 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > On Fri, Oct 21, 2022 at 08:32:31AM -0600, Keith Busch wrote:
> > >
> > > I agree with your idea that this is a lower level driver responsibility:
> > > it should reclaim all started requests before allowing new queuing.
> > > Perhaps the block layer should also raise a clear warning if it's
> > > queueing a request that's already started.
> >
> > The thing is that it is one generic issue, lots of VM drivers could be
> > affected, and it may not be easy for drivers to handle the race too.
> >
>
> While virtual systems are a common source of the problem, fully
> preempt kernels (with or without real-time patches) can also trigger
> this condition rather simply with a poorly behaved real-time task. The
> involuntary preemption means the queue_rq call can be stopped to let
> another task run. Poorly behaving tasks claiming the CPU for longer
> than the request timeout when preempting a task in a queue_rq function
> could cause the condition on real or virtual hardware. So it's not
> just VM related drivers that are affected by the race.

In theory, yes. But ->queue_rq() runs inside an RCU read-side critical
section, and CONFIG_RCU_BOOST is usually enabled to cover this problem;
otherwise OOM could be triggered easily too.

I guess it is hard to trigger this on real hardware with a preempt kernel.

Thanks,
Ming