Re: Race condition between "read CFQ stats" and "block device shutdown"

Anatol Pomozov <anatol.pomozov@xxxxxxxxx> · Wed, 25 Sep 2013 13:37:51 -0700

Hi

On Wed, Sep 4, 2013 at 9:07 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>
> On Wed, Sep 04, 2013 at 08:45:33AM -0700, Anatol Pomozov wrote:
>> I am not an expert in block code, so I have a few questions here:
>>
>> - are we sure that this operation is atomic? What if blkg->q becomes
>> dead right after we checked it, and blkg->q->queue_lock got invalid so
>> we have the same crash as before?
>
> request_queue lock switching is something inherently broken in block
> layer.  It's unsalvageable.

Fully agree. The problem is that request_queue->queue_lock is a shared
resource that is modified and accessed concurrently. When one thread
changes the pointer while another thread dereferences it, we need
synchronization to prevent a race. In other words, we would need a
spin_lock to protect access to the queue_lock spin_lock pointer;
otherwise we get a crash like the one above...
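
To illustrate what I mean, here is a rough userspace sketch, not kernel
code. pthread mutexes stand in for spinlocks, and every name in it
(model_queue, ptr_guard, internal_lock, model_read_stat and so on) is
hypothetical; only the queue_lock field mirrors the real one. It assumes
an extra outer lock that guards the switchable pointer:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only; all names are hypothetical except queue_lock. */
struct model_queue {
    pthread_mutex_t internal_lock; /* lock embedded in the queue itself */
    pthread_mutex_t *queue_lock;   /* switchable pointer, may target a driver lock */
    pthread_mutex_t ptr_guard;     /* assumed outer lock protecting queue_lock */
    bool dying;                    /* set once shutdown has started */
    unsigned long stat;            /* stands in for the stats being read */
};

static void model_queue_init(struct model_queue *q, pthread_mutex_t *driver_lock)
{
    pthread_mutex_init(&q->internal_lock, NULL);
    pthread_mutex_init(&q->ptr_guard, NULL);
    /* Use the driver-owned lock if one is supplied, else our own. */
    q->queue_lock = driver_lock ? driver_lock : &q->internal_lock;
    q->dying = false;
    q->stat = 0;
}

/* Shutdown path: switch the pointer back before the driver lock goes away. */
static void model_queue_shutdown(struct model_queue *q)
{
    pthread_mutex_lock(&q->ptr_guard);
    pthread_mutex_lock(q->queue_lock);   /* wait for any current holder */
    q->dying = true;
    pthread_mutex_unlock(q->queue_lock);
    q->queue_lock = &q->internal_lock;   /* the switch itself */
    pthread_mutex_unlock(&q->ptr_guard);
}

/* Stats path: the pointer is only dereferenced while ptr_guard is held. */
static bool model_read_stat(struct model_queue *q, unsigned long *out)
{
    bool ok = false;

    assert(out != NULL);
    pthread_mutex_lock(&q->ptr_guard);
    if (!q->dying) {
        pthread_mutex_lock(q->queue_lock);
        *out = q->stat;
        pthread_mutex_unlock(q->queue_lock);
        ok = true;
    }
    pthread_mutex_unlock(&q->ptr_guard);
    return ok;
}
```

Without ptr_guard, the reader could load the old queue_lock pointer just
before the switch and lock memory the driver has already freed. This
only shows the shape of the synchronization; I have not tried anything
like it against the real block layer, and the outer lock would serialize
every reader.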

>  Maybe we can drop lock switching once blk-mq is fully merged.

Could you please provide more information about it? What is the timeline?

If there is an easy way to fix the race condition, I would like to
help. Please give me a pointer on which direction to take.

PS: A little context on why I care about this bug. We are testing a
large farm that actively uses iSCSI, and we expect a lot of iSCSI
device startup/shutdown. I am testing whether this code path has
race conditions, and I found the one above.