Re: 2.6.20.3 AMD64 oops in CFQ code

Neil Brown <neilb@xxxxxxx> · Fri, 23 Mar 2007 11:44:58 +1100

On Thursday March 22, dan.j.williams@xxxxxxxxx wrote:
> 
> Not a cfq failure, but I have been able to reproduce a different oops
> at array stop time while i/o's were pending.  I have not dug into it
> enough to suggest a patch, but I wonder if it is somehow related to
> the cfq failure since it involves congestion and drives going away:

Thanks.   I know about that one and have a patch about to be posted
which should fix it.  But I don't completely understand it.

When a raid5 array shuts down, it clears mddev->private, but doesn't
clean q->backing_dev_info.congested_fn.  So if someone tries to call
that congested_fn, it will try to dereference mddev->private and Oops.

Only by the time that raid5 is shutting down, no-one should have a
reference to the device any more, and no-one should be in a position
to call congested_fn !!

Maybe pdflush is just trying to sync the block device, even though
there is no dirty date .... dunno....

But I don't think it is related to the cfq problem as this one is only
a problem when the array is being stopped.

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html