On 09/22/2011 06:20 PM, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Sep 22, 2011 at 11:16:30AM -0400, Alan Stern wrote:
>> Rocko:
>>
>> Can you try testing this patch instead of all the patches I sent to
>> you (but keep Ted's patch)?
>>
>> Alan Stern
>>
>> On Thu, 22 Sep 2011, Hannes Reinecke wrote:
>>
>>> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
>>>> On 09/19/11 08:00, Ben Hutchings wrote:
>>> [ .. ]
>>>>>
>>>>> There have been reports of this in Debian going back to 2.6.39:
>>>>>
>>>>> http://bugs.debian.org/631187
>>>>> http://bugs.debian.org/636263
>>>>> http://bugs.debian.org/642043
>>>>>
>>>>> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>>>>>
>>>>> http://bugs.debian.org/633890
>>>>> http://bugs.debian.org/634681
>>>>> http://bugs.debian.org/636103
>>>>>
>>>>> The former was also reported in Ubuntu since their 2.6.38-10:
>>>>>
>>>>> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>>>>>
>>>>> The result of the discussion there was that it appeared to be a
>>>>> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
>>>>> ("[SCSI] put stricter guards on queue dead checks") which was also
>>>>> included in a stable update for 2.6.38.
>>>>>
>>>>> There was also a report on bugzilla.kernel.org, though no-one can see
>>>>> quite what that says now:
>>>>>
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>>>>>
>>>>> I also reported most of the above to James Bottomley and linux-scsi
>>>>> nearly 2 months ago, to no response.
>>>>
>>>> I've reported a similar oops related to the above commit:
>>>> [BUG] Oops when SCSI device under multipath is removed
>>>> https://lkml.org/lkml/2011/8/10/11
>>>>
>>>> Elevator being removed is the core of the problem.
>>>> And the essential issue seems 2 different models of queue/driver relation
>>>> implied by queue_lock.
>>>>
>>>> If reverting the commit is not an option,
>>>> until somebody comes up to fix the essential issue,
>>>> the patch below should close the regressions introduced by the commit.
>>>>
>>> Why do you have to do it that complicated?
>>> Couldn't we just state that any external lock is being disconnected from
>>> queue_lock after blk_cleanup_queue()?
>>>
>>> Then something like this should suffice here:
>>
>>
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 90e1ffd..a4ac005 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>>  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>>  	mutex_unlock(&q->sysfs_lock);
>>  
>> -	if (q->elevator)
>> -		elevator_exit(q->elevator);
>> -
>> -	blk_throtl_exit(q);
>> +	if (q->queue_lock != q->__queue_lock)
>> +		q->queue_lock = q->__queue_lock;
>
> That should be &q->__queue_lock.
>
Why, but of course.
It's been fixed with the official patch
(cf. "block: Free queue resources at blk_release_queue()").

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
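A minimal user-space sketch of the lock-pointer relationship behind Thadeu's correction, assuming (as in block-layer kernels of this era) that the queue carries an embedded `__queue_lock` plus a `queue_lock` pointer that a driver may redirect to its own lock. The types and functions here (`mock_lock_t`, `struct mock_queue`, `mock_init_queue`, `mock_cleanup_queue`, `mock_uses_own_lock`) are hypothetical stand-ins, not kernel APIs; the sketch only shows why "disconnecting" an external lock means pointing back at `&q->__queue_lock`, since comparing or assigning the struct itself to a pointer does not even compile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for spinlock_t; no real locking is needed
 * to illustrate the pointer relationship. */
typedef struct {
    int locked;
} mock_lock_t;

/* Hypothetical mock of the two lock fields discussed in the thread:
 * a lock embedded in the queue, and the pointer actually used,
 * which may refer to an external (driver-supplied) lock. */
struct mock_queue {
    mock_lock_t __queue_lock; /* embedded default lock */
    mock_lock_t *queue_lock;  /* lock in use; may be external */
};

/* Use the driver's lock if one is given, else the embedded lock. */
static void mock_init_queue(struct mock_queue *q, mock_lock_t *driver_lock)
{
    q->__queue_lock.locked = 0;
    q->queue_lock = driver_lock ? driver_lock : &q->__queue_lock;
}

/* Sketch of the proposal: after cleanup, stop referencing any
 * external lock and fall back to the embedded one. The '&' is
 * required because queue_lock is a pointer and __queue_lock is
 * the lock object itself. */
static void mock_cleanup_queue(struct mock_queue *q)
{
    assert(q != NULL);
    if (q->queue_lock != &q->__queue_lock)
        q->queue_lock = &q->__queue_lock;
}

/* True when the queue no longer depends on an external lock. */
static bool mock_uses_own_lock(const struct mock_queue *q)
{
    return q->queue_lock == &q->__queue_lock;
}
```

After mock_cleanup_queue(), the queue no longer refers to the driver's lock, so that lock can go away with the driver. Whether that alone is safe in the real block layer is exactly what the thread was debating; the sketch does not settle it.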