Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx> · Fri, 01 Feb 2008 12:39:27 -0500 (EST)

Hi Boris,

On Fri, 1 Feb 2008 08:51:17 +0100, Borislav Petkov wrote:
> > > > The below fix should be enough. It's perfectly legal to have leftover
> > > > byte counts when the drive signals completion, happens all the time for
> > > > eg user issued commands where you don't know an exact byte count.
> > > 
> > > Actually, this behavior has been the case even before
> > > the __blk_end_request() changes.
> > > I did test plain 2.6.24 with the following
> > > 
> > > 
> > > --- linux-2.6/drivers/ide/ide-cd.c	2008-01-31 22:18:59.000000000 +0100
> > > +++ linux-2.6/drivers/ide/ide-cd.c-new	2008-01-31 22:18:50.000000000 +0100
> > > @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
> > >  	/*
> > >  	 * If DRQ is clear, the command has completed.
> > >  	 */
> > > -	if ((stat & DRQ_STAT) == 0)
> > > +	if ((stat & DRQ_STAT) == 0) {
> > > +		blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
> > > +		printk("backup: data_len=%u  bi_size=%u\n",
> > > +				rq->data_len, rq->bio->bi_size);
> > >  		goto end_request;
> > > +	}
> > >  
> > >  	/*
> > >  	 * check which way to transfer data
> > > 
> > > 
> > > to see whether we've been getting residual byte counts:
> > > 
> > > Jan 31 22:10:06 gollum kernel: [   26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > > Jan 31 22:10:06 gollum kernel: [   26.702945]
> > > Jan 31 22:10:06 gollum kernel: [   26.702946] sector 2673511, nr/cnr 0/0
> > > Jan 31 22:10:06 gollum kernel: [   26.703052] bio dfa8ec40, biotail dfa8ec40, buffer 00000000, data 00000000, len 158
> > > Jan 31 22:10:06 gollum kernel: [   26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > Jan 31 22:10:06 gollum kernel: [   26.703877] backup: data_len=158  bi_size=158
> > > 
> > > ... so we've been simply silently ignoring this until now so
> > > i guess we don't need to BUG() for something that's totally benign.
> 
> Hi Kiyoshi,
>  
> > end_that_request_last() is not called when __blk_end_request()
> > returns 1.  Then, the issuer isn't woken up.
> > So I think the BUG() or error messages should be there.
> 
> you mean, end_that_request_last() isn't called when __end_that_request_first()
> returns an error and this is the case only for fs and pc requests.
> Otherwise it _is_ called, thus simulating somewhat the previous behavior.
> However, we never BUG()'ged on residual byte counts before and
> this driver has been in the kernel tree for ages, so what puzzles
> me now is how is BUG()'ing here better than before and shouldn't we
> simply issue a warning instead of killing the interrupt handler...

Jens' patch passes the residual byte count to __blk_end_request(),
so __end_that_request_first() should never return 1 and we should never
BUG() on a residual byte count, unless there is an inconsistency, such as
the remaining bios being larger than the residual byte count.

So if __blk_end_request() returns 1 even with Jens' patch,
it means that the block layer or the driver really has a bug.
In that case, the request and the bios could leak, or the issuer
would wait forever, because end_that_request_last() isn't called.

The previous behavior could ignore such an inconsistency and leak only
the bios, because it called end_that_request_last() anyway.
Personally I would like to BUG() in such cases, but I won't object
strongly if you prefer not to.
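
To make the difference concrete, here is a toy userspace model of the two
completion paths. It is only a sketch, not kernel code: every toy_* name is
made up for illustration, and the model assumes that the "first" step
returns 1 whenever fewer bytes are completed than are still attached to the
request. The 158 is the data_len from your log above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all toy_* names are hypothetical, not kernel APIs. */
struct toy_request {
	unsigned int bytes_left;	/* bytes still attached via bios */
	bool completed;			/* "last" step ran, issuer woken up */
};

/* Stands in for __end_that_request_first(): 1 while data remains. */
static int toy_end_first(struct toy_request *rq, unsigned int nr_bytes)
{
	if (nr_bytes >= rq->bytes_left) {
		rq->bytes_left = 0;
		return 0;
	}
	rq->bytes_left -= nr_bytes;
	return 1;
}

/* Stands in for __blk_end_request(): skips the "last" step if data remains. */
static int toy_blk_end_request(struct toy_request *rq, unsigned int nr_bytes)
{
	if (toy_end_first(rq, nr_bytes))
		return 1;		/* issuer is never woken up */
	rq->completed = true;		/* models end_that_request_last() */
	return 0;
}

/* Stands in for the previous behavior: the "last" step runs regardless. */
static int toy_old_end_request(struct toy_request *rq, unsigned int nr_bytes)
{
	int leftover = toy_end_first(rq, nr_bytes);

	rq->completed = true;		/* issuer woken, leftover bios leak */
	return leftover;
}

static void toy_demo(void)
{
	struct toy_request ignored = { 158, false };
	struct toy_request passed = { 158, false };
	struct toy_request old = { 158, false };

	/* Residual not passed: request stays pending, issuer waits forever. */
	assert(toy_blk_end_request(&ignored, 0) == 1 && !ignored.completed);

	/* Residual passed: request completes, nothing is left over. */
	assert(toy_blk_end_request(&passed, 158) == 0 && passed.completed);

	/* Previous behavior: issuer woken, but 158 bytes of bios are leaked. */
	assert(toy_old_end_request(&old, 0) == 1 && old.completed);
	assert(old.bytes_left == 158);
}
```

In this model, a return value of 1 from toy_blk_end_request() after the
residual has been passed can only come from an accounting mismatch, which
is the case where I would prefer to BUG().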

Thanks,
Kiyoshi Ueda