Re: [BUG] Unable to handle kernel NULL pointer dereference...as_move_to_dispatch+0x11/0x135

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Fri, 2 Feb 2007 14:12:10 -0800

On Fri, 2 Feb 2007 12:56:30 -0800
Andrew Vasquez <andrew.vasquez@xxxxxxxxxx> wrote:

> On Thu, 01 Feb 2007, Andrew Morton wrote:
> 
> > On Mon, 22 Jan 2007 10:35:10 -0800 Andrew Vasquez <andrew.vasquez@xxxxxxxxxx> wrote:
> > > Basically what is happening from the FC side is the initiator executes
> > > a simple dt test:
> > > 
> > >         dt of=/dev/raw/raw1 procs=8 oncerr=abort bs=16k disable=stats limit=2m passes=1000000 pattern=iot dlimit=2048
> > > 
> > > against a single lun (a very basic Windows target mode driver).
> > > During the test a port-enable, port-disable script is running agains
> > > the switch's port that is connected to the target (this occurs every
> > > sixty seconds (for a disabled duration of 2 seconds).  Additionally,
> > > the target itself is set to LOGO (logout) or drop off the topology
> > > every 30 seconds.
> > 
> > I don't understand what effect the port-enable/port-disable has upon the
> > system.  Will it cause I/O errors, or what?
> 
> No I/O errors should make there way to the upper-layers (block/FS).
> The system *should* be shielded from the fibre-channel fabric events.
> I just wanted to explain what the (basic sanity) test did.
> 
> > > This test runs fine up to 2.6.19.
> > 
> > One thing we did in there was to give direct-io-against-blockdevs some
> > special-case bio-preparation code.  Perhaps this is tickling a bug somehow.
> > 
> > We can revert that change like this:
> > 
> > 
> > diff -puN fs/block_dev.c~a fs/block_dev.c
> > --- a/fs/block_dev.c~a
> > +++ a/fs/block_dev.c
> > @@ -196,8 +196,47 @@ static void blk_unget_page(struct page *
> >  	pvec->page[--pvec->idx] = page;
> >  }
> >  
> > +static int
> > +blkdev_get_blocks(struct inode *inode, sector_t iblock,
> > +		struct buffer_head *bh, int create)
> ...
> 
> Hmm, with this patch we've noted two main differences:
> 
> 1) I/O throughput with the basic 'dd' command used (above) is back to
>    60MB/s, rather than the appalling 20-22 MB/s we were seeing with
>    2.6.20-rcX.
> 
> 2) No panics -- so far with 2+ hours of testing.  With our vanilla
>    system of 2.6.20-rc7, the test could trigger the panic within 15 to
>    20 minutes.
> 
> We'll let this run over the weekend -- I'll certainly let you know if
> anything has changed (failures).

Oh crap, I didn't realise we had a performance regression as well.

direct-io against a blockdev is kinda important for some people, so this is
a must-fix for 2.6.20.  I'll prepare a minimal patch to switch 2.6.20 back
to the 2.6.19 codepaths.

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html