Andrew Morton wrote: > On Tue, 20 Mar 2007 00:25:02 +0100 > Andreas Steinmetz <ast@xxxxxxxx> wrote: > >> Mike Christie wrote: >>> Mike Christie wrote: >>>> James Bottomley wrote: >>>>> On Mon, 2007-03-19 at 12:49 -0500, Mike Christie wrote: >>>>>>> I can't even say if the tapes are written correctly as I can't read them >>>>>>> (one does not reboot production machines back to 2.4.x just to try to >>>>>>> read a backup tape - I don't have 2.6.x older than 2.6.20 on these >>>>>>> machines). >>>>>> Could you try this patch >>>>>> http://marc.info/?l=linux-scsi&m=116464965414878&w=2 >>>>>> I thought st was modified to not send offsets in the last elements but >>>>>> it looks like it wasn't. >>>>> Actually, there are two patches in the email referred to. If the >>>>> analysis that we're passing NULL to mempool_free is correct, it should >>>>> be the second one that fixes the problem (the one that checks >>>>> bio->bi_io_vec before freeing it). Which would mean we have a >>>>> nr_vecs==0 bio generated by the tar somehow. >>>>> >>>> I think we might only need the first patch if the problem is similar to >>>> what the lsi guys were seeing. I thought the problem is that we are not >>>> estimating how large the transfer is correctly because we do not take >>>> into account offsets at the end. This results in nr_vecs being zero when >>>> it should be a valid value. I thought Kai's patch: >>>> http://bugzilla.kernel.org/show_bug.cgi?id=7919 >>>> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=9abe16c670bd3d4ab5519257514f9f291383d104 >>>> fixed the problem on st's side, >>> Oh, I noticed that the subject for the mail references 2.6.30.3 and the >>> patch for st in the bugzilla did not make into 2.6.20 and is not in .3. >>> Could we try the st patch in the bugzilla first? >> Ok, the st patch from bugzilla solves the problem (tested on both >> affected machines). > > > If you're referring to the below patch then it's already in mainline, and > has been for a month. > Yes, that's the patch I'm referring to. > Have you tested 2.6.21-rc4? If not, please do so. > Sorry, this is not possible on these machines. They are production servers and every problem on them that cannot be easily solved via remote access is a 40km (one way) drive in the middle of the night. > Perhaps we should merge this into 2.6.20.x? > I would suggest so. > > > commit 9abe16c670bd3d4ab5519257514f9f291383d104 > Author: Kai Makisara <Kai.Makisara@xxxxxxxxxxx> > Date: Sat Feb 3 13:21:29 2007 +0200 > > [SCSI] st: fix Tape dies if wrong block size used, bug 7919 > > On Thu, 1 Feb 2007, Andrew Morton wrote: > > On Thu, 1 Feb 2007 15:34:29 -0800 > > bugme-daemon@xxxxxxxxxxxxxxxxxxx wrote: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=7919 > > > > > > Summary: Tape dies if wrong block size used > > > Kernel Version: 2.6.20-rc5 > > > Status: NEW > > > Severity: normal > > > Owner: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx > > > Submitter: dmartin@xxxxxxxxxxxx > > > > > > > > > Most recent kernel where this bug did *NOT* occur: 2.6.17.14 > > > > > > Other Kernels Tested and Results: > > > > > > OK 2.6.15.7 > > > OK 2.6.16.37 > > > OK 2.6.17.14 > > > BAD 2.6.18.6 > > > BAD 2.6.18-1.2869.fc6 > > > BAD 2.6.19.2 + > > > BAD 2.6.20-rc5 > > > > > > NOTE: 2.6.18-1.2869.fc6 is a Fedora modified kernel, all others are from kernel.org > > > > ... > > > Steps to reproduce: > > > Get a Adaptec AHA-2940U/UW/D / AIC-7881U card and a tape drive, > > > install a recent kernel > > > set the tape block size - mt setblk 4096 > > > read from or write to tape using wrong block size - tar -b 7 -cvf /dev/tape foo > > > > Write does not trigger this bug because the driver refuses in fixed block > mode writes that are not a multiple of the block size. Read does trigger > it in my system. > > The bug is not associated with any specific HBA. st tries to do direct i/o > in fixed block mode with reads that are not a multiple of tape block size. > > The patch in this message fixes the st problem by switching to using the > driver buffer up to the next close of the device file in fixed block mode > if the user asks for a read like this. > > I don't know why the bug has surfaced only after 2.6.17 although the st > problem is old. There may be another bug in the block subsystem and this > patch works around it. However, the patch fixes a problem in st and in > this way it is a valid fix. > > This patch may also fix the bug 7900. > > The patch compiles and is lightly tested. > > Signed-off-by: Kai Makisara <kai.makisara@xxxxxxxxxxx> > Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxx> > > diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c > index e016e09..fba8b20 100644 > --- a/drivers/scsi/st.c > +++ b/drivers/scsi/st.c > @@ -9,7 +9,7 @@ > Steve Hirsch, Andreas Koppenh"ofer, Michael Leodolter, Eyal Lebedinsky, > Michael Schaefer, J"org Weule, and Eric Youngdale. > > - Copyright 1992 - 2006 Kai Makisara > + Copyright 1992 - 2007 Kai Makisara > email Kai.Makisara@xxxxxxxxxxx > > Some small formal changes - aeb, 950809 > @@ -17,7 +17,7 @@ > Last modified: 18-JAN-1998 Richard Gooch <rgooch@xxxxxxxxxxxxx> Devfs support > */ > > -static const char *verstr = "20061107"; > +static const char *verstr = "20070203"; > > #include <linux/module.h> > > @@ -1168,6 +1168,7 @@ static int st_open(struct inode *inode, > STps = &(STp->ps[i]); > STps->rw = ST_IDLE; > } > + STp->try_dio_now = STp->try_dio; > STp->recover_count = 0; > DEB( STp->nbr_waits = STp->nbr_finished = 0; > STp->nbr_requests = STp->nbr_dio = STp->nbr_pages = STp->nbr_combinable = 0; ) > @@ -1400,9 +1401,9 @@ static int setup_buffering(struct scsi_t > struct st_buffer *STbp = STp->buffer; > > if (is_read) > - i = STp->try_dio && try_rdio; > + i = STp->try_dio_now && try_rdio; > else > - i = STp->try_dio && try_wdio; > + i = STp->try_dio_now && try_wdio; > > if (i && ((unsigned long)buf & queue_dma_alignment( > STp->device->request_queue)) == 0) { > @@ -1599,7 +1600,7 @@ st_write(struct file *filp, const char _ > STm->do_async_writes && STps->eof < ST_EOM_OK; > > if (STp->block_size != 0 && STm->do_buffer_writes && > - !(STp->try_dio && try_wdio) && STps->eof < ST_EOM_OK && > + !(STp->try_dio_now && try_wdio) && STps->eof < ST_EOM_OK && > STbp->buffer_bytes < STbp->buffer_size) { > STp->dirty = 1; > /* Don't write a buffer that is not full enough. */ > @@ -1769,7 +1770,7 @@ static long read_tape(struct scsi_tape * > if (STp->block_size == 0) > blks = bytes = count; > else { > - if (!(STp->try_dio && try_rdio) && STm->do_read_ahead) { > + if (!(STp->try_dio_now && try_rdio) && STm->do_read_ahead) { > blks = (STp->buffer)->buffer_blocks; > bytes = blks * STp->block_size; > } else { > @@ -1948,10 +1949,12 @@ st_read(struct file *filp, char __user * > goto out; > > STm = &(STp->modes[STp->current_mode]); > - if (!(STm->do_read_ahead) && STp->block_size != 0 && > - (count % STp->block_size) != 0) { > - retval = (-EINVAL); /* Read must be integral number of blocks */ > - goto out; > + if (STp->block_size != 0 && (count % STp->block_size) != 0) { > + if (!STm->do_read_ahead) { > + retval = (-EINVAL); /* Read must be integral number of blocks */ > + goto out; > + } > + STp->try_dio_now = 0; /* Direct i/o can't handle split blocks */ > } > > STps = &(STp->ps[STp->partition]); > diff --git a/drivers/scsi/st.h b/drivers/scsi/st.h > index 05a5cae..50f3deb 100644 > --- a/drivers/scsi/st.h > +++ b/drivers/scsi/st.h > @@ -117,7 +117,8 @@ struct scsi_tape { > unsigned char cln_sense_value; > unsigned char cln_sense_mask; > unsigned char use_pf; /* Set Page Format bit in all mode selects? */ > - unsigned char try_dio; /* try direct i/o? */ > + unsigned char try_dio; /* try direct i/o in general? */ > + unsigned char try_dio_now; /* try direct i/o before next close? */ > unsigned char c_algo; /* compression algorithm */ > unsigned char pos_unknown; /* after reset position unknown */ > int tape_type; > -- Andreas Steinmetz SPAMmers use robotrap@xxxxxxxx - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html