Re: [PATCH rc8-mm1] hotfix libata-scsi corruption

Hugh Dickins <hugh@xxxxxxxxxxx> · Tue, 22 Jan 2008 18:36:18 +0000 (GMT)

On Tue, 22 Jan 2008, James Bottomley wrote:
> > --- 2.6.24-rc8-mm1/drivers/ata/libata-scsi.c	2008-01-17 16:49:47.000000000 +0000
> > +++ linux/drivers/ata/libata-scsi.c	2008-01-22 15:45:40.000000000 +0000
> > @@ -826,7 +826,7 @@ static void ata_scsi_sdev_config(struct 
> >  	sdev->max_device_blocked = 1;
> >  
> >  	/* set the min alignment */
> > -	blk_queue_update_dma_alignment(sdev->request_queue, ATA_DMA_PAD_SZ - 1);
> > +	blk_queue_update_dma_alignment(sdev->request_queue, ATA_SECT_SIZE - 1);
> >  }
> >  
> >  static void ata_scsi_dev_config(struct scsi_device *sdev,
> 
> Unfortunately, that's likely not the entire hot fix ... the implication
> is that we have some mapping error in the way we do direct SG_IO.

Quite possibly, I'm not sure.

> What the fix you propose does is make it far more likely that block will
> copy, perform I/O then uncopy (almost certain, since most smartd data
> transfers are well under ATA_SECT_SIZE, which is 512).  However,
> implicating a generic path like this implies that we would get the same
> problem for SCSI commands as well, so the correct hot fix is below.

I've not noticed any problems from the normal activity of the system,
only from smartd's sg_ioctl.  My impression was that it's a libata
issue, because it's going through ata_pio_sector, which does 

	ap->ops->data_xfer(qc->dev, buf + offset, qc->sect_size, do_write);

referring to sect_size, without considering the possibility of any smaller
I/O size.  (Me, I don't even know why it's going PIO rather than DMA:
I'm assuming smartd does things that way, but there's no limit to my
ignorance here.)

> However, I'd like to see if we can track the problem through the SG_IO
> direct path ... how many adjacent page bytes are corrupt?  Just a few or
> a large number (I'm wondering if it's an off by one or off by alignment
> type bug)?

I've assumed it's just the one next page: because ata_pio_sector is
doing a data_xfer of sect_size ATA_SECT_SIZE 512 to an offset above
0xe00 in the smartd stack page.  The time I actually saw corruption
rather than an oops at startup, it was in a tmpfs swap vector page
running 64-bit kernel, and I didn't examine any further pages (just
checked the page before and matched it up to smartd's stack, already
suspecting that).

I don't believe it's an off-by-one at your SCSI end.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html