On Wed, 06 Feb 2008 12:26:39 -0600
James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, 2008-02-06 at 10:15 -0800, Andrew Morton wrote:
> > On Wed, 6 Feb 2008 09:40:15 -0800 (PST) bugme-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> >
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9901
> > >
> > > Summary: kernel panic in stex modules (?)
> > > Product: IO/Storage
> > > Version: 2.5
> > > KernelVersion: 2.6.24
> > > Platform: All
> > > OS/Version: Linux
> > > Tree: Mainline
> > > Status: NEW
> > > Severity: normal
> > > Priority: P1
> > > Component: Serial ATA
> > > AssignedTo: jgarzik@xxxxxxxxx
> > > ReportedBy: dairinin@xxxxxxxxx
> > >
> > >
> > > Latest working kernel version: 2.6.23-r6
> > > Earliest failing kernel version: 2.6.24
> > > Distribution: Gentoo
> > > Hardware Environment: Core2D E6600, Asus p5B Dlx, 2G DDR2 667, Promise ST
> > > EX4350
> > > Software Environment: GCC 4.2.3/4.1.2, CFLAGS="-O2"
> > >
> > > Problem Description:
> > > The problem is frequent kernel panics within the same module. Can't say what it
> > > is, but looks like it is related to dma and promise driver.
> > > The first culprit, the memory, is ok, 8 hours of memtest passed without errors.
> > > Before, kernel 2.6.23-gentoo-r6, compiled with GCC 4.1.2 worked just fine, then
> > > after upgrade to 4.2.2 th bug appeared. Upgrade to 2.6.24 didn't solve the
> > > problem. Switching back to GCC 4.1.2 made things better for a moment, crashes
> > > became less frequent and I thought compiler was the cause. But today system
> > > crashed again with same symptoms.
> > > Sorry, but I can't save crash log, so I'll provide screen "shot":
> > > http://img238.imageshack.us/my.php?image=p2030030ki1.jpg
> > >
> > > Steps to reproduce:
> > > Boot, start FTP-server, load RAID with heavy input, in some hours it will
> > > crash. With pure reads system can run several days, heavy write load kills it
> > > much too easier.
> > >
> >
> > The supertrak driver has regressed in 2.6.24.
> > And
> >
> > commit 9cb83c7529d929c00f37d821daed1942a1b20602
> > Author: FUJITA Tomonori <tomof@xxxxxxx>
> > Date: Tue Oct 16 11:24:32 2007 +0200
> >
> >     [SCSI] add use_sg_chaining option to scsi_host_template
> >
> > looks a likely candidate.
> >
> > And this:
> >
> > commit d3f46f39b7092594b498abc12f0c73b0b9913bde
> > Author: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
> > Date: Tue Jan 15 11:11:46 2008 -0600
> >
> >     [SCSI] remove use_sg_chaining
> >
> > from 2.6.25 looks to be a likely fix for it. Should it be backported?
>
> If the patch you identify is the culprit, mine can't be the fix ... and
> it should also be present in git head.
>
> The BUG_ON is here: isn't it?
>
> static inline void
> dma_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents,
>              int direction)
> {
>         BUG_ON(!valid_dma_direction(direction));
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         dma_ops->unmap_sg(hwdev, sg, nents, direction);
> }
>
> stex only does scsi_dma_unmap(), so something looks to have tampered
> with the cmnd->sc_data_direction somehow ... and I can't see how.

Surely something changes cmnd->sc_data_direction between mapping and
unmapping; otherwise we would have hit the same BUG_ON in dma_map_sg
before ever reaching dma_unmap_sg:

static inline int dma_map_sg(struct device *hwdev, struct scatterlist *sg,
                             int nents, int direction)
{
        BUG_ON(!valid_dma_direction(direction));
        return dma_ops->map_sg(hwdev, sg, nents, direction);
}
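
To make that argument concrete, here is a small user-space model of it.
This is only a sketch, not kernel code: struct model_cmnd, enum trap_site,
first_trap() and check_cases() are names made up for illustration, and the
body of valid_dma_direction() is written from memory of the x86_64 helper,
so treat it as an assumption. The point it shows is that both paths apply
the same check, so a trap on the unmap side alone implies the direction
changed after the map succeeded.

```c
#include <assert.h>

/* Values mirror the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical result codes, used by this model only. */
enum trap_site { TRAP_NONE, TRAP_AT_MAP, TRAP_AT_UNMAP };

/* Hypothetical stand-in for the one scsi_cmnd field that matters here. */
struct model_cmnd {
	int sc_data_direction;
};

/* Assumed to match the check behind the BUG_ON in both helpers. */
static int valid_dma_direction(int direction)
{
	return direction == DMA_BIDIRECTIONAL ||
	       direction == DMA_TO_DEVICE ||
	       direction == DMA_FROM_DEVICE;
}

/*
 * Model one map/unmap cycle and report where the first BUG_ON would fire.
 * dir_at_unmap is the value the field holds when the driver unmaps; pass
 * the same value as dir_at_map to model "nobody touched it".
 */
static enum trap_site first_trap(int dir_at_map, int dir_at_unmap)
{
	struct model_cmnd cmd = { .sc_data_direction = dir_at_map };

	if (!valid_dma_direction(cmd.sc_data_direction))
		return TRAP_AT_MAP;	/* BUG_ON in dma_map_sg() */

	cmd.sc_data_direction = dir_at_unmap;	/* possible tampering */

	if (!valid_dma_direction(cmd.sc_data_direction))
		return TRAP_AT_UNMAP;	/* BUG_ON in dma_unmap_sg() */

	return TRAP_NONE;
}

/* The three cases discussed in this thread. */
void check_cases(void)
{
	/* Untouched valid direction: neither check fires. */
	assert(first_trap(DMA_TO_DEVICE, DMA_TO_DEVICE) == TRAP_NONE);
	/* Invalid from the start: the map side fires first. */
	assert(first_trap(DMA_NONE, DMA_NONE) == TRAP_AT_MAP);
	/* Valid at map, invalid at unmap: only this reaches the unmap trap. */
	assert(first_trap(DMA_FROM_DEVICE, DMA_NONE) == TRAP_AT_UNMAP);
}
```

In this model the only way to get TRAP_AT_UNMAP is for the two arguments
to differ, which matches the screenshot if the trap really is the one in
dma_unmap_sg. It says nothing about who does the tampering.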