Hi, On Wed, Dec 26, 2007 at 05:31:51PM +0000, rubisher wrote: > Hello Grant, > > I suspecting a possible issue with this hack in your iommu_fill_pdir(): > > you initialized dma_sg with the adress of startsg (/* pointer to current > DMA */) > then before the loop you dma_sg--; Yes. The comment before that line explains why it does that. ... > Now in the while (nents-- > 0), suppose the test "if > (sg_dma_address(startsg) & PIDE_FLAG) {" failed, Do you have any evidence this test has failed when dma_sg is pointing at garbage? While possible, that would be a bug in iommu_coalesce_chunks() for not setting PIDE_FLAG. > so later in the loop the "sg_dma_len(dma_sg) += startsg->length" (which is > actually "dma_sg->iova_length += startsg->length" ) imo could corrupt > something? Yes, that would be the result. Can you try a bug catcher to prove that's something is actually getting corrupted? Add something like the following around line 65 (before "sg_dma_len(dma_sg)" is assigned): BUG_ON(dma_sg < startsg); On the same note, line 44 is clearly wrong: 41 if (sg_dma_address(startsg) & PIDE_FLAG) { 42 u32 pide = sg_dma_address(startsg) & ~PIDE_FLAG; 43 44 BUG_ON(pdirp && (dma_len != sg_dma_len(dma_sg))); 45 46 dma_sg++; The BUG_ON at line 44 might fail when it shouldn't (and vice versa). My preference is to remove it or put "#ifdef DEBUG_IOMMU" around that line of code (not literally, but effectively). In general, I didn't like the "pre-decrement" but it seems to work and makes the code a bit more efficient. Efficiency is extremely important for this code since it gets called so often. Small changes can have easily measured impact. > That said I tried to re-use the first implementation of jejb (what was in > ccio-dma.c before this patch > <http://cvs.parisc-linux.org/linux-2.6/drivers/parisc/ccio-dma.c?r1=1.12&r2=1.13> > but that doesn't seems to fix the ccio-dma issue at all: I can still read > those kind of message at the console while doing such copy > [snip] > scsi1: (4:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN > scsi1: Bus Reset detected, executing command 10953600, slot 109708a4, dsp > 001301e8[01e8] I'm thinking we really need SCSI bus traces to figure out if the SCSI driver is doing the right thing and if not, exactly what is it doing. If it is a CCIO bug, my guess is it's more likely to be problems with setting magic bits. We really need the ERS to review register settings. .. > (the scsi1 is the lasi scsi hba as sources and the target being the disks > on ncr53c720 hba) > > or experimenting fs issues on this target disks? I doubt this is a file system problem. > That said ok I will wait either U2/Uturn ers public doc or all volonteers > feedback. I'm skeptical for the former and hopeful for the latter. There is a chance Linux Foundation could ask HP for those docs under NDA. But you need to sign up with Linux Foundataion as a developer and then request HP for those docs. cheers, grant - To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html