On Sat, 2008-01-12 at 19:38 -0600, Robert Hancock wrote:
> James Bottomley wrote:
> > On Sat, 2008-01-12 at 17:04 -0600, Robert Hancock wrote:
> >> I don't think the problem is that there's some buffer which is
> >> getting allocated above 4GB and never bounced, since the problem
> >> goes away if ADMA is disabled entirely and the DMA mask remains
> >> 32-bit always. My guess is that something is basing its decision on
> >> whether to bounce or not on the device DMA mask. That can't possibly
> >> work properly for sata_nv, since the same PCI device has 2 ports,
> >> one of which can be in ADMA mode and 64-bit capable while the other
> >> can be in legacy mode and only 32-bit capable.
> >
> > Erm, well, you can't decouple them. Having a differing blk queue
> > bounce mask and device mask is going to cause huge trouble. The
> > reason is this insidious nasty called swiotlb. Basically, with it
> > enabled (and again, it can be on ia64 or x86_64), the kernel can
> > bypass the bounce limit, safe in the knowledge that swiotlb will fix
> > things up behind it in the dma_map_ functions. Unfortunately, if the
> > device mask doesn't match the queue mask, then swiotlb will never
> > kick in and you'll end up with mapped pages beyond the 4GB limit.
>
> Yuck.. All the IOMMU DMA mapping code checks against the device DMA
> mask, so it looks like if we get to the point of doing the DMA mapping
> on >4GB addresses in libata, we're screwed with this approach.
>
> The key problem is that both ata_ports share the same struct device
> with one DMA mask, which really doesn't match what this controller
> wants. I wonder if we could use a different struct device for each
> port?
>
> Other than that, I guess the solutions would be to just set a 32-bit
> mask on the device if either port has an ATAPI device connected (which
> is fairly ugly, considering that you could hotplug an ATAPI device
> while the other port was in use, for example), or do something to
> prevent requests from reaching this point with >4GB addresses in the
> first place..

Well ... assuming this is the problem (and perhaps we'd better get the
traces to confirm), there are at least three possible solutions (rough
sketches of each follow at the end of this mail):

1. As you say, just take the PCI device mask down to 32 bits.

2. Find the problematic allocations and add GFP_DMA32.

3. Set the mask on the actual SCSI device rather than the PCI device
   and pass that into dma_map_ (this approach would have to get signoff
   from the arch people; I know it will work on parisc and x86, but I'm
   not sure about any other arch).

> >> Tejun, I believe you had a patch that was printing warnings when
> >> libata tried to program a legacy PRD with an address over 4GB.
> >> Could we change that to a WARN_ON and get someone experiencing this
> >> to try it and see what the stack trace points to?
> >
> > Unfortunately, the stack trace probably won't help, since the command
> > likely gets issued from the block request function, so the trace
> > won't go back to the culpable initiator; that's why the command would
> > be helpful.
>
> Well, dumping the ATA command surely isn't helpful, as I'm sure it
> will be PACKET. I guess we'd have to dump out the actual CDB..

Sorry, when a SCSI person says "dump the command", they mean the CDB.

James
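
A minimal sketch of option 1, assuming a hypothetical helper invoked
from the driver's device-configuration path: if any device on either
port is ATAPI (and therefore stuck with the 32-bit legacy PRD), drop
the shared PCI device's DMA mask to 32 bits so everything gets bounced
or mapped below 4GB. The helper name and call site are illustrative,
not the actual sata_nv fix; only the kernel APIs (pci_set_dma_mask(),
DMA_32BIT_MASK, the libata iterators of this era) are real.

	#include <linux/pci.h>
	#include <linux/dma-mapping.h>
	#include <linux/libata.h>

	static void nv_adma_clamp_dma_mask(struct ata_host *host)
	{
		struct pci_dev *pdev = to_pci_dev(host->dev);
		int i;

		for (i = 0; i < host->n_ports; i++) {
			struct ata_device *adev;

			ata_link_for_each_dev(adev, &host->ports[i]->link) {
				if (ata_dev_enabled(adev) &&
				    adev->class == ATA_DEV_ATAPI) {
					/* Legacy PRD entries hold 32-bit
					 * addresses only, so the whole
					 * controller must stay below 4GB. */
					pci_set_dma_mask(pdev, DMA_32BIT_MASK);
					pci_set_consistent_dma_mask(pdev,
								    DMA_32BIT_MASK);
					return;
				}
			}
		}
	}

As the mail notes, this has to cope with hotplug: the check would need
to rerun (and the mask be restored to 64-bit) as ATAPI devices come and
go, which is what makes this option ugly in practice.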
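
Option 2 is the smallest change, assuming the offending allocation can
be located: constrain it with GFP_DMA32 so the buffer lands below 4GB
in the first place. The wrapper below is a placeholder around the real
page allocator call.

	#include <linux/gfp.h>

	static void *alloc_below_4gb(unsigned int order)
	{
		/* GFP_DMA32 restricts the page allocator to the
		 * low 4GB of physical memory. */
		return (void *)__get_free_pages(GFP_KERNEL | GFP_DMA32,
						order);
	}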
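
And a sketch of option 3's mechanics, with James's arch caveat left
open: give the SCSI device's embedded struct device its own mask and do
the mapping against that device instead of the PCI device. The per-port
mask storage and both function names here are hypothetical; whether
every arch's dma_map_ implementation honours a mask on a non-PCI struct
device is exactly the question needing sign-off.

	#include <linux/dma-mapping.h>
	#include <linux/scatterlist.h>
	#include <scsi/scsi_device.h>

	struct example_port_priv {
		u64 port_dma_mask;	/* hypothetical backing store */
	};

	static void example_set_sdev_mask(struct scsi_device *sdev,
					  struct example_port_priv *pp,
					  u64 mask)
	{
		pp->port_dma_mask = mask;
		/* struct device holds a pointer to its mask, so point
		 * the SCSI device at per-port storage instead of the
		 * PCI device's shared mask. */
		sdev->sdev_gendev.dma_mask = &pp->port_dma_mask;
	}

	static int example_map_sg(struct scsi_device *sdev,
				  struct scatterlist *sg, int nents)
	{
		/* Map against the SCSI device, not the PCI device. */
		return dma_map_sg(&sdev->sdev_gendev, sg, nents,
				  DMA_FROM_DEVICE);
	}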