James Bottomley wrote:
With mem<=4098M or sata_nv.adma=0 it still mounts and works ok.
As I wrote, it would appear that somehow the blk_queue_bounce_limit
setting that the driver has made is not being respected and the block
layer is still trying to feed it addresses over 4GB. Any ideas anyone?
Actually, I'd be very sceptical that the blk_queue_bounce_limit isn't
working as advertised; there'd be a large number of failures if it were
not.
However, the "as advertised" part doesn't seem to be generally well
understood. The point being that block commands are only bounced if
they come from the filesystem or the user, not if they're generated
directly inside the kernel. Since the fault occurs before mount, it's
suggestive of the latter.
A long time ago, GFP_KERNEL allocations meant that the memory allocated
was physically under 4GB. Then x86_64 (and before it ia64) wanted to
break this. So they introduced a new flag: GFP_DMA32 that behaved like
the old GFP_KERNEL and then changed GFP_KERNEL on their architectures to
return memory from anywhere. I'd strongly suggest some piece of kernel
memory was allocated for a transfer buffer without GFP_DMA32 and then
passed in to the driver. Unfortunately, that could be anywhere inside
cdrom or sr. Knowing what the actual command is might help ... some of
the distinctive MMC media ones only have a single source.
Just to give some background on what sata_nv is trying to do:
There are two sets of static DMA memory allocations that sata_nv does,
the legacy ATA PRD and padding buffer which need to be in the lower 4GB,
and the ADMA APRD and CPB areas which can be anywhere in 64-bit memory.
With this patch, this is done by setting a 32-bit DMA mask before
allocating the legacy areas and setting a 64-bit DMA mask before
allocating the ADMA areas. Previously the driver just set a 64-bit mask
before the legacy PRD got allocated so it could end up above 4GB, in
which case legacy DMA couldn't possibly work. That part of the problem
appears to be successfully fixed by the patch in question.
There's a further problem with runtime DMA mapping, however. Normally
when ADMA is enabled the controller can reach anywhere in 64-bit memory.
However, if an ATAPI device is connected, since ADMA doesn't work with
ATAPI commands we have to switch it off on that port and use legacy DMA,
which is limited to 32-bit. This is where the blk_queue_bounce_limit
call comes in, it's trying to make the block layer bounce requests above
4GB when legacy DMA is in use.
I don't think the problem is that there's some buffer which is getting
allocated above 4GB and never bounced, since the problem goes away if
ADMA is disabled entirely and the DMA mask remains 32-bit always. My
guess is something is basing its decision on whether to bounce or not on
the device DMA mask. That can't possibly work properly for sata_nv since
the same PCI device has 2 ports, one of which can be in ADMA mode and
64-bit capable and the other can be in legacy mode and only 32-bit capable.
Tejun, I believe you had a patch that was printing warnings when libata
tried to program a legacy PRD with an address over 4GB. Could we change
that to WARN_ON and get someone experiencing this to try it and
see what the stack trace points to?
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html