I'm writing a kernel driver for xhci. USB3 is used mainly for
high-performance external disk drives.
The driver receives a memory buffer to DMA from (or to). The buffer can
come from either user-land or kernel-land, so the driver can't be sure
it is mapped into the kernel virtual address space.
However, in 64-bit mode (x86-64), the kernel keeps all of physical
memory mapped into a section of its virtual address space.
So it's (relatively) low cost to do a clflush-pass over the buffer to
make sure none of the buffer is cached.
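
To make the clflush-pass concrete, here is a rough sketch of the loop I
mean. The names flush_range, flush_line_fn and noop_flush_line are made
up for illustration, and the 64-byte line size is an assumption (the
real value should come from CPUID). The flush primitive is passed in as
a callback so the loop can be tried outside the kernel; in a real driver
it would be the clflush instruction, followed by a memory fence, since
as far as I know clflush is only ordered by mfence.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, typical on x86-64; query CPUID for real. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical hook: flushes the single cache line containing 'line'. */
typedef void (*flush_line_fn)(const void *line);

/* Stand-in used outside the kernel; a driver would issue clflush here. */
static void noop_flush_line(const void *line)
{
    (void)line;
}

/*
 * Walk every cache line that overlaps [buf, buf + len) and flush it.
 * Returns the number of lines visited. A real pass would also need a
 * memory fence after the loop so the flushes are ordered before the
 * transfer is handed to the controller.
 */
static size_t flush_range(const void *buf, size_t len, flush_line_fn flush_line)
{
    uintptr_t start, end, p;
    size_t lines = 0;

    if (len == 0)
        return 0;

    /* Round the start address down to a cache-line boundary. */
    start = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    end = (uintptr_t)buf + len;

    for (p = start; p < end; p += CACHE_LINE_SIZE) {
        flush_line((const void *)p);
        lines++;
    }
    return lines;
}
```

The cost is one flush per cache line touched, so an unaligned buffer
costs at most one extra line compared to an aligned one of the same
length.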
There are two options for doing the I/O:
1 - Conventional
Use PCIe cache-coherency (snooping) without touching the buffer.
2 - Using no-snoop
Flush the buffer using clflush as described above. Make sure PCIe
no-snoop is enabled in the PCIe capability, and set the No-Snoop bit
in all transfer TRBs for the buffer.
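
For reference, the two pieces of the no-snoop setup can be sketched
like this. The bit positions are as I remember them (Enable No Snoop as
bit 11 of the PCIe Device Control register, NS as bit 3 of the Normal
TRB control dword, TRB type in bits 10-15) and should be checked against
the PCIe and xHCI specifications. The macro and helper names are my own
for this sketch, not taken from any existing driver.

```c
#include <stdint.h>

/* PCIe Device Control register: Enable No Snoop (assumed bit 11). */
#define PCIE_DEVCTL_NOSNOOP_EN 0x0800u

/* xHCI Normal TRB control dword fields (assumed layout, see lead-in). */
#define TRB_CYCLE       (1u << 0)            /* cycle bit */
#define TRB_NO_SNOOP    (1u << 3)            /* NS: request no-snoop access */
#define TRB_TYPE(t)     ((uint32_t)(t) << 10) /* TRB type field, bits 10-15 */
#define TRB_TYPE_NORMAL 1u

/* Return the Device Control value with Enable No Snoop set. */
static uint16_t devctl_enable_no_snoop(uint16_t devctl)
{
    return (uint16_t)(devctl | PCIE_DEVCTL_NOSNOOP_EN);
}

/* Build the control dword of a Normal TRB, optionally with NS set. */
static uint32_t normal_trb_control(int cycle, int no_snoop)
{
    uint32_t ctrl = TRB_TYPE(TRB_TYPE_NORMAL);

    if (cycle)
        ctrl |= TRB_CYCLE;
    if (no_snoop)
        ctrl |= TRB_NO_SNOOP;
    return ctrl;
}
```

Both steps are needed: the Device Control bit only permits the
controller to issue no-snoop transactions, and the TRB bit is what
requests it for a particular buffer.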
I've benchmarked both methods and there seems to be no performance
difference whatsoever. The disks used are far from the fastest on the
market, but reasonable (40 MB/s to 70 MB/s peak speed).
So what puzzles me is: why ever use the xhci no-snoop capability?
It doesn't seem there's ever a scenario that makes it useful!
There are two situations:
- a memory buffer is used by both the CPU and the XHCI. In this case we
may as well use the Conventional method. The cache units are going to
have to scan and flush the entire buffer either way, right? So how can
we expect a performance gain, or even lower power consumption?
- a memory buffer that's only ever used by XHCI. This is the case with
scratchpad buffers. The specification allows XHCI to snoop from these
buffers without any driver intervention.
So you see my point: the no-snoop option in the TRBs seems totally
useless.
Thanks for any insight.