On Sat, Sep 28, 2013 at 9:38 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Very few non-xHCI controllers can do DMA above the 4 GB limit.

Yes, but I wonder whether non-xHCI controllers need this kind of
zero-copy optimization at all, since very few user-space drivers
complain about, or care about, performance or CPU utilization when
devices are attached to a non-xHCI controller.

>
>> > make sure this will happen?
>>
>> Actually, that can't be guaranteed but we can handle it with page bounce,
>> just like block device.
>
> Obviously. But if we have to bounce the pages, it isn't zero-copy any
> more.

If the optimization is mainly for xHCI, there should be no such problem.
The problem only exists when a non-xHCI controller is used and the system
has more than 4GB of memory, which does not look like a mainstream
configuration.

I proposed the idea only to compare the two approaches. Each one has
its own advantages and disadvantages, and maybe the two can coexist.

mmap approach:
- the interface is a bit complicated: usbfs needs to allocate one
  buffer for each URB
- it does not scale well if the buffer needs to be very big to
  obtain good performance

direct I/O approach:
- the interface is simple: passing O_DIRECT to open() may be enough
- if the HCD can't DMA to memory above 4GB, the pages that sit
  above 4GB need to be bounced

>
>> Actually I observed both throughput and cpu utilization can be improved
>> with the 4GB of DMA limit on either 32bit arch or 64bit arch, wrt. direct I/O
>> over usb mass storage block device.
>
> This may depend more on the host controller capabilities than on the
> CPU architecture.

Yes, but in most cases more than 4GB of RAM is seldom used with a
32-bit CPU.

Thanks,
--
Ming Lei
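
For illustration, the bounce condition mentioned under the direct I/O
approach can be sketched as below. This is only a toy model in Python,
not kernel code: the names needs_bounce and split_for_dma, the 4096-byte
page size, and the example addresses are all assumptions made for the
sketch. It shows why only the pages above the controller's DMA limit
lose the zero-copy benefit, while the rest of the buffer can still be
mapped directly.

```python
PAGE_SIZE = 4096                    # assumed page size for this sketch
DMA_32BIT_MASK = (1 << 32) - 1      # controller limited to the low 4GB
DMA_64BIT_MASK = (1 << 64) - 1      # controller with 64-bit DMA (e.g. many xHCI)


def needs_bounce(page_phys_addr, dma_mask, page_size=PAGE_SIZE):
    """Return True if the page's last byte is beyond the DMA-addressable range."""
    return page_phys_addr + page_size - 1 > dma_mask


def split_for_dma(page_phys_addrs, dma_mask):
    """Split user pages into those usable directly and those to be bounced.

    Hypothetical helper: a real implementation would copy the bounced
    pages through a low-memory buffer, which is the extra copy that
    makes the transfer no longer zero-copy for those pages.
    """
    direct, bounced = [], []
    for addr in page_phys_addrs:
        (bounced if needs_bounce(addr, dma_mask) else direct).append(addr)
    return direct, bounced


# Example: three pages, one of them located at exactly 4GB.
pages = [0x0000_1000, 0xFFFF_F000, 0x1_0000_0000]
direct_32, bounced_32 = split_for_dma(pages, DMA_32BIT_MASK)
direct_64, bounced_64 = split_for_dma(pages, DMA_64BIT_MASK)
```

With the 32-bit mask only the page at 4GB ends up in the bounced list;
with the 64-bit mask nothing is bounced, which matches the point that
the problem only shows up with a limited controller on a machine with
more than 4GB of memory.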