From: Jérôme Carretero
> I was happily using big (10MB) buffers before, and with recent kernels,
> when using USB3, I had to reduce the size of my buffers a lot.
> By the way, I couldn't find any information on a maximum size for the
> bulk transfers using libusb, maybe you know about that also ?
>
> So, using v3.13, this what I get from the kernel when doing a bulk read
> of 4 MiB:
>
> [ 506.856282] xhci_hcd 0000:00:14.0: Too many fragments 256, max 63
...
> I saw your 3.12-td-fragment-failure branch and tried it; there,
> sometimes the transfers don't work, with:
>
> xhci_hcd 0000:00:14.0: WARN Event TRB for slot 10 ep 4 with no TDs queued?
> python2: page allocation failure: order:10, mode:0x1040d0

I've had a quick look and the reason for the allocation failure is
fairly obvious.

The libusb ioctl is handled by proc_do_submiturb(). It will use
scatter-gather for long requests, but always chops things up into
16k fragments.
So (as in the trace above) a 4MB transfer requires 256 fragments.
If the number of segments exceeds the advertised sg_tablesize (which
is now 128) then it falls back on using a single fragment.
For a 4MB buffer this is 1024 contiguous pages - not surprisingly it
sometimes fails (it really ought to sleep - but that is another issue).

Possibly proc_do_submiturb() should use longer fragments [1] in order
to get below the sg_tablesize limit.

However this is still doomed to fail.
A single 16MB buffer crosses at least 255 64kB boundaries so the xhci
driver will need 256 or 257 TRBs to describe the buffer.

The only way for xhci to accept these transfers is to apply the patch
I posted last week that checks for aligned buffers and skips the
'pad with NOPs' code if they are aligned, and then set sg_tablesize
to ~0.

The 'struct usb_bus' currently contains 2 fields associated with
scatter-gather:
- no_sg_constraint:1 is set by xhci and checked by usbnet/ax88179_178a
  before it uses 'randomly aligned' fragments.
- sg_tablesize is supposed to be the limit on the number of sg
  fragments.
  ehci, ohci and uhci either set 0 or ~0.
  xhci currently sets TRBS_PER_SEGMENT/2 == 128 (previously 32,
  older ~0).
  Some code only checks for non-zero.

It would be better if the former were changed to be a limit on the
number of 'unconstrained' fragments, since that limit is somewhat
different (in xhci) from the limit on the number of aligned fragments.

Alternatively both could be treated as booleans, and we just hope
that any fragment limits aren't exceeded.

[1] I don't know if it is best to try to allocate 2^n pages, falling
back on smaller sizes (if they'll meet the fragment count limit)
rather than allocating equal sized fragments.
The code should probably also be willing to allocate more fragments
if it can't allocate even 16k blocks.
However processing variable-sized fragment lists requires that the
sg[].length field not be modified by the dma_map code - I don't know
if that is generally true?

	David
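
For anyone wanting to check the numbers above, here is a minimal
user-space sketch of the arithmetic. It is not kernel code. The helper
names frag_count(), alloc_order() and trbs_for_buffer() are made up for
illustration, and the 4k page size is an assumption. The 16k fragment
size and the 64kB boundary are the values quoted in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USBFS_FRAG_SIZE   (16u * 1024u)  /* 16k fragments, as described above */
#define PAGE_SIZE_ASSUMED 4096u          /* assumption: 4k pages */
#define TRB_BOUNDARY      (64u * 1024u)  /* a TRB may not cross a 64kB boundary */

/* Number of fixed-size fragments needed to cover len bytes (rounds up). */
size_t frag_count(size_t len, size_t frag_size)
{
	assert(frag_size > 0);
	return (len + frag_size - 1) / frag_size;
}

/* Smallest order n such that 2^n pages can hold len contiguous bytes. */
unsigned int alloc_order(size_t len, size_t page_size)
{
	size_t pages = frag_count(len, page_size);
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/*
 * Number of TRBs needed for one contiguous buffer of len bytes starting
 * at bus address addr, if each TRB must stay within one 64kB window.
 * Hypothetical helper: it only counts windows, it builds no TRBs.
 */
size_t trbs_for_buffer(uint64_t addr, size_t len)
{
	uint64_t first, last;

	assert(len > 0);
	first = addr / TRB_BOUNDARY;
	last = (addr + len - 1) / TRB_BOUNDARY;
	return (size_t)(last - first) + 1;
}
```

For a 4 MiB request frag_count() gives 256 fragments, which is above
the sg_tablesize of 128, and alloc_order() for the single-fragment
fallback gives 10, matching the order:10 in the trace. For a 16 MiB
buffer trbs_for_buffer() gives 256 when the buffer starts on a 64kB
boundary and 257 otherwise.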