Hi,

Jerry Zhang <zhangjerry@xxxxxxxxxx> writes:
> Hi all,
>
> Thanks for the replies.
>
>> That wait_for_completion_interruptible() is what's killing
>> performance. Each and every read/write waits for the USB side to
>> complete. It would've been much better to have something like:
>
>> This would help the write() side of things by a long margin. For reads,
>> what we could do is have a kernel ring buffer with pre-allocated and
>> pre-queued usb_requests pointing to different pages in this ring
>> buffer. When a read() comes, instead of queueing the request right
>> there, we check if there's data in the internal ring buffer, if there
>> is, we just copy_to_user(), otherwise we either return 0 or return
>> -EAGAIN (depending on O_NONBLOCK).
>
> So you are saying that the reason a large request is faster is because the
> wait_for_completion is amortized over the length of time the larger
> request takes, not because usb_ep_queue is somehow more efficient
> for larger data sizes? And as a result using ep_queue to enqueue several

correct. usb_ep_queue() is basically a list_add_tail() followed by a few
register writes (in DWC3, that is).

> smaller requests and waiting for all of them to complete (using eventfd)
> would give a similar result?

yeah, plus it helps out with memory fragmentation issues. I'd rather
have several PAGE_SIZE buffers, than a single large buffer.

You can also disable interrupt on all but the e.g. 10th request. Say
you create a queue of 200 requests, you can ask for interrupt at every
10th completion.

>> If waiting is the problem then isn’t it solved by async IO? With it

yeah, that helps too.

>> user space can implement double (triple, whatever…) buffering and as
>> soon as one request is completed, the next one becomes active.

right, if that helps your usecase, we don't need to touch f_fs right
now. I would still like to have proper O_NONBLOCK support without AIO
dependency.

> We do have double buffering...but via userspace aio rather than kernel.
> If I understand correctly this wouldn't really be possible with userspace aio
> since there is no guarantee that anything submitted that way will happen
> in order. But the idea is that kernel aio can queue up several consecutive reads
> and since they call ep_queue internally, they will be consistently ordered.

right

> Also, the overhead from kmalloc and copy_to/from_user will also be moot,
> since the packet won't be sent till the previous finishes anyway, it doesn't
> matter if there's a short delay before it's queued. O_DIRECT might thus have
> a tiny benefit here, but as Felipe said, it is not that much.

O_DIRECT actually forces us to block on a write()/read() AFAICT, because
we need to pin user pages (to avoid copy_to/from_user()) and, thus, need
to prevent userspace from modifying such pages until transfer is
complete.

> I didn't use kaio initially since it isn't in our libc (and some old
> devices lack
> support) but I think I can copy over syscall definitions and give this a shot.

That would answer several questions, indeed. I wonder, though, when it
comes to your product's policy and what not. Why does your libc lack
kaio? Was it a conscious decision or just happen to be this way?

--
balbi
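For reference, the ring-buffer read path sketched in the quoted text could look
roughly like the following. This is only an untested illustration, not existing
f_fs code: struct my_ep_state and all of its fields are invented for the
example, wrap-around at the end of the ring is ignored, and the completion
handler that fills the ring is not shown.

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/uaccess.h>

/* invented for this example; a real driver would hang this off its ep state */
struct my_ep_state {
	spinlock_t	lock;
	char		*ring;		/* pre-allocated ring buffer pages */
	size_t		head;		/* advanced by request completion */
	size_t		tail;		/* advanced by read() */
	size_t		size;		/* ring size in bytes */
};

static ssize_t my_ep_read(struct file *file, char __user *ubuf,
			  size_t len, loff_t *off)
{
	struct my_ep_state *st = file->private_data;
	size_t avail, tail, n;

	spin_lock_irq(&st->lock);
	tail = st->tail;
	avail = st->head - tail;
	spin_unlock_irq(&st->lock);

	/* nothing buffered: never wait for the USB side to complete */
	if (!avail)
		return (file->f_flags & O_NONBLOCK) ? -EAGAIN : 0;

	/* wrap-around at the end of the ring is ignored for brevity */
	n = min(len, avail);
	if (copy_to_user(ubuf, st->ring + (tail % st->size), n))
		return -EFAULT;

	spin_lock_irq(&st->lock);
	st->tail += n;
	spin_unlock_irq(&st->lock);

	return n;
}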
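Similarly, a minimal, untested sketch of the "interrupt only on every 10th
completion" idea, assuming a function driver that owns its own request pool;
N_REQS, my_complete() and the one-page-per-request buffers are placeholder
choices, and error unwinding on failure is omitted.

#include <linux/gfp.h>
#include <linux/usb/gadget.h>

#define N_REQS	200	/* placeholder queue depth */

static int queue_batched_requests(struct usb_ep *ep,
				  void (*my_complete)(struct usb_ep *,
						      struct usb_request *))
{
	int i, ret;

	for (i = 0; i < N_REQS; i++) {
		struct usb_request *req;

		req = usb_ep_alloc_request(ep, GFP_KERNEL);
		if (!req)
			return -ENOMEM;

		req->buf = (void *)__get_free_page(GFP_KERNEL);
		if (!req->buf)
			return -ENOMEM;
		req->length = PAGE_SIZE;
		req->complete = my_complete;

		/* only every 10th completion raises an interrupt */
		req->no_interrupt = ((i + 1) % 10) != 0;

		ret = usb_ep_queue(ep, req, GFP_KERNEL);
		if (ret)
			return ret;
	}

	return 0;
}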