On Thu, Jul 26, 2012 at 7:47 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 26 Jul 2012, Alexey Filin wrote:
>
>> Hello,
>>
>> I work in a scientific organization (www.ihep.su, experiments in high
>> energy physics). My colleagues developed a crate controller with
>> buffer memory and a USB interface (read-out module, ROM) to read out
>> crate electronics with a Cypress EZ-USB SX2 (CY7C68001). I developed a
>> (Linux) kernel driver to read the buffer memory in the ROM
>> synchronously and asynchronously. It works fine.
>>
>> There is a test which makes me think that a URB is always scheduled
>> by the USB controller in the next microframe and never in the current
>> one, even if it is "idle".
>
> You are wrong. Bulk URBs are always scheduled in the current
> microframe (unless there isn't enough time remaining for a bulk
> transaction).
>
>> The CY7C68001 is configured for one bulk-in pipe
>> with 512 bytes per packet. The driver FIFO contains 64 items. Data
>> transfer is implemented in the driver as one iterative URB
>> submission/completion loop with one item as the URB buffer.
>
> There's your problem. URB completions are reported at microframe
> boundaries. By iteratively submitting an URB and waiting for it to
> complete before submitting the next one, you have artificially slowed
> down the transfer to one URB per microframe.
>
> To get maximum throughput you should submit multiple URBs originally,
> and then submit a new URB whenever an URB completes. I recommend
> using a pipeline depth of 10-20 ms. For bulk transfers at high speed,
> that comes out to between 500 KB and 1 MB of concurrent URB data.

A transfer with 2 kB per URB hits an artificial limit in our read-out
module (19 MB/s for a 128 kB URB).
I think the difference in performance between a pipeline of concurrent
URBs and one iterative URB with a 128 kB buffer will be negligible: a
128 kB URB takes at least 20 microframes (Table 5-10: max 13 transfers
of 512-byte payload packets per microframe):

  128 kB / (13 * 512 bytes/packet) ~ 20 microframes

so underusing one microframe out of 20 is a very small cost for a
simple driver. Moreover, we read each set of 4 crates with one USB
controller and (I think) hit its aggregate performance of 47 MB/s
(Table 5-10 declares a theoretical maximum of 51 MB/s for bulk
transfers). Nevertheless, thanks for the advice; if we implement a
USB 3.0 interface I will think about pipelining.

>
>> A user process
>> empties the driver FIFO concurrently with URB handling. Each
>> measurement is an average of 20 transfers of 32 MB each:
>>
>> columns:
>> 1: size of FIFO item in bytes
>> 2: measured value in MB/s
>> 3: estimation in MB/s (= 8 kHz * size)
>>
>> results:
>> 1*512   3.86   3.91
>> 2*512   7.62   7.81
>> 3*512  11.3   11.7
>> 4*512  14.9   15.6
>
> As expected.
>
>> I see the same behaviour on different test systems with different
>> USB controllers. That means USB controller developers and producers
>> deceive users, because they do not permit use of the full USB
>> bandwidth for short transfers without serious reasons.
>
> No, it's not a deception. It's simply a mistake in your driver.

It is not a mistake, it is intentional, to guarantee data consistency.
I'm not sure that all USB controllers (and the USB subsystem) always
submit URBs and call completion handlers for a single pipe strictly
sequentially. I worry about data consistency; the data are very
expensive for us. Can you guarantee that?
>
>> What we can do:
>>
>> * develop a macrocommand protocol, to run several bus cycles with
>>   one URB transfer carrying a built-in macrocommand. That requires
>>   complex external bus controller logic and limits use cases to a
>>   macrocommand pattern.
>> * get a USB controller that can schedule transfers in the current
>>   microframe. According to the USB 2.0 specification we could get a
>>   cycle duration of 1/(131 * 8 kHz) ~ 1 us (Table 5-10 permits 131
>>   transfers of 2-byte payload per microframe). That is enough for us.
>>
>> What we can't do:
>>
>> * submit several URBs concurrently
>> * use several pipes concurrently
>>
>> because external bus cycles can depend on the results of previous
>> cycles.
>>
>> The questions are:
>>
>> * Is there a USB controller with scheduling in the current microframe?
>
> They all do it. But they won't generate completion IRQs any faster
> than 8 kHz.
>
>> * Is it possible to schedule a transfer in the current microframe
>>   from a Linux driver?
>> * If not, is there an (open) FPGA IP core for a USB controller that
>>   we could fix to make it schedule URB transfers in the current
>>   microframe?
>> * If not, is there a way to make USB controller developers implement
>>   scheduling in the current microframe?
>> * If not, is there another way to get a 1 us bus cycle over a USB
>>   link?
>
> You can do this only by hacking up a special driver of your own.
> Since transfer completions would not be reported by IRQs in time, you
> would have to poll for transfer completions at microsecond intervals.
> This would present a rather large overhead for the computer, but it
> might work.
>
> On the whole, USB might not be a good way to achieve what you want.

Thanks for the advice. It seems rather complex: it requires a hacked
USB controller driver (and USB subsystem?) plus a dedicated USB
controller used only for these specialized transfers, to simplify URB
handling, right? Maybe even selling my soul to the devil...
>>
>> Thanks for answers in advance,
>> Alexey.
>>
>> PS It seems USB 3.0 has the same drawback
>
> It also generates completion IRQs at microframe boundaries.
>
> Alan Stern