Re: usb: dwc2: gadget: high-bandwidth (mc > 1) status?

On 11/26/2021 12:53 PM, Pavel Hofman wrote:
> On 26. 11. 21 at 7:35, Minas Harutyunyan wrote:
>> Hi Pavel,
>>
>> On 11/25/2021 12:47 PM, Pavel Hofman wrote:
>>>
>>>
>>> On 24. 11. 21 at 15:04, Minas Harutyunyan wrote:
>>>> Hi Pavel,
>>>>
>>>> On 11/24/2021 11:39 AM, Pavel Hofman wrote:
>>>>> Hi Minas et al.,
>>>>>
>>>>> Please, does dwc2 (specifically in BCM2835/RPi) reliably support HS ISOC
>>>>> multiple transactions (mc > 1)? I found this condition
>>>>> https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c*L4041__;Iw!!A4F2R9G_pg!MMNE6CYvWEFeWt8W9pImwNA-N4_04U8UsBWQmu9O9Bwq1HalCAupyb9kzGBAOOMlKmt6xefz$ 
>>>>>
>>>>>
>>>>>
>>>>>        /* High bandwidth ISOC OUT in DDMA not supported */
>>>>>        if (using_desc_dma(hsotg) && ep_type == USB_ENDPOINT_XFER_ISOC &&
>>>>>            !dir_in && mc > 1) {
>>>>>            dev_err(hsotg->dev,
>>>>>                "%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
>>>>>            return -EINVAL;
>>>>>        }
>>>>>
>>>>> But I do not know how critical the Descriptor DMA is and whether
>>>>> disabling it would seriously affect gadget performance.
>>>>>
>>>>> I know about the RX FIFO sizing requirement (and TX FIFO too, I guess);
>>>>> the current default values can be increased for that particular use
>>>>> case if needed.
>>>>>
>>>>> I am trying to learn whether it makes sense to spend time on adding
>>>>> support for high-bandwidth to the UAC2 audio gadget, to allow using a
>>>>> larger bInterval and mc=2,3 at high samplerates/channel counts (a sort
>>>>> of "burst mode" similar to UAC3). When doing some CPU-demanding DSP it
>>>>> would help to avoid the time-critical handling every 125us microframe.
>>>>> Both OUT and IN are important.
>>>>>
>>>>
>>>> According to the programming guide:
>>>>
>>>> "Isochronous OUT Transfers
>>>> The application programming for isochronous out transfers is in the
>>>> same manner as Bulk OUT transfer sequence, except that the application
>>>> creates only 1 packet per descriptor for an isochronous OUT endpoint.
>>>> The controller handles isochronous OUT transfers internally in the same
>>>> way it handles Bulk OUT transfers, and as depicted in Figure 10-28.
>>>> If the transfers are for a high-bandwidth endpoint (more than one MPS
>>>> per μframe ), create as many descriptors as the number of packets in a
>>>> μframe (number of descriptors = number of packets per μframe).
>>>> Maximum number of descriptors per μframe per endpoint is three."
>>>>
>>>> Programming descriptors to start HB ISOC OUT is not a problem.
>>>> The problem occurs on completions. If, for example, mc > 1, the driver
>>>> will allocate and program mc * (request count) descriptors. If the host
>>>> sends mc packets per frame, then completing a request on every mc-th
>>>> descriptor is not a big problem. But if the host sends fewer than mc
>>>> packets in a frame, then it is not clear how to exclude the unused
>>>> descriptors from the desc chain that the core has already fetched:
>>>> by stopping transfers (disable EP) and restarting them (refilling the
>>>> desc chain) from the next frame? Or by purging the unused descs and
>>>> shifting descriptors "up" in the chain? You can try to implement it.
>>>
>>> Hi Minas, thanks for your hints. Unfortunately I am pretty new to dwc2,
>>> please can you point me to particular parts of the dwc2 code?
>>>
>>> I found some dwc2 description which reads your quote in
>>> https://urldefense.com/v3/__https://www.mouser.cn/datasheet/2/196/Infineon-xmc4500_rm_v1.6_2016-UM-v01_06-EN-598157.pdf__;!!A4F2R9G_pg!Jg2wfkRUfyO2jrnLXmO7zO5W0Esw-TTgETCTe5mqtpub1mAmDY7QnixT8HmYyTp0rb_ac7Ot$ 
>>>
>>> (not for BCM2835 but hopefully the principle is similar). IIUC, by
>>> "descriptor" the struct dwc2_dma_desc is meant.
>>>
>> Yes, descriptors are declared in dwc2 as dwc2_dma_desc.
>>
>>> I found a function gadget.c:dwc2_gadget_fill_isoc_desc which is called
>>> in dwc2_gadget_start_isoc_ddma and dwc2_hsotg_ep_queue. Is the code
>>> after the /* High bandwidth ISOC OUT in DDMA not supported */ comment in
>>> gadget.c:dwc2_hsotg_ep_enable() because the dwc2 core (the hardware)
>>> does not support HB in DDMA, or because the linux dwc2 driver does not
>>> implement the HB support in DDMA yet (which is what we are talking
>>> about)?
>> The HW supports HB ISOC OUT in DDMA; the driver doesn't. In the databook
>> you mentioned, see chapter "16.11.3.2 Isochronous OUT".
>>>
>>> I am asking because if the HW did not support DDMA, the method
>>> dwc2_gadget_start_isoc_ddma would be out of game for my analysis, right?
>>> If the latter is the case, should the HB support implementation change
>>> dwc2_gadget_start_isoc_ddma?
>>>
>> To support HB ISOC OUT, the dwc2_gadget_fill_isoc_desc() and
>> dwc2_gadget_complete_isoc_request_ddma() functions should be updated.
>>
>>> Please can you explain a bit more the issue about the unused
>>> descriptors? This is how I understand it (poorly). The driver prepares
>>> descriptors for all mc required by the transfer (and reported by
>>> wMaxPacketSize to the host) so that the core (HW) can fill it via DMA.
>>> However, if the host does not need the whole packet size, it will send
>>> fewer packets per frame, and some of the dwc2_dma_desc descriptors would
>>> not be filled with data = unused. The core (HW) somehow marks the
>>> descriptors whether they were used or not, and the unused descriptors
>>> (i.e. containing old/bogus data) should not undergo completion somehow.
>> The core doesn't mark unused descriptors.
>> The driver can detect the last packet in a frame by checking the DPID. If
>> the DPID is DATA0, then it's the last packet in the frame and the driver
>> needs to complete the appropriate usb request.
>> After completing a descriptor, the core will process the next descriptor,
>> which was prepared for the just-completed usb request, not for the next
>> request (at least from the "buffer addresses" point of view).
>> If the host sends fewer than mc packets in a frame, the driver should
>> exclude the remaining descs for the completed usb request from the
>> descriptor list by "shifting up" descs in the list. But I'm not sure that
>> the driver has enough time to do that before the core fetches the next
>> descriptor, which should already be updated (at least its "buffer address"
>> should point to the address for the next usb request).
>>
>>> But this sounds too simple, not what you described in your post :-)
>>>
>>> Also, please when are completion interrupt requests thrown at ISOC OUT?
>>> After every packet=desc, or after the whole USB frame (i.e. after all 3
>>> packets in case of mc=3)? If after every packet, the HB mode with larger
>>> bInterval (less frequent frames with multiple packets) would not spare
>>> any interrupts/CPU load compared to more frequent frames with single
>>> packets (no HB mode) and adding the HB ISOC support would "only" allow
>>> higher ISOC bandwidth, not CPU load reduction. What is the case, please?
>> The completion interrupt is asserted at the end of descriptor processing
>> if the IOC (Interrupt on completion) bit is set. For HB ISOC OUT this bit
>> should be set on all descriptors.
>>>
> 
> Minas, thanks for your expert answer. Just a quick question regarding 
> your previous paragraph - does it mean that ISOC OUT with mc=2 at 
> bInterval=2 yields 8k completion IRQs, just like with mc=1 at 
> bInterval=1? If so, no real CPU workload would be spared by implementing 
> the HB support.
Yes, if the IOC bit is set for all descriptors. For HB ISOC OUT, in my
opinion, it should be set in all descriptors.
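
The arithmetic behind that answer can be sketched as follows. This is a
minimal sketch, not driver code: it assumes a high-speed link (8000
microframes per second), a service interval of 2^(bInterval-1) microframes,
one descriptor per packet with IOC set on each, and a host that sends all mc
packets in every interval. The function names are made up for illustration.

```c
#include <assert.h>

/* High-speed USB: 8000 microframes (125 us each) per second. */
#define HS_UFRAMES_PER_SEC 8000u

/*
 * Service intervals per second for a HS isochronous endpoint.
 * The interval is 2^(bInterval - 1) microframes, bInterval in 1..16.
 */
static inline unsigned int intervals_per_sec(unsigned int binterval)
{
	assert(binterval >= 1 && binterval <= 16);
	return HS_UFRAMES_PER_SEC >> (binterval - 1);
}

/*
 * Completion IRQs per second when IOC is set on every descriptor.
 * Assumption: one descriptor per packet, and the host sends mc packets
 * in every service interval, so there is one IRQ per packet.
 */
static inline unsigned int irqs_per_sec_ioc_all(unsigned int binterval,
						unsigned int mc)
{
	assert(mc >= 1 && mc <= 3);
	return intervals_per_sec(binterval) * mc;
}
```

Under these assumptions, irqs_per_sec_ioc_all(1, 1) and
irqs_per_sec_ioc_all(2, 2) both evaluate to 8000, which matches the "8k
completion IRQs" in the question above.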

> 
> Is there any chance to complete all descriptors filled in one frame with 
> one IRQ, by setting the IOC bit only to the last descriptor? 
Because the host can send fewer packets than mc, in this case we can miss
a frame/usb request completion. I mean, data from the next frame will be
DMA-ed to a buffer dedicated to the previous frame (the unused descriptor).
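
A toy model of that failure mode, as a sketch only. It assumes the chain
holds mc descriptors per usb request, that the core consumes exactly one
descriptor per received packet in chain order and never skips one, and that
frame f is meant to fill request f. The function name and the model itself
are illustrative and are not taken from the dwc2 driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Request r owns descriptors [r * mc, r * mc + mc) in the chain.
 * pkts[f] is the number of packets the host sends in frame f
 * (assumed to be at most mc).
 *
 * Returns the index of the first frame whose data lands in a descriptor
 * owned by a different request, or -1 if every packet lands where the
 * driver intended.
 */
static inline int first_misplaced_frame(unsigned int mc,
					const unsigned int *pkts,
					size_t nframes)
{
	size_t next_desc = 0;	/* descriptor the core will use next */

	assert(mc > 0);
	for (size_t f = 0; f < nframes; f++) {
		for (unsigned int p = 0; p < pkts[f]; p++) {
			/* Owner of the descriptor about to be filled. */
			if (next_desc / mc != f)
				return (int)f;
			next_desc++;
		}
	}
	return -1;
}
```

In this model, with mc = 2 and per-frame packet counts {2, 1, 2}, the
function returns 2: the short frame 1 leaves one of its descriptors unused,
so the first packet of frame 2 goes into the buffer prepared for frame 1.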

> IIUC that 
> would cause issues when the host does not send data for all descriptors 
> "prepared" by the gadget (as discussed above) but IMO that could be 
> handled somehow (host would likely not change the number of transactions 
> within one continuous stream, gadget could "estimate" how many 
> transactions would be used by the host for the particular altsetting). 
> Just trying to find if any way to reduce the IRQs is possible :-)
> 
I don't know a safe way to reduce the IRQs :-(.

Thanks,
Minas

> Thanks a lot! Best regards,
> 
> Pavel.




