On 1/26/2021 12:43 PM, Thinh Nguyen wrote: > Wesley Cheng wrote: >> >> On 1/22/2021 4:15 PM, Thinh Nguyen wrote: >>> Hi, >>> >>> Wesley Cheng wrote: >>>> Some devices have USB compositions which may require multiple endpoints >>>> that support EP bursting. HW defined TX FIFO sizes may not always be >>>> sufficient for these compositions. By utilizing flexible TX FIFO >>>> allocation, this allows for endpoints to request the required FIFO depth to >>>> achieve higher bandwidth. With some higher bMaxBurst configurations, using >>>> a larger TX FIFO size results in better TX throughput. >>>> >>>> By introducing the check_config() callback, the resizing logic can fetch >>>> the maximum number of endpoints used in the USB composition (can contain >>>> multiple configurations), which helps ensure that the resizing logic can >>>> fulfill the configuration(s), or return an error to the gadget layer >>>> otherwise during bind time. >>>> >>>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx> >>>> --- >>>> drivers/usb/dwc3/core.c | 2 + >>>> drivers/usb/dwc3/core.h | 8 ++ >>>> drivers/usb/dwc3/ep0.c | 2 + >>>> drivers/usb/dwc3/gadget.c | 194 ++++++++++++++++++++++++++++++++++++++++++++++ >>>> 4 files changed, 206 insertions(+) >>>> >>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>>> index 6969196..e7fa6af 100644 >>>> --- a/drivers/usb/dwc3/core.c >>>> +++ b/drivers/usb/dwc3/core.c >>>> @@ -1284,6 +1284,8 @@ static void dwc3_get_properties(struct dwc3 *dwc) >>>> &tx_thr_num_pkt_prd); >>>> device_property_read_u8(dev, "snps,tx-max-burst-prd", >>>> &tx_max_burst_prd); >>>> + dwc->needs_fifo_resize = device_property_read_bool(dev, >>>> + "tx-fifo-resize"); >>>> >>>> dwc->disable_scramble_quirk = device_property_read_bool(dev, >>>> "snps,disable_scramble_quirk"); >>>> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h >>>> index eec1cf4..983b2fd4 100644 >>>> --- a/drivers/usb/dwc3/core.h >>>> +++ b/drivers/usb/dwc3/core.h >>>> @@ -1223,6 +1223,7 @@ struct dwc3 { >>>> unsigned is_utmi_l1_suspend:1; >>>> unsigned is_fpga:1; >>>> unsigned pending_events:1; >>>> + unsigned needs_fifo_resize:1; >>> The prefix "need" sounds like a requirement, but I don't think it is the >>> case here. I think "do" would be a better prefix here. >>> >> Hi Thinh, >> >> Sure, that is true, since this may be an optional flag for certain >> platforms. >> >>>> unsigned pullups_connected:1; >>>> unsigned setup_packet_pending:1; >>>> unsigned three_stage_setup:1; >>>> @@ -1257,6 +1258,10 @@ struct dwc3 { >>>> unsigned dis_split_quirk:1; >>>> >>>> u16 imod_interval; >>>> + >>>> + int max_cfg_eps; >>>> + int last_fifo_depth; >>>> + int num_ep_resized; >>>> }; >>> Please document these new fields. >>> >> Will do. >> >>>> >>>> #define INCRX_BURST_MODE 0 >>>> @@ -1471,6 +1476,7 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd, >>>> struct dwc3_gadget_ep_cmd_params *params); >>>> int dwc3_send_gadget_generic_command(struct dwc3 *dwc, unsigned int cmd, >>>> u32 param); >>>> +void dwc3_gadget_clear_tx_fifos(struct dwc3 *dwc); >>>> #else >>>> static inline int dwc3_gadget_init(struct dwc3 *dwc) >>>> { return 0; } >>>> @@ -1490,6 +1496,8 @@ static inline int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd, >>>> static inline int dwc3_send_gadget_generic_command(struct dwc3 *dwc, >>>> int cmd, u32 param) >>>> { return 0; } >>>> +static inline void dwc3_gadget_clear_tx_fifos(struct dwc3 *dwc) >>>> +{ } >>>> #endif >>>> >>>> #if IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE) >>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c >>>> index 8b668ef..4f216bd 100644 >>>> --- a/drivers/usb/dwc3/ep0.c >>>> +++ b/drivers/usb/dwc3/ep0.c >>>> @@ -616,6 +616,8 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl) >>>> return -EINVAL; >>>> >>>> case USB_STATE_ADDRESS: >>>> + dwc3_gadget_clear_tx_fifos(dwc); >>>> + >>>> ret = dwc3_ep0_delegate_req(dwc, ctrl); >>>> /* if the cfg matches and the cfg is non zero */ >>>> if (cfg && (!ret || (ret == USB_GADGET_DELAYED_STATUS))) { >>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>> index 86f257f..26f9d64 100644 >>>> --- a/drivers/usb/dwc3/gadget.c >>>> +++ b/drivers/usb/dwc3/gadget.c >>>> @@ -615,6 +615,161 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action) >>>> static void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, >>>> bool interrupt); >>>> >>>> +static int dwc3_gadget_calc_tx_fifo_size(struct dwc3 *dwc, int mult) >>> Can you document what this function does? >>> >> Will do. >> >>>> +{ >>>> + int max_packet = 1024; >>> Maybe you can also document why you chose 1024 (e.g. applicable to >>> Enhanced SuperSpeed only?). >>> >> Sure. Its basically applicable for SS and isoc (hs/ss) use cases since >> max packet size is 1024 in both cases. > > Highspeed bulk MPS is 512. Fullspeed varies more (bulk max MPS is 64 and > isoc is 1023). > > However, we can keep it simple with 1024 as if it's ok to over estimate. > Just need to note that here. > >> >>>> + int fifo_size; >>>> + int mdwidth; >>>> + >>>> + mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0); >>>> + /* MDWIDTH is represented in bits, we need it in bytes */ >>>> + mdwidth >>= 3; >>> mdwidth for DWC32 requires to read hwparams6 for the upper 2 significant >>> bits. Can we add a check for DWC32 also? You can check how we're doing >>> it now in the current code. >>> >> Sure. I'll make sure to get the correct registers for the DWC32 case. >> >>>> + >>>> + fifo_size = mult * ((max_packet + mdwidth) / mdwidth) + 1; >>>> + return fifo_size; >>>> +} >>>> + >>>> +void dwc3_gadget_clear_tx_fifos(struct dwc3 *dwc) >>>> +{ >>>> + struct dwc3_ep *dep; >>>> + int fifo_depth; >>>> + int size; >>>> + int num; >>>> + >>>> + if (!dwc->needs_fifo_resize) >>>> + return; >>>> + >>>> + /* Read ep0IN related TXFIFO size */ >>>> + dep = dwc->eps[1]; >>>> + size = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0)); >>>> + if (DWC3_IP_IS(DWC31)) >>>> + fifo_depth = DWC31_GTXFIFOSIZ_TXFDEP(size); >>>> + else >>>> + fifo_depth = DWC3_GTXFIFOSIZ_TXFDEP(size); >>> The driver handles 3 IPs. Getting the fifo depth for DWC32 is the same >>> as DWC31. So the condition should be >>> if (DWC3_IP_IS(DWC3)) >>> fifo_depth = ... >>> else >>> fifo_depth = ... >>> >> Understood. >> >>>> + >>>> + dwc->last_fifo_depth = fifo_depth; >>>> + /* Clear existing TXFIFO for all IN eps except ep0 */ >>>> + for (num = 3; num < min_t(int, dwc->num_eps, DWC3_ENDPOINTS_NUM); >>>> + num += 2) { >>>> + dep = dwc->eps[num]; >>>> + /* Don't change TXFRAMNUM on usb31 version */ >>>> + size = DWC3_IP_IS(DWC31) ? >>>> + dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(num >> 1)) & >>>> + DWC31_GTXFIFOSIZ_TXFRAMNUM : 0; >>>> + >>> Same here. Check for DWC32. >>> >>>> + dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(num >> 1), size); >>>> + } >>>> + dwc->num_ep_resized = 0; >>>> +} >>>> + >>>> +/* >>>> + * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case >>>> + * @dwc: pointer to our context structure >>>> + * >>>> + * This function will a best effort FIFO allocation in order >>>> + * to improve FIFO usage and throughput, while still allowing >>>> + * us to enable as many endpoints as possible. >>>> + * >>>> + * Keep in mind that this operation will be highly dependent >>>> + * on the configured size for RAM1 - which contains TxFifo -, >>>> + * the amount of endpoints enabled on coreConsultant tool, and >>>> + * the width of the Master Bus. >>>> + * >>>> + * In general, FIFO depths are represented with the following equation: >>>> + * >>>> + * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1 >>>> + * >>>> + * Conversions can be done to the equation to derive the number of packets that >>>> + * will fit to a particular FIFO size value. >>>> + */ >>>> +static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep) >>>> +{ >>>> + struct dwc3 *dwc = dep->dwc; >>>> + int fifo_0_start; >>>> + int ram1_depth; >>>> + int fifo_size; >>>> + int min_depth; >>>> + int num_in_ep; >>>> + int remaining; >>>> + int mult = 1; >>>> + int fifo; >>>> + int tmp; >>>> + >>>> + if (!dwc->needs_fifo_resize) >>>> + return 0; >>> Maybe add a condition to check for Enhanced SuperSpeed only? >>> >> Since this logic applies for isoc endpoints as well in high speed mode, >> for high bandwidth use cases, we can't limit it to SS only. > > Ok. I was asking because you use 1024 as MPS in your calculation. > >> >>>> + >>>> + /* resize IN endpoints except ep0 */ >>>> + if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1) >>>> + return 0; >>>> + >>>> + ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); >>>> + >>>> + if ((dep->endpoint.maxburst > 1 && >>>> + usb_endpoint_xfer_bulk(dep->endpoint.desc)) || >>>> + usb_endpoint_xfer_isoc(dep->endpoint.desc)) >>>> + mult = 3; >>>> + >>>> + if (dep->endpoint.maxburst > 6 && >>>> + usb_endpoint_xfer_bulk(dep->endpoint.desc) && DWC3_IP_IS(DWC31)) >>>> + mult = 6; >>> You checked maxburst > 1 for isoc, but not when maxburst > 6. Why? >>> Also, "mult" is the term we usually use for isoc endpoints. Applying it >>> to bulk is confusing here. >>> >> Ok, let me rename it to something else that makes more sense. The isoc >> endpoint check was targeted for mainly high-speed high bandwidth isoc >> use cases, and I don't believe our results improved with a larger fifo >> allocation, ie 6. (refer to below) > > It should improve for isoc and reduce missed isoc error also. Most > applications only use 1-16KB max per interval, so you don't see the > impact as much. > >> >>> How did we decide on 3 and 6? Are they arbitrary? >>> >> So actually in the databook, they have some recommendations for TXFIFO >> sizes to use in "Table 4-3 Device Config Parameters." It mentions that >> for burst capable endpoints to have a fifo size to fit at least 3 >> packets of maxpacket size. >> >> There's also "Chapter 3 Cache, FIFO RAMs, and Bandwidth Requirements," >> which goes over a lot of optimizations that could be done based off your >> system's overall latency. The sizes were chosen after we ran our peak >> throughput and performance testing on our devices, and these values >> netted the best throughput while also allowing enough fifo space for our >> other USB endpoints. Also, there's going to be a limit on how much >> improvement you see with respects to increasing the fifo size, since >> your system will eventually be able to pull data out of the internal >> fifo faster than it is being filled. > > I added a patch a while back to check the max packet limit based on the > recommended minimum Rx/TxFIFO size > d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit") > > The driver wouldn't match the endpoint if it doesn't meet the minimum > requirement of 3 MPS if the device is operating in SuperSpeed or > SuperSpeed Plus. Was the check for "3" necessary? > Hi Thinh, Please correct me if I'm wrong, but when a function driver requests for an endpoint using usb_ep_autoconfig(), most of the time it passes in a FS descriptor that doesn't have the wMaxPacketSize parameter set (0), which means it will probably fetch the first unused endpoint. (at least this is what I noticed for the BULK ep cases) In this case, it would be hard to determine if the endpoint selected would already have the constraint you added factored in. > "6" is your tested value right? It may be different for different setup. > Can we pass this as a setting parameter from the devicetree property? > Definitely can agree to that. We can make things a bit more flexible depending on the system. Maybe create a new property, and default it to 6 if "do_fifo_resize" is set, but this parameter is not defined. Thanks Wesley Cheng >>>> + >>>> + /* FIFO size for a single buffer */ >>>> + fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 1); >>>> + >>>> + /* Calculate the number of remaining EPs w/o any FIFO */ >>>> + num_in_ep = dwc->max_cfg_eps; >>>> + num_in_ep -= dwc->num_ep_resized; >>>> + >>>> + /* Reserve at least one FIFO for the number of IN EPs */ >>>> + min_depth = num_in_ep * (fifo + 1); >>>> + remaining = ram1_depth - min_depth - dwc->last_fifo_depth; >>> Can "remaining" be a negative value? If so, I think it's clearer if you do >>> remaining = max_t(int, 0, remaining); >>> >> Sure. >> >>>> + >>>> + /* >>>> + * We've already reserved 1 FIFO per EP, so check what we can fit in >>>> + * addition to it. If there is not enough remaining space, allocate >>>> + * all the remaining space to the EP. >>>> + */ >>>> + fifo_size = (mult - 1) * fifo; >>>> + if (remaining < fifo_size) { >>>> + if (remaining > 0) >>>> + fifo_size = remaining; >>>> + else >>>> + fifo_size = 0; >>> Then use this condition instead: >>> >>> if (remaining < fifo_size) >>> fifo_size = remaining; >>> >>>> + } >>>> + >>>> + fifo_size += fifo; >>>> + /* Last increment according to the TX FIFO size equation */ >>>> + fifo_size++; >>>> + >>>> + /* Check if TXFIFOs start at non-zero addr */ >>>> + tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0)); >>>> + fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp); >>>> + >>>> + fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16)); >>>> + if (DWC3_IP_IS(DWC31)) >>>> + dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size); >>>> + else >>>> + dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size); >>> Take account of DWC32. >>> >> Got it. >>>> + >>>> + /* Check fifo size allocation doesn't exceed available RAM size. */ >>>> + if (dwc->last_fifo_depth >= ram1_depth) { >>>> + dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n", >>>> + dwc->last_fifo_depth, ram1_depth, >>>> + dep->endpoint.name, fifo_size); >>>> + if (DWC3_IP_IS(DWC31)) >>>> + fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size); >>>> + else >>>> + fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size); >>> Same here. >>> >>>> + dwc->last_fifo_depth -= fifo_size; >>>> + return -ENOMEM; >>>> + } >>>> + >>>> + dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size); >>>> + dwc->num_ep_resized++; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> /** >>>> * __dwc3_gadget_ep_enable - initializes a hw endpoint >>>> * @dep: endpoint to be initialized >>>> @@ -632,6 +787,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) >>>> int ret; >>>> >>>> if (!(dep->flags & DWC3_EP_ENABLED)) { >>>> + ret = dwc3_gadget_resize_tx_fifos(dep); >>>> + if (ret) >>>> + return ret; >>>> + >>>> ret = dwc3_gadget_start_config(dep); >>>> if (ret) >>>> return ret; >>>> @@ -2418,6 +2577,7 @@ static int dwc3_gadget_stop(struct usb_gadget *g) >>>> >>>> spin_lock_irqsave(&dwc->lock, flags); >>>> dwc->gadget_driver = NULL; >>>> + dwc->max_cfg_eps = 0; >>>> spin_unlock_irqrestore(&dwc->lock, flags); >>>> >>>> free_irq(dwc->irq_gadget, dwc->ev_buf); >>>> @@ -2485,6 +2645,39 @@ static int dwc3_gadget_vbus_draw(struct usb_gadget *g, unsigned int mA) >>>> return 0; >>>> } >>>> >>>> +static int dwc3_gadget_check_config(struct usb_gadget *g, unsigned long ep_map) >>> What's in ep_map? Can you document more to help with the review? >>> >> Yeah, I will add some more comments. Just to explain here briefly, this >> check config callback is to address the concern pointed out by Felipe >> where we could run out of txfifo memory while our USB composition is >> being enabled. This would lead to an enumerated device, with >> non-functioning endpoints. >> >> The check config will be called during the function driver bind stage >> (before we enumerate), and ep_map carries the number of endpoints (both >> in and out) being used in a particular configuration. With this >> information, the logic will ensure that there is enough txfifo space for >> at least 1 fifo per endpoint. If not, we can catch the failure at the >> composition bind stages. (although it would be odd to see a DWC3 >> controller with not enough txfifo ram for 1 fifo per ep) The point at >> which we actually resize the fifo allocations, will always check to make >> sure that there is enough fifo space for the remaining endpoints after >> every resize. >> >> Thanks >> Wesley Cheng > > Thanks for the info. I'll review more after you add more detail. > > BR, > Thinh > >>> Thanks, >>> Thinh >>> >>>> +{ >>>> + struct dwc3 *dwc = gadget_to_dwc(g); >>>> + unsigned long in_ep_map; >>>> + int fifo_size = 0; >>>> + int ram1_depth; >>>> + int ep_num; >>>> + >>>> + if (!dwc->needs_fifo_resize) >>>> + return 0; >>>> + >>>> + /* Only interested in the IN endpoints */ >>>> + in_ep_map = ep_map >> 16; >>>> + ep_num = hweight_long(in_ep_map); >>>> + >>>> + if (ep_num <= dwc->max_cfg_eps) >>>> + return 0; >>>> + >>>> + /* Update the max number of eps in the composition */ >>>> + dwc->max_cfg_eps = ep_num; >>>> + >>>> + fifo_size = dwc3_gadget_calc_tx_fifo_size(dwc, dwc->max_cfg_eps); >>>> + /* Based on the equation, increment by one for every ep */ >>>> + fifo_size += dwc->max_cfg_eps; >>>> + >>>> + /* Check if we can fit a single fifo per endpoint */ >>>> + ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7); >>>> + if (fifo_size > ram1_depth) >>>> + return -ENOMEM; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static const struct usb_gadget_ops dwc3_gadget_ops = { >>>> .get_frame = dwc3_gadget_get_frame, >>>> .wakeup = dwc3_gadget_wakeup, >>>> @@ -2495,6 +2688,7 @@ static const struct usb_gadget_ops dwc3_gadget_ops = { >>>> .udc_set_speed = dwc3_gadget_set_speed, >>>> .get_config_params = dwc3_gadget_config_params, >>>> .vbus_draw = dwc3_gadget_vbus_draw, >>>> + .check_config = dwc3_gadget_check_config, >>>> }; >>>> >>>> /* -------------------------------------------------------------------------- */ > -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project