Re: [PATCH v6 3/4] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Wesley Cheng wrote:
>
> On 1/22/2021 4:15 PM, Thinh Nguyen wrote:
>> Hi,
>>
>> Wesley Cheng wrote:
>>> Some devices have USB compositions which may require multiple endpoints
>>> that support EP bursting.  HW defined TX FIFO sizes may not always be
>>> sufficient for these compositions.  By utilizing flexible TX FIFO
>>> allocation, this allows for endpoints to request the required FIFO depth to
>>> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
>>> a larger TX FIFO size results in better TX throughput.
>>>
>>> By introducing the check_config() callback, the resizing logic can fetch
>>> the maximum number of endpoints used in the USB composition (can contain
>>> multiple configurations), which helps ensure that the resizing logic can
>>> fulfill the configuration(s), or return an error to the gadget layer
>>> otherwise during bind time.
>>>
>>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx>
>>> ---
>>>  drivers/usb/dwc3/core.c   |   2 +
>>>  drivers/usb/dwc3/core.h   |   8 ++
>>>  drivers/usb/dwc3/ep0.c    |   2 +
>>>  drivers/usb/dwc3/gadget.c | 194 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 206 insertions(+)
>>>
>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>>> index 6969196..e7fa6af 100644
>>> --- a/drivers/usb/dwc3/core.c
>>> +++ b/drivers/usb/dwc3/core.c
>>> @@ -1284,6 +1284,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>>>  				&tx_thr_num_pkt_prd);
>>>  	device_property_read_u8(dev, "snps,tx-max-burst-prd",
>>>  				&tx_max_burst_prd);
>>> +	dwc->needs_fifo_resize = device_property_read_bool(dev,
>>> +							   "tx-fifo-resize");
>>>  
>>>  	dwc->disable_scramble_quirk = device_property_read_bool(dev,
>>>  				"snps,disable_scramble_quirk");
>>> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>>> index eec1cf4..983b2fd4 100644
>>> --- a/drivers/usb/dwc3/core.h
>>> +++ b/drivers/usb/dwc3/core.h
>>> @@ -1223,6 +1223,7 @@ struct dwc3 {
>>>  	unsigned		is_utmi_l1_suspend:1;
>>>  	unsigned		is_fpga:1;
>>>  	unsigned		pending_events:1;
>>> +	unsigned		needs_fifo_resize:1;
>> The prefix "need" sounds like a requirement, but I don't think it is the
>> case here. I think "do" would be a better prefix here.
>>
> Hi Thinh,
>
> Sure, that is true, since this may be an optional flag for certain
> platforms.
>
>>>  	unsigned		pullups_connected:1;
>>>  	unsigned		setup_packet_pending:1;
>>>  	unsigned		three_stage_setup:1;
>>> @@ -1257,6 +1258,10 @@ struct dwc3 {
>>>  	unsigned		dis_split_quirk:1;
>>>  
>>>  	u16			imod_interval;
>>> +
>>> +	int			max_cfg_eps;
>>> +	int			last_fifo_depth;
>>> +	int			num_ep_resized;
>>>  };
>> Please document these new fields.
>>
> Will do.
>
>>>  
>>>  #define INCRX_BURST_MODE 0
>>> @@ -1471,6 +1476,7 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
>>>  		struct dwc3_gadget_ep_cmd_params *params);
>>>  int dwc3_send_gadget_generic_command(struct dwc3 *dwc, unsigned int cmd,
>>>  		u32 param);
>>> +void dwc3_gadget_clear_tx_fifos(struct dwc3 *dwc);
>>>  #else
>>>  static inline int dwc3_gadget_init(struct dwc3 *dwc)
>>>  { return 0; }
>>> @@ -1490,6 +1496,8 @@ static inline int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
>>>  static inline int dwc3_send_gadget_generic_command(struct dwc3 *dwc,
>>>  		int cmd, u32 param)
>>>  { return 0; }
>>> +static inline void dwc3_gadget_clear_tx_fifos(struct dwc3 *dwc)
>>> +{ }
>>>  #endif
>>>  
>>>  #if IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE)
>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>> index 8b668ef..4f216bd 100644
>>> --- a/drivers/usb/dwc3/ep0.c
>>> +++ b/drivers/usb/dwc3/ep0.c
>>> @@ -616,6 +616,8 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>>>  		return -EINVAL;
>>>  
>>>  	case USB_STATE_ADDRESS:
>>> +		dwc3_gadget_clear_tx_fifos(dwc);
>>> +
>>>  		ret = dwc3_ep0_delegate_req(dwc, ctrl);
>>>  		/* if the cfg matches and the cfg is non zero */
>>>  		if (cfg && (!ret || (ret == USB_GADGET_DELAYED_STATUS))) {
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index 86f257f..26f9d64 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -615,6 +615,161 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
>>>  static void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force,
>>>  		bool interrupt);
>>>  
>>> +static int dwc3_gadget_calc_tx_fifo_size(struct dwc3 *dwc, int mult)
>> Can you document what this function does?
>>
> Will do.
>
>>> +{
>>> +	int max_packet = 1024;
>> Maybe you can also document why you chose 1024 (e.g. applicable to
>> Enhanced SuperSpeed only?).
>>
> Sure.  Its basically applicable for SS and isoc (hs/ss) use cases since
> max packet size is 1024 in both cases.

Highspeed bulk MPS is 512. Fullspeed varies more (bulk max MPS is 64 and
isoc is 1023).

However, we can keep it simple with 1024 as if it's ok to over estimate.
Just need to note that here.

>
>>> +	int fifo_size;
>>> +	int mdwidth;
>>> +
>>> +	mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0);
>>> +	/* MDWIDTH is represented in bits, we need it in bytes */
>>> +	mdwidth >>= 3;
>> mdwidth for DWC32 requires to read hwparams6 for the upper 2 significant
>> bits. Can we add a check for DWC32 also? You can check how we're doing
>> it now in the current code.
>>
> Sure.  I'll make sure to get the correct registers for the DWC32 case.
>
>>> +
>>> +	fifo_size = mult * ((max_packet + mdwidth) / mdwidth) + 1;
>>> +	return fifo_size;
>>> +}
>>> +
>>> +void dwc3_gadget_clear_tx_fifos(struct dwc3 *dwc)
>>> +{
>>> +	struct dwc3_ep *dep;
>>> +	int fifo_depth;
>>> +	int size;
>>> +	int num;
>>> +
>>> +	if (!dwc->needs_fifo_resize)
>>> +		return;
>>> +
>>> +	/* Read ep0IN related TXFIFO size */
>>> +	dep = dwc->eps[1];
>>> +	size = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
>>> +	if (DWC3_IP_IS(DWC31))
>>> +		fifo_depth = DWC31_GTXFIFOSIZ_TXFDEP(size);
>>> +	else
>>> +		fifo_depth = DWC3_GTXFIFOSIZ_TXFDEP(size);
>> The driver handles 3 IPs. Getting the fifo depth for DWC32 is the same
>> as DWC31. So the condition should be
>> if (DWC3_IP_IS(DWC3))
>>        fifo_depth = ...
>> else
>>       fifo_depth = ...
>>
> Understood.
>
>>> +
>>> +	dwc->last_fifo_depth = fifo_depth;
>>> +	/* Clear existing TXFIFO for all IN eps except ep0 */
>>> +	for (num = 3; num < min_t(int, dwc->num_eps, DWC3_ENDPOINTS_NUM);
>>> +	     num += 2) {
>>> +		dep = dwc->eps[num];
>>> +		/* Don't change TXFRAMNUM on usb31 version */
>>> +		size = DWC3_IP_IS(DWC31) ?
>>> +			dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(num >> 1)) &
>>> +				   DWC31_GTXFIFOSIZ_TXFRAMNUM : 0;
>>> +
>> Same here. Check for DWC32.
>>
>>> +		dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(num >> 1), size);
>>> +	}
>>> +	dwc->num_ep_resized = 0;
>>> +}
>>> +
>>> +/*
>>> + * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case
>>> + * @dwc: pointer to our context structure
>>> + *
>>> + * This function will a best effort FIFO allocation in order
>>> + * to improve FIFO usage and throughput, while still allowing
>>> + * us to enable as many endpoints as possible.
>>> + *
>>> + * Keep in mind that this operation will be highly dependent
>>> + * on the configured size for RAM1 - which contains TxFifo -,
>>> + * the amount of endpoints enabled on coreConsultant tool, and
>>> + * the width of the Master Bus.
>>> + *
>>> + * In general, FIFO depths are represented with the following equation:
>>> + *
>>> + * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1
>>> + *
>>> + * Conversions can be done to the equation to derive the number of packets that
>>> + * will fit to a particular FIFO size value.
>>> + */
>>> +static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep)
>>> +{
>>> +	struct dwc3 *dwc = dep->dwc;
>>> +	int fifo_0_start;
>>> +	int ram1_depth;
>>> +	int fifo_size;
>>> +	int min_depth;
>>> +	int num_in_ep;
>>> +	int remaining;
>>> +	int mult = 1;
>>> +	int fifo;
>>> +	int tmp;
>>> +
>>> +	if (!dwc->needs_fifo_resize)
>>> +		return 0;
>> Maybe add a condition to check for Enhanced SuperSpeed only?
>>
> Since this logic applies for isoc endpoints as well in high speed mode,
> for high bandwidth use cases, we can't limit it to SS only.

Ok. I was asking because you use 1024 as MPS in your calculation.

>
>>> +
>>> +	/* resize IN endpoints except ep0 */
>>> +	if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
>>> +		return 0;
>>> +
>>> +	ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
>>> +
>>> +	if ((dep->endpoint.maxburst > 1 &&
>>> +	     usb_endpoint_xfer_bulk(dep->endpoint.desc)) ||
>>> +	    usb_endpoint_xfer_isoc(dep->endpoint.desc))
>>> +		mult = 3;
>>> +
>>> +	if (dep->endpoint.maxburst > 6 &&
>>> +	    usb_endpoint_xfer_bulk(dep->endpoint.desc) && DWC3_IP_IS(DWC31))
>>> +		mult = 6;
>> You checked maxburst > 1 for isoc, but not when maxburst > 6. Why?
>> Also, "mult" is the term we usually use for isoc endpoints. Applying it
>> to bulk is confusing here.
>>
> Ok, let me rename it to something else that makes more sense.  The isoc
> endpoint check was targeted for mainly high-speed high bandwidth isoc
> use cases, and I don't believe our results improved with a larger fifo
> allocation, ie 6. (refer to below)

It should improve for isoc and reduce missed isoc error also. Most
applications only use 1-16KB max per interval, so you don't see the
impact as much.

>
>> How did we decide on 3 and 6? Are they arbitrary?
>>
> So actually in the databook, they have some recommendations for TXFIFO
> sizes to use in "Table 4-3 Device Config Parameters."  It mentions that
> for burst capable endpoints to have a fifo size to fit at least 3
> packets of maxpacket size.
>
> There's also "Chapter 3 Cache, FIFO RAMs, and Bandwidth Requirements,"
> which goes over a lot of optimizations that could be done based off your
> system's overall latency.  The sizes were chosen after we ran our peak
> throughput and performance testing on our devices, and these values
> netted the best throughput while also allowing enough fifo space for our
> other USB endpoints. Also, there's going to be a limit on how much
> improvement you see with respects to increasing the fifo size, since
> your system will eventually be able to pull data out of the internal
> fifo faster than it is being filled.

I added a patch a while back to check the max packet limit based on the
recommended minimum Rx/TxFIFO size
d94ea5319813 ("usb: dwc3: gadget: Properly set maxpacket limit")

The driver wouldn't match the endpoint if it doesn't meet the minimum
requirement of 3 MPS if the device is operating in SuperSpeed or
SuperSpeed Plus. Was the check for "3" necessary?

"6" is your tested value right? It may be different for different setup.
Can we pass this as a setting parameter from the devicetree property?

>>> +
>>> +	/* FIFO size for a single buffer */
>>> +	fifo = dwc3_gadget_calc_tx_fifo_size(dwc, 1);
>>> +
>>> +	/* Calculate the number of remaining EPs w/o any FIFO */
>>> +	num_in_ep = dwc->max_cfg_eps;
>>> +	num_in_ep -= dwc->num_ep_resized;
>>> +
>>> +	/* Reserve at least one FIFO for the number of IN EPs */
>>> +	min_depth = num_in_ep * (fifo + 1);
>>> +	remaining = ram1_depth - min_depth - dwc->last_fifo_depth;
>> Can "remaining" be a negative value? If so, I think it's clearer if you do
>> remaining = max_t(int, 0, remaining);
>>
> Sure.
>
>>> +
>>> +	/*
>>> +	 * We've already reserved 1 FIFO per EP, so check what we can fit in
>>> +	 * addition to it.  If there is not enough remaining space, allocate
>>> +	 * all the remaining space to the EP.
>>> +	 */
>>> +	fifo_size = (mult - 1) * fifo;
>>> +	if (remaining < fifo_size) {
>>> +		if (remaining > 0)
>>> +			fifo_size = remaining;
>>> +		else
>>> +			fifo_size = 0;
>> Then use this condition instead:
>>
>> if (remaining < fifo_size)
>>        fifo_size = remaining;
>>
>>> +	}
>>> +
>>> +	fifo_size += fifo;
>>> +	/* Last increment according to the TX FIFO size equation */
>>> +	fifo_size++;
>>> +
>>> +	/* Check if TXFIFOs start at non-zero addr */
>>> +	tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
>>> +	fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp);
>>> +
>>> +	fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16));
>>> +	if (DWC3_IP_IS(DWC31))
>>> +		dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
>>> +	else
>>> +		dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
>> Take account of DWC32.
>>
> Got it.
>>> +
>>> +	/* Check fifo size allocation doesn't exceed available RAM size. */
>>> +	if (dwc->last_fifo_depth >= ram1_depth) {
>>> +		dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
>>> +			dwc->last_fifo_depth, ram1_depth,
>>> +			dep->endpoint.name, fifo_size);
>>> +		if (DWC3_IP_IS(DWC31))
>>> +			fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
>>> +		else
>>> +			fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
>> Same here.
>>
>>> +		dwc->last_fifo_depth -= fifo_size;
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>> +	dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
>>> +	dwc->num_ep_resized++;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  /**
>>>   * __dwc3_gadget_ep_enable - initializes a hw endpoint
>>>   * @dep: endpoint to be initialized
>>> @@ -632,6 +787,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
>>>  	int			ret;
>>>  
>>>  	if (!(dep->flags & DWC3_EP_ENABLED)) {
>>> +		ret = dwc3_gadget_resize_tx_fifos(dep);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>>  		ret = dwc3_gadget_start_config(dep);
>>>  		if (ret)
>>>  			return ret;
>>> @@ -2418,6 +2577,7 @@ static int dwc3_gadget_stop(struct usb_gadget *g)
>>>  
>>>  	spin_lock_irqsave(&dwc->lock, flags);
>>>  	dwc->gadget_driver	= NULL;
>>> +	dwc->max_cfg_eps = 0;
>>>  	spin_unlock_irqrestore(&dwc->lock, flags);
>>>  
>>>  	free_irq(dwc->irq_gadget, dwc->ev_buf);
>>> @@ -2485,6 +2645,39 @@ static int dwc3_gadget_vbus_draw(struct usb_gadget *g, unsigned int mA)
>>>  	return 0;
>>>  }
>>>  
>>> +static int dwc3_gadget_check_config(struct usb_gadget *g, unsigned long ep_map)
>> What's in ep_map? Can you document more to help with the review?
>>
> Yeah, I will add some more comments.  Just to explain here briefly, this
> check config callback is to address the concern pointed out by Felipe
> where we could run out of txfifo memory while our USB composition is
> being enabled.  This would lead to an enumerated device, with
> non-functioning endpoints.
>
> The check config will be called during the function driver bind stage
> (before we enumerate), and ep_map carries the number of endpoints (both
> in and out) being used in a particular configuration.  With this
> information, the logic will ensure that there is enough txfifo space for
> at least 1 fifo per endpoint.  If not, we can catch the failure at the
> composition bind stages. (although it would be odd to see a DWC3
> controller with not enough txfifo ram for 1 fifo per ep)  The point at
> which we actually resize the fifo allocations, will always check to make
> sure that there is enough fifo space for the remaining endpoints after
> every resize.
>
> Thanks
> Wesley Cheng

Thanks for the info. I'll review more after you add more detail.

BR,
Thinh

>> Thanks,
>> Thinh
>>
>>> +{
>>> +	struct dwc3 *dwc = gadget_to_dwc(g);
>>> +	unsigned long in_ep_map;
>>> +	int fifo_size = 0;
>>> +	int ram1_depth;
>>> +	int ep_num;
>>> +
>>> +	if (!dwc->needs_fifo_resize)
>>> +		return 0;
>>> +
>>> +	/* Only interested in the IN endpoints */
>>> +	in_ep_map = ep_map >> 16;
>>> +	ep_num = hweight_long(in_ep_map);
>>> +
>>> +	if (ep_num <= dwc->max_cfg_eps)
>>> +		return 0;
>>> +
>>> +	/* Update the max number of eps in the composition */
>>> +	dwc->max_cfg_eps = ep_num;
>>> +
>>> +	fifo_size = dwc3_gadget_calc_tx_fifo_size(dwc, dwc->max_cfg_eps);
>>> +	/* Based on the equation, increment by one for every ep */
>>> +	fifo_size += dwc->max_cfg_eps;
>>> +
>>> +	/* Check if we can fit a single fifo per endpoint */
>>> +	ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
>>> +	if (fifo_size > ram1_depth)
>>> +		return -ENOMEM;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  static const struct usb_gadget_ops dwc3_gadget_ops = {
>>>  	.get_frame		= dwc3_gadget_get_frame,
>>>  	.wakeup			= dwc3_gadget_wakeup,
>>> @@ -2495,6 +2688,7 @@ static const struct usb_gadget_ops dwc3_gadget_ops = {
>>>  	.udc_set_speed		= dwc3_gadget_set_speed,
>>>  	.get_config_params	= dwc3_gadget_config_params,
>>>  	.vbus_draw		= dwc3_gadget_vbus_draw,
>>> +	.check_config		= dwc3_gadget_check_config,
>>>  };
>>>  
>>>  /* -------------------------------------------------------------------------- */





[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux