On 8/21/2024 6:05 AM, Thinh Nguyen wrote:
On Tue, Aug 20, 2024, Zhongqiu Han wrote:
Busy loops that poll on a register should call cpu_relax(). On some
architectures, it can lower CPU power consumption or yield to a
hyperthreaded twin processor. It also serves as a compiler barrier,
see Documentation/process/volatile-considered-harmful.rst. In addition,
if something goes wrong in the busy loop at least it can prevent things
from getting worse.
Signed-off-by: Zhongqiu Han <quic_zhonhan@xxxxxxxxxxx>
---
drivers/usb/dwc3/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 734de2a8bd21..498f08dbbdb5 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2050,6 +2050,8 @@ static int dwc3_get_num_ports(struct dwc3 *dwc)
if (!offset)
break;
+ cpu_relax();
+
val = readl(base + offset);
major_revision = XHCI_EXT_PORT_MAJOR(val);
--
2.25.1
We're not polling on a register here. We're just traversing and reading
the next port capability. The loop in dwc3_get_num_ports() should not be
more than DWC3_USB2_MAX_PORTS + DWC3_USB3_MAX_PORTS.
Hi Thinh,
Thanks a lot for the review~
yes, now i know that the iterations are limited so it wouldn't make
sense to relax here. I will be careful about this next time and sorry
for this.
What's really causing this busy loop you found?
actually no practical issue.
If polling for a register is really a problem, then we would have that
problem everywhere else in dwc3. But why here?
I also think that polling for a register is not a problem, but if there
polling for a register in the potential infinite loop, It's better to
relax the cpu and as i saw, basically there are two types of
implementations in other codes for the relax cpu target,
1. use (u/m)sleep or (u/n)delay function or the iterations limited,
such as:
(1)
core.c- if (DWC3_VER_IS_WITHIN(DWC31, 190A, ANY) || DWC3_IP_IS(DWC32))
core.c- retries = 10;
core.c-
core.c- do {
core.c: reg = dwc3_readl(dwc->regs, DWC3_DCTL);
core.c- if (!(reg & DWC3_DCTL_CSFTRST))
core.c- goto done;
core.c-
core.c- if (DWC3_VER_IS_WITHIN(DWC31, 190A, ANY) ||
DWC3_IP_IS(DWC32))
core.c- msleep(20);
core.c- else
core.c- udelay(1);
core.c- } while (--retries);
(2)
gadget.c- /* poll until Link State changes to ON */
gadget.c- retries = 20000;
gadget.c-
gadget.c- while (retries--) {
gadget.c: reg = dwc3_readl(dwc->regs, DWC3_DSTS);
gadget.c-
gadget.c- /* in HS, means ON */
gadget.c- if (DWC3_DSTS_USBLNKST(reg) ==
DWC3_LINK_STATE_U0)
gadget.c- break;
gadget.c- }
By the way, for (2) case, the retries is 20000, seems the value is large
without relax if break the loop only while retries is 0, but as we know
although if there need delay/relax, we cannot easily use (u/m)delay or m
u(sleep) functions because we need consider to avoid the "scheduling on
atomic/invalid context" BUG. Just shared my guess, unless there is
optimization comparison data after relax cpu or practical issue here.
2. use cpu_relax() to relax for busy loop, such as:
(1)
ulpi.c-static int dwc3_ulpi_busyloop(struct dwc3 *dwc, u8 addr, bool read)
..........................
ulpi.c-
ulpi.c- while (count--) {
ulpi.c- ndelay(ns);
ulpi.c: reg = dwc3_readl(dwc->regs, DWC3_GUSB2PHYACC(0));
ulpi.c- if (reg & DWC3_GUSB2PHYACC_DONE)
ulpi.c- return 0;
ulpi.c- cpu_relax();
ulpi.c- }
(2)
host/ohci-pxa27x.c: while (__raw_readl(pxa_ohci->mmio_base + UHCHR)
& UHCHR_FSBIR)
host/ohci-pxa27x.c- cpu_relax();
(3)
gadget/udc/fsl_udc_core.c: while (fsl_readl(&dr_regs->usbcmd) &
USB_CMD_CTRL_RESET) {
gadget/udc/fsl_udc_core.c- if (time_after(jiffies, timeout)) {
gadget/udc/fsl_udc_core.c-
dev_err(&udc->gadget.dev, "udc reset timeout!\n");
gadget/udc/fsl_udc_core.c- return -ETIMEDOUT;
gadget/udc/fsl_udc_core.c- }
gadget/udc/fsl_udc_core.c- cpu_relax();
gadget/udc/fsl_udc_core.c- }
Anyways, for current patch there the iterations are limited, thanks a
lot for the review and the discussion and i will be careful next time.
Thanks,
Thinh
--
Thx and BRs,
Zhongqiu Han