________________________________________
From: Guido Kiener <Guido.Kiener@xxxxxxxxxxxxxxxxx>
Sent: Friday, 23 July 2021 01:33
To: Greg KH; dave penkler
Cc: Zhang, Qiang; Alan Stern; Dmitry Vyukov; paulmck@xxxxxxxxxx; USB
Subject: Re: [PATCH] USB: usbtmc: Fix RCU stall warning

[Please note: This e-mail is from an EXTERNAL e-mail address]

> From: Greg KH
> Sent: Wednesday, July 21, 2021 11:48 AM
> Subject: *EXT* Re: [PATCH] USB: usbtmc: Fix RCU stall warning
>
> On Wed, Jul 21, 2021 at 11:44:23AM +0200, dave penkler wrote:
> > On Wed, 21 Jul 2021 at 09:52, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Jul 21, 2021 at 09:41:15AM +0200, dave penkler wrote:
> > > > On Wed, 21 Jul 2021 at 09:08, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@xxxxxxxxxxxxx wrote:
> > > > > > From: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
> > > > >
> > > > > I need a "full" name here, and in the signed-off-by line please.
> > > > >
> > > > > >
> > > > > > rcu: INFO: rcu_preempt self-detected stall on CPU
> > > > > > rcu: 1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
> > > > > > softirq=25390/25392 fqs=3
> > > > > > (t=12164 jiffies g=31645 q=43226)
> > > > > > rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
> > > > > > RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > > > > rcu: Unless rcu_preempt kthread gets sufficient CPU time,
> > > > > > OOM is now expected behavior.
> > > > > > rcu: RCU grace-period kthread stack dump:
> > > > > > task:rcu_preempt state:R running task
> > > > > >
> > > > > > In the case of system use dummy_hcd as usb controller, when
> > > > > > the usbtmc devices is disconnected, in usbtmc_interrupt(), if
> > > > > > the urb status is unknown, the urb will be resubmit, the urb
> > > > > > may be insert to dum_hcd->urbp_list again, this will cause the
> > > > > > dummy_timer() not to exit for a long time, beacause the
> > > > > > dummy_timer() be called in softirq and local_bh is disable,
> > > > > > this not only causes the RCU reading critical area to consume
> > > > > > too much time but also makes the tasks in the current CPU runq
> > > > > > not run in time, and that triggered RCU stall.
> > > > > >
> > > > > > return directly when find the urb status is not zero to fix it.
> > > > > >
> > > > > > Reported-by: syzbot+e2eae5639e7203360018@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > Signed-off-by: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
> > > > >
> > > > > What commit does this fix? Does it need to go to stable kernels?
> > > > >
> > > > > What about the usbtmc maintainers, what do they think about this?
> > > >
> > > > This patch makes the babbling endpoint retry/recovery code in the
> > > > real world usb host controller drivers redundant and would prevent
> > > > usbtmc applications from benefiting from it.
> > >
> > > I do not understand, is this change ok or not?
> > >
> > > Why do usbtmc applications need to know if babbling happens or not?
> > >
> > > confused,
> > Sorry, the issue this patch is trying to fix occurs because the
> > current usbtmc driver resubmits the URB when it gets an EPROTO return.
> > The dummy usb host controller driver used in the syzbot tests keeps
> > returning the resubmitted URB with EPROTO causing a loop that starves
> > RCU. With an actual HCI driver it either recovers or returns an EPIPE.
> > In either case no loop occurs.
> > So for my part as a user and maintainer this patch is not ok.
>
> Thanks for the review.
>
> Zqiang, can you fix this patch up based on this please?
>
> thanks,
>
> greg k-h

>Qiang,
>
>After discussions with Alan and Dave we think that fixing the
>usbtmc driver is the best approach to fix the RCU stall warning.
>Your first proposal was almost ok, but I think we should use
>dev_dbg() instead of dev_err() to avoid printing the EPROTO
>errors. See below:
>
>Please feel free to add the following text to your patch
>description.
>
>-Guido
>
>
>The function usbtmc_interrupt() resubmits urbs when the error status
>of an urb is -EPROTO. In systems using the dummy_hcd usb controller
>this can result in endless interrupt loops when the usbtmc device is
>disconnected from the host system.
>
>Since host controller drivers already try to recover from transmission
>errors, there is no need to resubmit the urb or try other solutions
>to repair the error situation.
>
>In case of errors the INT pipe just stops to wait for further packets.
>
>Reviewed-by: Guido Kiener <guido.kiener@xxxxxxxxxxxxxxxxx>
>
>diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c
>index 74d5a9c5238a..73f419adce61 100644
>--- a/drivers/usb/class/usbtmc.c
>+++ b/drivers/usb/class/usbtmc.c
>@@ -2324,17 +2324,10 @@ static void usbtmc_interrupt(struct urb *urb)
> 		dev_err(dev, "overflow with length %d, actual length is %d\n",
> 			data->iin_wMaxPacketSize, urb->actual_length);
> 		fallthrough;
>-	case -ECONNRESET:
>-	case -ENOENT:
>-	case -ESHUTDOWN:
>-	case -EILSEQ:
>-	case -ETIME:
>-	case -EPIPE:
>+	default:
> 		/* urb terminated, clean up */
> 		dev_dbg(dev, "urb terminated, status: %d\n", status);
> 		return;
>-	default:
>-		dev_err(dev, "unknown status received: %d\n", status);
> 	}
> exit:
> 	rv = usb_submit_urb(urb, GFP_ATOMIC);
>

Thanks I will resend v2

Qiang
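
For readers following the thread, the effect of the proposed change can be sketched in plain user-space C. This is only an illustration, not the driver code: old_handler_resubmits(), new_handler_resubmits() and completions_after_disconnect() are hypothetical names, the controller is modelled as a loop that always completes the URB with -EPROTO (roughly how the thread describes dummy_hcd after a disconnect), and the cap parameter exists only so the simulation terminates. It assumes a Linux errno.h that defines ESHUTDOWN and ETIME.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Old decision logic, as described in the thread: a fixed list of
 * terminal errors stops the URB, while any other status (including
 * -EPROTO) falls through to the resubmit path.
 */
static bool old_handler_resubmits(int status)
{
	switch (status) {
	case 0:
		return true;	/* normal completion: keep listening */
	case -EOVERFLOW:
	case -ECONNRESET:
	case -ENOENT:
	case -ESHUTDOWN:
	case -EILSEQ:
	case -ETIME:
	case -EPIPE:
		return false;	/* urb terminated, clean up */
	default:
		return true;	/* "unknown status": resubmitted anyway */
	}
}

/* Proposed logic: only a successful completion is resubmitted. */
static bool new_handler_resubmits(int status)
{
	return status == 0;
}

/*
 * Hypothetical model of a controller that completes every submitted
 * URB with -EPROTO. Returns how many completions occur before the
 * handler stops resubmitting, or cap if it never stops.
 */
static int completions_after_disconnect(bool (*resubmits)(int), int cap)
{
	int n = 0;

	do {
		n++;	/* controller hands the URB back with -EPROTO */
	} while (n < cap && resubmits(-EPROTO));

	return n;
}
```

Calling completions_after_disconnect(old_handler_resubmits, 1000) runs until the cap is reached, which corresponds to the endless loop that starved RCU, whereas the same call with new_handler_resubmits stops after a single completion.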