________________________________________
From: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, 21 July 2021 15:52
To: Zhang, Qiang
Cc: stern@xxxxxxxxxxxxxxxxxxx; dvyukov@xxxxxxxxxx; paulmck@xxxxxxxxxx; dpenkler@xxxxxxxxx; guido.kiener@xxxxxxxxxxxxxxxxx; linux-usb@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] USB: usbtmc: Fix RCU stall warning

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Wed, Jul 21, 2021 at 07:30:39AM +0000, Zhang, Qiang wrote:
>
>
> ________________________________________
> From: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, 21 July 2021 15:08
> To: Zhang, Qiang
> Cc: stern@xxxxxxxxxxxxxxxxxxx; dvyukov@xxxxxxxxxx; paulmck@xxxxxxxxxx; dpenkler@xxxxxxxxx; guido.kiener@xxxxxxxxxxxxxxxxx; linux-usb@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] USB: usbtmc: Fix RCU stall warning
>
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Tue, Jun 29, 2021 at 11:32:36AM +0800, qiang.zhang@xxxxxxxxxxxxx wrote:
> > From: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
>
> >I need a "full" name here, and in the signed-off-by line please.
>
> >
> > rcu: INFO: rcu_preempt self-detected stall on CPU
> > rcu: 1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000
> > softirq=25390/25392 fqs=3
> > (t=12164 jiffies g=31645 q=43226)
> > rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0
> > RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > rcu: Unless rcu_preempt kthread gets sufficient CPU time,
> > OOM is now expected behavior.
> > rcu: RCU grace-period kthread stack dump:
> > task:rcu_preempt state:R running task
> >
> > In the case of system use dummy_hcd as usb controller, when the
> > usbtmc devices is disconnected, in usbtmc_interrupt(), if the urb
> > status is unknown, the urb will be resubmit, the urb may be insert
> > to dum_hcd->urbp_list again, this will cause the dummy_timer() not
> > to exit for a long time, because the dummy_timer() be called in
> > softirq and local_bh is disable, this not only causes the RCU reading
> > critical area to consume too much time but also makes the tasks in
> > the current CPU runq not run in time, and that triggered RCU stall.
> >
> > return directly when find the urb status is not zero to fix it.
> >
> > Reported-by: syzbot+e2eae5639e7203360018@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
>
> >What commit does this fix? Does it need to go to stable kernels?
>
> I will add fix tags resend, need to go to stable kernel
>
> >
> >What about the usbtmc maintainers, what do they think about this?
>
> Alan Stern reviewed this change before.

>I do not see that on this commit :(

Sorry, I used the wrong words, Alan Stern made suggestions for my patch.
The content is as follows :

On Mon, Jun 28, 2021 at 06:38:37AM +0000, Zhang, Qiang wrote:
>
>
> ________________________________________
> From: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Sent: Monday, 19 April 2021 15:27
> To: syzbot; Greg Kroah-Hartman; guido.kiener@xxxxxxxxxxxxxxxxx; dpenkler@xxxxxxxxx; lee.jones@xxxxxxxxxx; USB list
> Cc: bp@xxxxxxxxx; dwmw@xxxxxxxxxxxx; hpa@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; luto@xxxxxxxxxx; mingo@xxxxxxxxxx; syzkaller-bugs@xxxxxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [syzbot] INFO: rcu detected stall in tx
>
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Mon, Apr 19, 2021 at 9:19 AM syzbot
> <syzbot+e2eae5639e7203360018@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 50987bec Merge tag 'trace-v5.12-rc7' of git://git.kernel.o..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1065c5fcd00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=398c4d0fe6f66e68
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e2eae5639e7203360018
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+e2eae5639e7203360018@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > usbtmc 5-1:0.0: unknown status received: -71
> > usbtmc 3-1:0.0: unknown status received: -71
> > usbtmc 5-1:0.0: unknown status received: -71
>
> >The log shows an infinite stream of these before the stall, so I
> >assume it's an infinite loop in usbtmc.
> >+usbtmc maintainers
> >
> >[ 370.171634][ C0] usbtmc 6-1:0.0: unknown status received: -71
> >[ 370.177799][ C1] usbtmc 3-1:0.0: unknown status received: -71
>
> This seems like a long time in the following cycle, when the callback
> function usbtmc_interrupt() find unknown status error, it will submit
> urb again. the urb may be insert urbp_list.
> due to the dummy_timer() be called in bh-disable.
> This will result in the RCU reading critical area not exiting for a
> long time (note: bh_disable/enable, preempt_disable/enable is regarded
> as the RCU critical reading area ), and prevent rcu_preempt kthread be
> schedule and running.
> Whether to return directly when we find the urb status is unknown error?

Yes.

> diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c
> index 74d5a9c5238a..39d44339c03f 100644
> --- a/drivers/usb/class/usbtmc.c
> +++ b/drivers/usb/class/usbtmc.c
> @@ -2335,6 +2335,7 @@ static void usbtmc_interrupt(struct urb *urb)
>                 return;
>         default:
>                 dev_err(dev, "unknown status received: %d\n", status);
> +               return;
>         }
>  exit:
>         rv = usb_submit_urb(urb, GFP_ATOMIC);

This is the right thing to do. In fact, you should also change the code
above this. There's no real need for special handling of the
-ECONNRESET, -ENOENT, ..., -EPIPE codes, since the driver will do the
same thing no matter what the code is.

Alan Stern
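
The resubmit loop discussed in this thread can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code and not the actual patch: every sim_* name, struct sim_urb and
SIM_MAX_PASSES are hypothetical stand-ins, and the timer model assumes that
a single handler invocation keeps completing the URB for as long as it is
queued. The "new" handler returns on any non-zero status, which matches the
commit message and the simplification suggested above; the value 71 is the
Linux errno for EPROTO, matching the "-71" in the syzbot log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
#define SIM_EPROTO      71    /* the log reports status -71 */
#define SIM_MAX_PASSES  1000  /* cap so the looping case ends in this model */

struct sim_urb {
	int status;       /* 0 on success, negative errno on failure */
	bool queued;      /* true while the URB sits on the controller list */
	int completions;  /* how many times the completion handler ran */
};

typedef void (*sim_complete_fn)(struct sim_urb *urb);

/* Model of resubmission: put the URB back on the controller list. */
static void sim_submit(struct sim_urb *urb)
{
	assert(urb != NULL);
	urb->queued = true;
}

/* Model of the pre-patch default case: the error is noted, then the
 * handler falls through and resubmits the URB anyway. */
static void sim_interrupt_old(struct sim_urb *urb)
{
	urb->completions++;
	sim_submit(urb);
}

/* Model of the fixed behaviour: any non-zero status ends the cycle. */
static void sim_interrupt_new(struct sim_urb *urb)
{
	urb->completions++;
	if (urb->status != 0)
		return;
	sim_submit(urb);
}

/* Model of one timer-handler invocation after the device is gone:
 * every queued transfer fails and its completion handler is called.
 * Returns how many completions this single invocation processed. */
static int sim_timer_pass(struct sim_urb *urb, sim_complete_fn complete)
{
	int passes = 0;

	while (urb->queued && passes < SIM_MAX_PASSES) {
		urb->queued = false;
		urb->status = -SIM_EPROTO;
		complete(urb);
		passes++;
	}
	return passes;
}
```

In this model, calling sim_submit() on a fresh URB and then
sim_timer_pass() with sim_interrupt_old only stops at the artificial cap,
while the same call with sim_interrupt_new processes one completion and
leaves the URB unqueued.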