On Thu, Feb 28, 2019 at 12:09 AM Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
> > In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
> > a spin lock held (and local interrupts disabled) with a huge 5 second
> > timeout.  This can translate to 5 million calls to udelay(1).  By its
> > very nature, udelay() is not meant to be precise; it only guarantees to
> > delay a minimum of 1 microsecond.  Therefore the actual delay of
> > xhci_handshake() can be significantly longer.  If the average udelay(1)
> > is greater than 2.2 us, the total time in xhci_handshake(), with
> > interrupts disabled, can be > 11 seconds, triggering the kernel's soft
> > lockup detector.
> >
> > To avoid this, let's replace the open coded io polling loop with one from
> > iopoll.h that uses a loop timed with the presumably more reliable ktime
> > infrastructure.
> >
> > Signed-off-by: Daniel Kurtz <djkurtz@xxxxxxxxxxxx>
>
> Looks sane to me, nice fixup.
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
> Is this causing problems on older kernels/devices today such that we
> should backport this?

We detected that an xhci_handshake() timeout can lead to a soft lockup
while debugging a USB issue on a new product.  The xhci_handshake()
timeout itself is a symptom of another underlying problem that causes
some commands to be aborted.  I don't know whether any such underlying
problems exist on other, older devices, but the potential is there, so a
backport is reasonable.  That said, it may just shift the symptom of an
underlying problem from a softlockup/oops to some other symptom, like
USB just being dead.

-Dan

>
> thanks,
>
> greg k-h
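
For reference, the arithmetic in the quoted commit message can be checked
with a small user-space sketch.  This is not the xhci code or the iopoll.h
implementation: the function names are hypothetical, time is simulated
rather than measured, and the 2200 ns cost per udelay(1) is just the
2.2 us figure from the commit message.  It only contrasts a loop bounded
by iteration count with a loop bounded by a deadline, for a condition
that never becomes true.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: each udelay(1) call really costs cost_ns
 * nanoseconds of simulated time.  Both functions return the total
 * simulated time, in nanoseconds, spent waiting for a condition
 * that never becomes true.
 */

/* Count-based polling: one udelay(1) per microsecond of timeout. */
uint64_t poll_by_count(uint64_t timeout_us, uint64_t cost_ns)
{
	uint64_t elapsed_ns = 0;

	for (uint64_t i = 0; i < timeout_us; i++)
		elapsed_ns += cost_ns;	/* simulated udelay(1) */

	return elapsed_ns;
}

/* Deadline-based polling: stop once the simulated clock passes the deadline. */
uint64_t poll_by_deadline(uint64_t timeout_us, uint64_t cost_ns)
{
	uint64_t now_ns = 0;
	uint64_t deadline_ns = timeout_us * 1000;

	assert(cost_ns > 0);	/* a zero-cost delay would never reach the deadline */

	while (now_ns < deadline_ns)
		now_ns += cost_ns;	/* simulated udelay(1) */

	return now_ns;
}
```

With timeout_us = 5000000 and cost_ns = 2200, the count-based loop
accumulates 11 seconds of simulated time, while the deadline-based loop
overshoots 5 seconds by less than one delay step.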