Re: Inquiry about the f_tcm: Enhance UASP driver work

On Sat, Nov 23, 2024, Michał Pecio wrote:
> On Sat, 23 Nov 2024 00:02:10 +0000, Thinh Nguyen wrote:
> > > Long delays I have seen mainly on some unfortunate pairings of HC
> > > and device (HW bugs?) which trigger unusual error conditions poorly
> > > handled by xhci_hcd. Try with dynamic debug on
> > > handle_transferless_tx_event(), if your kernel is recent enough for
> > > that to be a separate function.  
> > 
> > No, this delay is not a HW bug. When there's a transaction error, the
> > xHCI driver will reset the endpoint. The packet sequence number is
> > reset and is out of sync with the device. The next packet cannot proceed
> > until there's some sort of recovery. There's no usb_clear_halt() or
> > port reset immediately after a -EPROTO. The only recovery (a port
> > reset) happens after a timeout.
> 
> I think you are right. I tried to repro and I got this:
> 
> [Nov23 14:01] xhci-pci-renesas 0000:03:00.0: Transfer error for slot 1 ep 6 on endpoint
> [  +0.000380] xhci-pci-renesas 0000:03:00.0: Transfer error for slot 1 ep 6 on endpoint
> [ +30.096820] sd 6:0:0:0: [sdb] tag#1 uas_eh_abort_handler 0 uas-tag 2 inflight: IN 
> [  +0.000006] sd 6:0:0:0: [sdb] tag#1 CDB: opcode=0x28 28 00 02 d0 30 08 00 02 00 00
> [  +0.012009] scsi host6: uas_eh_device_reset_handler start
> [  +0.114634] usb 13-2: reset SuperSpeed USB device number 6 using xhci-pci-renesas
> [  +0.017603] scsi host6: uas_eh_device_reset_handler success
> [  +0.000072] sd 6:0:0:0: [sdb] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=30s
> [  +0.000003] sd 6:0:0:0: [sdb] tag#1 CDB: opcode=0x28 28 00 02 d0 30 08 00 02 00 00
> [  +0.000001] I/O error, dev sdb, sector 47198216 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
> 
> I will keep it running for a few more hours and if those timeouts
> keep happening I will have to conclude that I remembered wrong.
> 
> > > before resetting, but the whole endpoint is stopped and nothing
> > > moves forward. At least that's the impression I got, I was looking
> > > at other things.
> 
> But a completely stopped endpoint is *also* possible if you encounter
> COMP_INVALID_STREAM_ID. I see it after some command errors on this chip:
> 
> 13fd:5910 Initio Corporation

Sure. I guess I always associate -EPROTO with a transaction error, so I
forget that some other error conditions are reported with that error
code as well.

> 
> > Perhaps this can be enhanced in the future in the storage class
> > driver regarding -EPROTO recovery.
> 
> It's a universal problem with xhci_hcd, it always resets the host
> sequence state on every error, which is against Linux convention,
> so nobody expects it and nobody handles it. It's nuts.
> 
> One thing I'm going to try is patch it to stop doing this and see
> what happens.
> 

The xHC can halt the endpoint when it sees certain errors. That's what
happens here. To recover from this halted state, the driver can either
attempt a soft reset and resume, or give up, reset the endpoint, and return
-EPROTO. We can't avoid the reset here. Perhaps we can report this
type of error as -EPIPE instead. If not, we should update the error code
documentation under Documentation/driver-api/usb/error-codes.rst.
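
To make the two recovery paths concrete, here is a rough userspace sketch of
the decision described above. It is not the xhci_hcd code: the enum, the
struct, recover_halted_ep() and the report_as_epipe switch are all
hypothetical names made up for illustration, and the sketch assumes that a
soft retry keeps the host sequence state while a full endpoint reset does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical halt causes; these are NOT the xHCI completion codes. */
enum halt_cause {
	HALT_TRANSACTION_ERROR,	/* bus-level transaction error */
	HALT_INVALID_STREAM_ID,	/* stream ID rejected by the controller */
	HALT_STALL,		/* device answered with STALL */
};

/* Hypothetical per-endpoint bookkeeping for this sketch. */
struct ep_state {
	unsigned int soft_retries_left;	/* soft resets we may still try */
	bool host_seq_reset;		/* host sequence number was reset */
};

/*
 * Returns 0 if the transfer can resume after a soft retry, otherwise a
 * negative errno after the endpoint has been reset on the host side.
 * report_as_epipe models the proposal to report such halts as -EPIPE.
 */
static int recover_halted_ep(struct ep_state *ep, enum halt_cause cause,
			     bool report_as_epipe)
{
	assert(ep != NULL);

	/* Option 1: soft reset and resume (assumed to keep sequence state). */
	if (cause == HALT_TRANSACTION_ERROR && ep->soft_retries_left > 0) {
		ep->soft_retries_left--;
		return 0;
	}

	/*
	 * Option 2: give up and reset the endpoint. The host sequence
	 * number is now out of sync with the device until the class
	 * driver performs some recovery of its own.
	 */
	ep->host_seq_reset = true;

	if (cause == HALT_STALL)
		return -EPIPE;	/* documented meaning: endpoint stalled */

	return report_as_epipe ? -EPIPE : -EPROTO;
}
```

With soft_retries_left = 1, the first transaction error returns 0 and the
second returns -EPROTO with host_seq_reset set. That second case is the one
where the class driver currently sits idle until its command timeout fires.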

BR,
Thinh



