On 2023/7/21 22:57, Alan Stern wrote:
> On Fri, Jul 21, 2023 at 06:00:15PM +0800, liulongfang wrote:
>> On systems that use ECC memory. The ECC error of the memory will
>> cause the USB controller to halt. It causes the usb_control_msg()
>> operation to fail.
>
> How often does this happen in real life?  (Besides, it seems to me that
> if your system is getting a bunch of ECC memory errors then you've got
> much worse problems than a simple USB failure!)
>

This problem occurs on an ECC memory platform. In our test scenario, it
is 100% reproducible.

> And why do you worry about ECC memory failures in particular?  Can't
> _any_ kind of failure cause the usb_control_msg() operation to fail?
>
>> At this point, the returned buffer data is an abnormal value, and
>> continuing to use it will lead to incorrect results.
>
> The driver already contains code to check for abnormal values.  The
> check is not perfect, but it should prevent things from going too
> badly wrong.
>

If it is an ECC memory error, these parameter checks would themselves
be invalid.

>> Therefore, it is necessary to judge the return value and exit.
>>
>> Signed-off-by: liulongfang <liulongfang@xxxxxxxxxx>
>
> There is a flaw in your reasoning.
>
> The operation carried out here is deliberately unsafe (for full-speed
> devices).  It is made before we know the actual maxpacket size for ep0,
> and as a result it might return an error code even when it works okay.
> This shouldn't happen, but a lot of USB hardware is unreliable.
>
> Therefore we must not ignore the result merely because r < 0.  If we do
> that, the kernel might stop working with some devices.
>

It may be that the way ECC errors are handled differs from one OS
platform to another.
On the test platform, after usb_control_msg() fails, reading the memory
data of buf leads directly to a kernel crash:

[ T14] Call trace:
[ T14]  hub_port_init+0x280/0x9f0
[ T14]  hub_port_connect+0x1d4/0xa40
[ T14]  hub_port_connect_change+0xb8/0x2b0
[ T14]  port_event+0x430/0x5d0
[ T14]  hub_event+0x138/0x4a0
[ T14]  process_one_work+0x1c8/0x39c
[ T14]  worker_thread+0x150/0x3d0
[ T14]  kthread+0xfc/0x130
[ T14]  ret_from_fork+0x10/0x18
[ T14] Code: 528000c2 b9007fea 94002c9a b9407fea (39401f41)

thanks,
Longfang.

> Alan Stern
>
>> ---
>>  drivers/usb/core/hub.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
>> index a739403a9e45..6a43198be263 100644
>> --- a/drivers/usb/core/hub.c
>> +++ b/drivers/usb/core/hub.c
>> @@ -4891,6 +4891,16 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
>>  					USB_DT_DEVICE << 8, 0,
>>  					buf, GET_DESCRIPTOR_BUFSIZE,
>>  					initial_descriptor_timeout);
>> +			/* On systems that use ECC memory, ECC errors can
>> +			 * cause the USB controller to halt.
>> +			 * It causes this operation to fail. At this time,
>> +			 * the buf data is an abnormal value and needs to be exited.
>> +			 */
>> +			if (r < 0) {
>> +				kfree(buf);
>> +				goto fail;
>> +			}
>> +
>>  			switch (buf->bMaxPacketSize0) {
>>  			case 8: case 16: case 32: case 64: case 255:
>>  				if (buf->bDescriptorType ==
>> --
>> 2.24.0
>>
>
> .
>
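
For reference, the content check Alan refers to can be sketched in
isolation as below. This is only a minimal user-space illustration, not
the hub.c code: struct dev_desc_head and desc_head_plausible() are
hypothetical names, and the sketch assumes the buffer was cleared before
the transfer, so that a transfer which wrote nothing leaves
bMaxPacketSize0 at 0. The point is that the decision is made from the
buffer contents and not from r alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USB_DT_DEVICE 0x01	/* descriptor type code of a device descriptor */

/* Hypothetical stand-in for the first 8 bytes of a USB device descriptor. */
struct dev_desc_head {
	uint8_t  bLength;
	uint8_t  bDescriptorType;
	uint16_t bcdUSB;
	uint8_t  bDeviceClass;
	uint8_t  bDeviceSubClass;
	uint8_t  bDeviceProtocol;
	uint8_t  bMaxPacketSize0;
};

/*
 * Return true when the buffer holds a plausible device descriptor header.
 * The transfer status is deliberately not an input: a transfer that
 * reported an error may still have delivered usable data.
 */
static bool desc_head_plausible(const struct dev_desc_head *buf)
{
	assert(buf != NULL);

	switch (buf->bMaxPacketSize0) {
	case 8: case 16: case 32: case 64: case 255:
		/* Same set of ep0 maxpacket values as the switch in the patch context. */
		return buf->bDescriptorType == USB_DT_DEVICE;
	default:
		/* Includes 0, i.e. a cleared buffer that was never filled in. */
		return false;
	}
}
```

A check like this only helps if reading buf is itself safe; it does not
address the case described above, where the read faults.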