Re: [PATCH] USB:bugfix a controller halt error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/7/21 22:57, Alan Stern Wrote:
> On Fri, Jul 21, 2023 at 06:00:15PM +0800, liulongfang wrote:
>> On systems that use ECC memory. The ECC error of the memory will
>> cause the USB controller to halt. It causes the usb_control_msg()
>> operation to fail.
> 
> How often does this happen in real life?  (Besides, it seems to me that 
> if your system is getting a bunch of ECC memory errors then you've got 
> much worse problems than a simple USB failure!)
>

This problem is on ECC memory platform.
In the test scenario, the problem is 100% reproducible.

> And why do you worry about ECC memory failures in particular?  Can't 
> _any_ kind of failure cause the usb_control_msg() operation to fail?
> 
>> At this point, the returned buffer data is an abnormal value, and
>> continuing to use it will lead to incorrect results.
> 
> The driver already contains code to check for abnormal values.  The 
> check is not perfect, but it should prevent things from going too 
> badly wrong.
>

If it is ECC memory error. These parameter checks would also
actually be invalid.

>> Therefore, it is necessary to judge the return value and exit.
>>
>> Signed-off-by: liulongfang <liulongfang@xxxxxxxxxx>
> 
> There is a flaw in your reasoning.
> 
> The operation carried out here is deliberately unsafe (for full-speed 
> devices).  It is made before we know the actual maxpacket size for ep0, 
> and as a result it might return an error code even when it works okay.  
> This shouldn't happen, but a lot of USB hardware is unreliable.
> 
> Therefore we must not ignore the result merely because r < 0.  If we do 
> that, the kernel might stop working with some devices.
> 
It may be that the handling solution for ECC errors is different from that
of the OS platform. On the test platform, after usb_control_msg() fails,
reading the memory data of buf will directly lead to kernel crash:

[ T14] Call trace:
[ T14] hub_port_init+0x280/0x9f0
[ T14] hub_port_connect+0x1d4/0xa40
[ T14] hub_port_connect_change+0xb8/0x2b0
[ T14] port_event+0x430/0x5d0
[ T14] hub_event+0x138/0x4a0
[ T14] process_one_work+0x1c8/0x39c
[ T14] worker_thread+0x150/0x3d0
[ T14] kthread+0xfc/0x130
[ T14] ret_from_fork+0x10/0x18
[ T14] Code: 528000c2 b9007fea 94002c9a b9407fea (39401f41)

thanks,
Longfang.
> Alan Stern
> 
>> ---
>>  drivers/usb/core/hub.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
>> index a739403a9e45..6a43198be263 100644
>> --- a/drivers/usb/core/hub.c
>> +++ b/drivers/usb/core/hub.c
>> @@ -4891,6 +4891,16 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
>>  					USB_DT_DEVICE << 8, 0,
>>  					buf, GET_DESCRIPTOR_BUFSIZE,
>>  					initial_descriptor_timeout);
>> +				/* On systems that use ECC memory, ECC errors can
>> +				 * cause the USB controller to halt.
>> +				 * It causes this operation to fail. At this time,
>> +				 * the buf data is an abnormal value and needs to be exited.
>> +				 */
>> +				if (r < 0) {
>> +					kfree(buf);
>> +					goto fail;
>> +				}
>> +
>>  				switch (buf->bMaxPacketSize0) {
>>  				case 8: case 16: case 32: case 64: case 255:
>>  					if (buf->bDescriptorType ==
>> -- 
>> 2.24.0
>>
> 
> .
> 



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux