Re: [PATCH] USB:bugfix a controller halt error

liulongfang <liulongfang@xxxxxxxxxx> · Thu, 10 Aug 2023 09:20:36 +0800



On 2023/7/27 23:57, Alan Stern wrote:
> On Thu, Jul 27, 2023 at 05:31:41PM +0200, Oliver Neukum wrote:
>> On 27.07.23 16:42, Alan Stern wrote:
>>> On Thu, Jul 27, 2023 at 03:03:57PM +0800, liulongfang wrote:
>>>> On 2023/7/26 22:20, Alan Stern wrote:
>>
>>>>> It seems to me that something along these lines must be necessary in
>>>>> any case.  Unless the bad memory is cleared somehow, it would never be
>>>>> usable again.  The kernel might deallocate it, then reallocate for
>>>>> another purpose, and then crash when the new user tries to access it.
>>>>>
>>>>> In fact, this scenario could still happen even with your patch, which
>>>>> means the patch doesn't really fix the problem.
>>
>> I suppose in theory you could have something like a bad blocks list
>> just for RAM, but that would really hurt. You'd have to do something
>> about every DMA operation in every driver in theory.
>>
>> Error handling would basically be an intentional memory leak.
> 
> I started out thinking this way, but maybe that's not how it works.  
> Perhaps simply overwriting the part of memory that got the ECC error 
> would clear the error state.  (This may depend on the kind of error, 
> one-time vs. permanent.)
> 
> If that's the case, and if the memory buffer was deallocated without 
> being accessed and then later reallocated, things would be okay.  The 
> routine that reallocated the buffer wouldn't try to read from it before 
> initializing it somehow.
> 
>>>> This patch is only used to prevent data in the buffer from being accessed.
>>>> As long as the data is not accessed, the kernel does not crash.
>>>
>>> I still don't understand.  You haven't provided nearly enough
>>> information.  You should start by answering the questions that Oliver
>>> asked.  Then answer this question:
>>>
>>> The code you are concerned about is this:
>>>
>>> 		r = usb_control_msg(udev, usb_rcvaddr0pipe(),
>>> 				USB_REQ_GET_DESCRIPTOR, USB_DIR_IN,
>>> 				USB_DT_DEVICE << 8, 0,
>>> 				buf, GET_DESCRIPTOR_BUFSIZE,
>>> 				initial_descriptor_timeout);
>>> 		switch (buf->bMaxPacketSize0) {
>>>
>>> You're worried that if an ECC memory error occurs during the
>>> usb_control_msg transfer, the kernel will crash when the "switch"
>>> statement tries to read the value of buf->bMaxPacketSize0.  That's a
>>> reasonable thing to worry about.
>>
>> Albeit unlikely. If the hardware and implementation are reasonable
>> you'd return a specific error code from the HCD and clean up the
>> RAM in your ecc driver.
>>
>> The fix for USB would then conceptually be something like
>>
>> retryio:
>> 	r = usb_control_msg()
>> 	if (r == -EMEMORYCORRUPTION)
>> 		goto retryio;
> 
> Yes, we could do this, but it's not necessary.  Let's say that the HCD 
> returns -EMEMORYCORRUPTION and the ecc driver cleans up the RAM 
> (probably by resetting its contents to 0, but possibly leaving garbage 
> there instead).  Then when the following code in hub_port_init() tests 
> buf->bMaxPacketSize0, it will see an invalid value and will retry the 
> transfer.
> 
> Or, with low probability, it will see a valid but incorrect value.  If 
> that happens then later transfers using ep0 will fail, causing the hub 
> driver to reiterate the outer loop in hub_port_connect().  Eventually 
> the device will be correctly initialized and enumerated.
> 
> Alan Stern
>

OK, thanks.
Longfang.
> .
>