Re: xhci_hcd crash on linux 4.7.0

Hi,

(please avoid top-posting, see: http://daringfireball.net/2007/07/on_top)

Alex Damian <alex.r.damian@xxxxxxxxx> writes:
> Forgot to mention, I just reproduced it on the mainline 4.8.1 kernel.
>
> On Wed, Oct 12, 2016 at 5:13 PM, Alex Damian <alex.r.damian@xxxxxxxxx> wrote:
>> Hello,
>>
>> To follow up on the original bug report. I am still experiencing
>> memory corruption problems in the xhci stack.
>>
>> One thing I noticed is that the corruption always occurs on a secondary
>> CPU (i.e. the stack trace starts at cpu_startup_entry) and it always
>> happens while trying to handle an interrupt.
>>
>> Seems to me that a mutex or something similar is not correctly locked,
>> but I don't have any experience with the code around this part, so I
>> have no idea where to look.
>>
>> Pointers, ideas, suggestions ?

How about we start with Mathias' suggestion to enable xhci debugging
messages?

Quoting it again here:

>>> Enabling xhci debug could reveal something.
>>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

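For reference, a minimal sketch of how that could be wrapped up so the
debug output is captured before the machine locks up. It assumes a kernel
built with CONFIG_DYNAMIC_DEBUG=y and a root shell; the function names and
the log file name are made up for illustration and are not part of any
existing tool:

```shell
#!/bin/sh
# Sketch only: assumes CONFIG_DYNAMIC_DEBUG=y and root privileges.
# dyndbg_enable_query and enable_xhci_debug are hypothetical helper names.

CONTROL=/sys/kernel/debug/dynamic_debug/control

# Build the dynamic debug query that enables debug prints for one module.
dyndbg_enable_query() {
    printf 'module %s =p' "$1"
}

# Mount debugfs if the control file is missing, enable xhci_hcd debug
# messages, then print how many xhci_hcd call sites now carry the 'p' flag.
enable_xhci_debug() {
    [ -e "$CONTROL" ] || mount -t debugfs none /sys/kernel/debug || return 1
    dyndbg_enable_query xhci_hcd > "$CONTROL" || return 1
    grep -c '\[xhci_hcd\].* =p' "$CONTROL"
}

# Example use (as root), keeping a copy of the log in a hypothetical file:
#   enable_xhci_debug && dmesg -w | tee xhci-debug.log
```

Since the machine hard freezes, it is probably worth syncing that log file
to disk often, or sending the output to another machine, so that the last
messages before the lockup are not lost.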

(keeping context below)

>> On Thu, Aug 25, 2016 at 2:22 PM, Mathias Nyman
>> <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>>> On 29.07.2016 17:41, Alex Damian wrote:
>>>>
>>>> On Fri, Jul 29, 2016 at 2:53 PM, Greg KH <greg@xxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, Jul 29, 2016 at 10:58:03AM +0100, Alex Damian wrote:
>>>>>>
>>>>>> Hi Greg,
>>>>>>
>>>>>> I managed to reproduce with an untainted kernel, see dmesg paste below.
>>>>>> The stack seemed corrupted as well ?
>>>>>>
>>>>>> I referred to it as a crash since after a couple of these issues, the
>>>>>> machine hard freezes - I set up a serial console via a USB cable, but
>>>>>> I don't get the kernel oops out of the machine. The network is also
>>>>>> dead before getting any data. I could not think of any other way to
>>>>>> get a console out of a Macbook - any ideas ?
>>>>>>
>>>>>> There is a progressive level of deterioration going on below, this is
>>>>>> why I'm adding multiple pastes. See the obviously invalid pointer
>>>>>> 0000000000000001 in 3rd paste below. Also, see the protection fault in
>>>>>> the last paste. To me, something is trampling all over memory, and it
>>>>>> is usb-related.
>>>>>
>>>>>
>>>>> Not good, thanks for reproducing it without the closed kernel drivers.
>>>>>
>>>>> If you disable the list debug kernel option, do you have any problems
>>>>> with the machine?  We aren't having any other reports of issues like
>>>>> this at the moment, which makes me worry that it's something unique to
>>>>> your situation/hardware.
>>>>
>>>>
>>>> I strongly suspect it's related to the macbook 12,1 hardware. I
>>>> haven't been able to reproduce this with other machines, including
>>>> other macbook versions with the same peripherals.
>>>>
>>>> This machine has never been stable in this particular peripheral
>>>> configuration. I had Apple run all HW diagnostics on the machine, I
>>>> ran the memcheck to verify that the RAM is ok - all results are
>>>> clean. The machine is very stable under Mac OSX.
>>>>
>>>>> And you don't know that it's a USB problem, only that USB is the one
>>>>> that is showing the issue.  Anyone could be writing over memory.
>>>>
>>>>
>>>> True. However it seems particularly related to the USB mouse - that's
>>>> how I manage to reproduce the error.
>>>>
>>>>>
>>>>> Also, any chance you can use 'git bisect' to track down an offending
>>>>> commit?  I'm assuming that this used to work properly and something
>>>>> recently caused the issue, correct?
>>>>
>>>>
>>>> The earliest kernels I've tested are in the 3.3 range. All kernels
>>>> before 4.7 just lock up. 4.7 is the first kernel where I have
>>>> meaningful dmesg errors before locking up. As such, there is very
>>>> little that I can do to bisect :(.
>>>>
>>>
>>> Going through xhci related issues that occurred during my vacation.
>>>
>>> There is one command list related issue fixed in 4.8-rc3, any chance you
>>> could try it?
>>> Alternatively, just apply the following patch on top of 4.7:
>>> 33be126 xhci: always handle "Command Ring Stopped" events
>>>
>>> Enabling xhci debug could reveal something.
>>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
>>>
>>> -Mathias
>>>

-- 
balbi
