Hi,

(please avoid top-posting, see: http://daringfireball.net/2007/07/on_top)

Alex Damian <alex.r.damian@xxxxxxxxx> writes:
> Forgot to mention, I just reproduced it on the mainline 4.8.1 kernel.
>
> On Wed, Oct 12, 2016 at 5:13 PM, Alex Damian <alex.r.damian@xxxxxxxxx> wrote:
>> Hello,
>>
>> To follow up on the original bug report: I am still experiencing
>> memory corruption problems in the xhci stack.
>>
>> One thing I noticed is that the corruption always occurs on a secondary
>> CPU (i.e. the stack trace starts at cpu_startup_entry), and it always
>> happens while an interrupt is being handled.
>>
>> It seems to me that a mutex or something similar is not being locked
>> correctly, but I don't have any experience with the code around this
>> part, so I have no idea where to look.
>>
>> Pointers, ideas, suggestions?

How about we start with Mathias' suggestion to enable xhci debugging
messages? Quoting it again here:

>>> Enabling xhci debug could reveal something.
>>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

(keeping context below)

>> On Thu, Aug 25, 2016 at 2:22 PM, Mathias Nyman
>> <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>>> On 29.07.2016 17:41, Alex Damian wrote:
>>>>
>>>> On Fri, Jul 29, 2016 at 2:53 PM, Greg KH <greg@xxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, Jul 29, 2016 at 10:58:03AM +0100, Alex Damian wrote:
>>>>>>
>>>>>> Hi Greg,
>>>>>>
>>>>>> I managed to reproduce it with an untainted kernel, see the dmesg
>>>>>> paste below. The stack seemed corrupted as well?
>>>>>>
>>>>>> I referred to it as a crash because after a couple of these issues,
>>>>>> the machine hard freezes. I set up a serial console via a USB cable,
>>>>>> but I don't get the kernel oops out of the machine. The network is
>>>>>> also dead before getting any data. I could not think of any other way
>>>>>> to get a console out of a MacBook - any ideas?
>>>>>>
>>>>>> There is a progressive level of deterioration going on below, which
>>>>>> is why I'm adding multiple pastes. See the obviously invalid pointer
>>>>>> 0000000000000001 in the 3rd paste below. Also, see the protection
>>>>>> fault in the last paste. To me, something is trampling all over
>>>>>> memory, and it is USB-related.
>>>>>
>>>>>
>>>>> Not good, thanks for reproducing it without the closed kernel drivers.
>>>>>
>>>>> If you disable the list debug kernel option, do you have any problems
>>>>> with the machine? We aren't having any other reports of issues like
>>>>> this at the moment, which makes me worry that it's something unique to
>>>>> your situation/hardware.
>>>>
>>>>
>>>> I strongly suspect it's related to the MacBook 12,1 hardware. I
>>>> haven't been able to reproduce this with other machines, including
>>>> other MacBook versions with the same peripherals.
>>>>
>>>> This machine has never been stable in this particular peripheral
>>>> configuration. I had Apple run all HW diagnostics on the machine, and
>>>> I ran memcheck to verify that the RAM is OK - all results are clean.
>>>> The machine is very stable under Mac OS X.
>>>>
>>>>> And you don't know that it's a USB problem, only that USB is the one
>>>>> that is showing the issue. Anyone could be writing over memory.
>>>>
>>>>
>>>> True. However, it seems particularly related to the USB mouse - that's
>>>> how I manage to reproduce the error.
>>>>
>>>>>
>>>>> Also, any chance you can use 'git bisect' to track down an offending
>>>>> commit? I'm assuming that this used to work properly and something
>>>>> recently caused the issue, correct?
>>>>
>>>>
>>>> The earliest kernels I've tested are in the 3.3 range. All kernels
>>>> before 4.7 just lock up. 4.7 is the first kernel where I get
>>>> meaningful dmesg errors before locking up. As such, there is very
>>>> little that I can do to bisect :(.
>>>>
>>>
>>> Going through xhci-related issues that occurred during my vacation.
>>>
>>> There is one command list related issue fixed in 4.8-rc3; any chance
>>> you could try it?
>>>
>>> Alternatively, just add the following patch to 4.7:
>>> 33be126 xhci: always handle "Command Ring Stopped" events
>>>
>>> Enabling xhci debug could reveal something.
>>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
>>>
>>> -Mathias
>>>

--
balbi