Re: Crash in -next due to 'MIPS: Add cacheinfo support'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 02/10/2017 02:39 PM, James Hogan wrote:
> On Fri, Feb 10, 2017 at 02:11:52PM -0800, Guenter Roeck wrote:
>> On Fri, Feb 10, 2017 at 11:15:31AM -0800, Florian Fainelli wrote:
>>> On 02/10/2017 09:46 AM, Guenter Roeck wrote:
>>>> On Fri, Feb 10, 2017 at 10:39:52AM +0000, James Hogan wrote:
>>>>> On Fri, Feb 10, 2017 at 09:40:11AM +0000, James Hogan wrote:
>>>>>> Hi Guenter,
>>>>>>
>>>>>> On Thu, Feb 09, 2017 at 08:50:04PM -0800, Guenter Roeck wrote:
>>>>>>> On 02/09/2017 04:01 PM, Justin Chen wrote:
>>>>>>>> Hello Guenter,
>>>>>>>>
>>>>>>>> I am unable to reproduce the problem. May you give me more details?
>>>>>>>>
>>>>>>> The scripts I am using are available at
>>>>>>>
>>>>>>> https://github.com/groeck/linux-build-test/tree/master/rootfs
>>>>>>>
>>>>>>> in the mips and mipsel subdirectories (both crash). Individual
>>>>>>> build logs are always available at kerneltests.org/builders,
>>>>>>> in the 'next' column.
>>>>>>
>>>>>> Did you find it 100% reproducible? I was trying to reproduce yesterday
>>>>>> evening, and did hit it a few times, but then it stopped happening
>>>>>> before I could try and figure out what was going wrong.
>>>>>
>>>>> I switched to gcc 5.2 (buildroot toolchain) for building kernel, and now
>>>>> it reproduces every time :)
>>>>>
>>>> gcc 5.4.0 (binutils 2.26.1) also reliably crashes. The exact crash depends on
>>>> the kernel (-next is different to ToT + offending patch, qemu command line
>>>> makes a difference, qemu version makes a difference), but the crash is easy
>>>> to reproduce, at least for me.
>>>
>>> Just to clarify here, is the crash a combination of:
>>>
>>> - the kernel image, based on different trees/branches
>>
>> I tried with linux-next, and I tried with ToT with the offending patch
>> applied. Both fail reliably. Without the patch, both pass reliably.
>>
>>> - different GCC versions
>>
>> I can only say that I see it crashes with both gcc 5.2.0 and gcc 5.4.0.
>>
>>> - different ways of invoking QEMU/QEMU version?
>>>
>> Yes and no. One way of _not_ (or maybe not always) seeing this crash
>> is to use the bare-bone command line with qemu 2.7 or 2.8 (with no
>> root file system provided). This crashes because there is no root file
>> system, but not as described earlier. It does crash, though, when
>> providing a more comprehensive command line. All kernel versions
>> without this patch do not crash.
>>
>> On the other side, blaming this on the qemu command line is akin to
>> saying that a crash only seen if a mouse is connected would be caused
>> by the mouse, so I am not entirely sure if that helps too much.
>> Also see below, regarding heisenbug.
>>
>>> and essentially Justin's commit just made problem 1) to occur, but is
>>> not the root cause of the crash you are seeing?
>>
>> That would not necessarily be my conclusion. Of course, the code appears
>> to be heavily SMP related, so it may well be that it exposes some
>> problem associated with cache handling or support in non-SMP configurations.
>>
>> Of course, it might also be possible that there is a qemu problem somewhere
>> which only manifests itself on non-SMP mips images with Justin's commit
>> applied. That appears to be somewhat unlikely, though I have no hard data
>> supporting this guess.
>>
>> I'll do some more testing and try to find the actual crash location.
>> Tricky though since it almost looks like there is a not completely
>> initialized workqueue. Making things worse, the problem "goes away"
>> if I add some debug log into process_one_work(), meaning there may
>> be a heisenbug.
> 
> cracked it by moving around an early return error. populate_cache()
> macro has multiple statements with no do while (0) around it. The
> c->scache.waysize condition in populate_cache_leaves then only
> conditionalises the first statement in the macro and in absense of l2
> (or l3 for that matter) it'll continue to write beyond the end of the
> array allocated in detect_cache_attributes(). Badness ensues.

Good catch, thanks James!

> 
> The SMP calls in arch/mips/kernel/cacheinfo.c file are pretty redundant
> too since all the cache info is read from the cpu info structures.
> 
> I'll write a patch. Thanks for reporting Guenter!
> 
> Cheers
> James
> 


-- 
Florian




[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux