Re: Crash in -next due to 'MIPS: Add cacheinfo support'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 10, 2017 at 02:11:52PM -0800, Guenter Roeck wrote:
> On Fri, Feb 10, 2017 at 11:15:31AM -0800, Florian Fainelli wrote:
> > On 02/10/2017 09:46 AM, Guenter Roeck wrote:
> > > On Fri, Feb 10, 2017 at 10:39:52AM +0000, James Hogan wrote:
> > >> On Fri, Feb 10, 2017 at 09:40:11AM +0000, James Hogan wrote:
> > >>> Hi Guenter,
> > >>>
> > >>> On Thu, Feb 09, 2017 at 08:50:04PM -0800, Guenter Roeck wrote:
> > >>>> On 02/09/2017 04:01 PM, Justin Chen wrote:
> > >>>>> Hello Guenter,
> > >>>>>
> > >>>>> I am unable to reproduce the problem. May you give me more details?
> > >>>>>
> > >>>> The scripts I am using are available at
> > >>>>
> > >>>> https://github.com/groeck/linux-build-test/tree/master/rootfs
> > >>>>
> > >>>> in the mips and mipsel subdirectories (both crash). Individual
> > >>>> build logs are always available at kerneltests.org/builders,
> > >>>> in the 'next' column.
> > >>>
> > >>> Did you find it 100% reproducible? I was trying to reproduce yesterday
> > >>> evening, and did hit it a few times, but then it stopped happening
> > >>> before I could try and figure out what was going wrong.
> > >>
> > >> I switched to gcc 5.2 (buildroot toolchain) for building kernel, and now
> > >> it reproduces every time :)
> > >>
> > > gcc 5.4.0 (binutils 2.26.1) also reliably crashes. The exact crash depends on
> > > the kernel (-next is different to ToT + offending patch, qemu command line
> > > makes a difference, qemu version makes a difference), but the crash is easy
> > > to reproduce, at least for me.
> > 
> > Just to clarify here, is the crash a combination of:
> > 
> > - the kernel image, based on different trees/branches
> 
> I tried with linux-next, and I tried with ToT with the offending patch
> applied. Both fail reliably. Without the patch, both pass reliably.
> 
> > - different GCC versions
> 
> I can only say that I see it crashes with both gcc 5.2.0 and gcc 5.4.0.
> 
> > - different ways of invoking QEMU/QEMU version?
> > 
> Yes and no. One way of _not_ (or maybe not always) seeing this crash
> is to use the bare-bone command line with qemu 2.7 or 2.8 (with no
> root file system provided). This crashes because there is no root file
> system, but not as described earlier. It does crash, though, when
> providing a more comprehensive command line. All kernel versions
> without this patch do not crash.
> 
> On the other side, blaming this on the qemu command line is akin to
> saying that a crash only seen if a mouse is connected would be caused
> by the mouse, so I am not entirely sure if that helps too much.
> Also see below, regarding heisenbug.
> 
> > and essentially Justin's commit just made problem 1) to occur, but is
> > not the root cause of the crash you are seeing?
> 
> That would not necessarily be my conclusion. Of course, the code appears
> to be heavily SMP related, so it may well be that it exposes some
> problem associated with cache handling or support in non-SMP configurations.
> 
> Of course, it might also be possible that there is a qemu problem somewhere
> which only manifests itself on non-SMP mips images with Justin's commit
> applied. That appears to be somewhat unlikely, though I have no hard data
> supporting this guess.
> 
> I'll do some more testing and try to find the actual crash location.
> Tricky though since it almost looks like there is a not completely
> initialized workqueue. Making things worse, the problem "goes away"
> if I add some debug log into process_one_work(), meaning there may
> be a heisenbug.

cracked it by moving around an early return error. populate_cache()
macro has multiple statements with no do while (0) around it. The
c->scache.waysize condition in populate_cache_leaves then only
conditionalises the first statement in the macro and in absense of l2
(or l3 for that matter) it'll continue to write beyond the end of the
array allocated in detect_cache_attributes(). Badness ensues.

The SMP calls in arch/mips/kernel/cacheinfo.c file are pretty redundant
too since all the cache info is read from the cpu info structures.

I'll write a patch. Thanks for reporting Guenter!

Cheers
James

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux