On 02/10/2017 02:39 PM, James Hogan wrote: > On Fri, Feb 10, 2017 at 02:11:52PM -0800, Guenter Roeck wrote: >> On Fri, Feb 10, 2017 at 11:15:31AM -0800, Florian Fainelli wrote: >>> On 02/10/2017 09:46 AM, Guenter Roeck wrote: >>>> On Fri, Feb 10, 2017 at 10:39:52AM +0000, James Hogan wrote: >>>>> On Fri, Feb 10, 2017 at 09:40:11AM +0000, James Hogan wrote: >>>>>> Hi Guenter, >>>>>> >>>>>> On Thu, Feb 09, 2017 at 08:50:04PM -0800, Guenter Roeck wrote: >>>>>>> On 02/09/2017 04:01 PM, Justin Chen wrote: >>>>>>>> Hello Guenter, >>>>>>>> >>>>>>>> I am unable to reproduce the problem. May you give me more details? >>>>>>>> >>>>>>> The scripts I am using are available at >>>>>>> >>>>>>> https://github.com/groeck/linux-build-test/tree/master/rootfs >>>>>>> >>>>>>> in the mips and mipsel subdirectories (both crash). Individual >>>>>>> build logs are always available at kerneltests.org/builders, >>>>>>> in the 'next' column. >>>>>> >>>>>> Did you find it 100% reproducible? I was trying to reproduce yesterday >>>>>> evening, and did hit it a few times, but then it stopped happening >>>>>> before I could try and figure out what was going wrong. >>>>> >>>>> I switched to gcc 5.2 (buildroot toolchain) for building kernel, and now >>>>> it reproduces every time :) >>>>> >>>> gcc 5.4.0 (binutils 2.26.1) also reliably crashes. The exact crash depends on >>>> the kernel (-next is different to ToT + offending patch, qemu command line >>>> makes a difference, qemu version makes a difference), but the crash is easy >>>> to reproduce, at least for me. >>> >>> Just to clarify here, is the crash a combination of: >>> >>> - the kernel image, based on different trees/branches >> >> I tried with linux-next, and I tried with ToT with the offending patch >> applied. Both fail reliably. Without the patch, both pass reliably. >> >>> - different GCC versions >> >> I can only say that I see it crashes with both gcc 5.2.0 and gcc 5.4.0. >> >>> - different ways of invoking QEMU/QEMU version? >>> >> Yes and no. One way of _not_ (or maybe not always) seeing this crash >> is to use the bare-bone command line with qemu 2.7 or 2.8 (with no >> root file system provided). This crashes because there is no root file >> system, but not as described earlier. It does crash, though, when >> providing a more comprehensive command line. All kernel versions >> without this patch do not crash. >> >> On the other side, blaming this on the qemu command line is akin to >> saying that a crash only seen if a mouse is connected would be caused >> by the mouse, so I am not entirely sure if that helps too much. >> Also see below, regarding heisenbug. >> >>> and essentially Justin's commit just made problem 1) to occur, but is >>> not the root cause of the crash you are seeing? >> >> That would not necessarily be my conclusion. Of course, the code appears >> to be heavily SMP related, so it may well be that it exposes some >> problem associated with cache handling or support in non-SMP configurations. >> >> Of course, it might also be possible that there is a qemu problem somewhere >> which only manifests itself on non-SMP mips images with Justin's commit >> applied. That appears to be somewhat unlikely, though I have no hard data >> supporting this guess. >> >> I'll do some more testing and try to find the actual crash location. >> Tricky though since it almost looks like there is a not completely >> initialized workqueue. Making things worse, the problem "goes away" >> if I add some debug log into process_one_work(), meaning there may >> be a heisenbug. > > cracked it by moving around an early return error. populate_cache() > macro has multiple statements with no do while (0) around it. The > c->scache.waysize condition in populate_cache_leaves then only > conditionalises the first statement in the macro and in absense of l2 > (or l3 for that matter) it'll continue to write beyond the end of the > array allocated in detect_cache_attributes(). Badness ensues. Good catch, thanks James! > > The SMP calls in arch/mips/kernel/cacheinfo.c file are pretty redundant > too since all the cache info is read from the cpu info structures. > > I'll write a patch. Thanks for reporting Guenter! > > Cheers > James > -- Florian