On Fri, Feb 10, 2017 at 10:39:32PM +0000, James Hogan wrote:
> >
> > > and essentially Justin's commit just made problem 1) to occur, but is
> > > not the root cause of the crash you are seeing?
> >
> > That would not necessarily be my conclusion. Of course, the code appears
> > to be heavily SMP related, so it may well be that it exposes some
> > problem associated with cache handling or support in non-SMP configurations.
> >
> > Of course, it might also be possible that there is a qemu problem somewhere
> > which only manifests itself on non-SMP mips images with Justin's commit
> > applied. That appears to be somewhat unlikely, though I have no hard data
> > supporting this guess.
> >
> > I'll do some more testing and try to find the actual crash location.
> > Tricky though since it almost looks like there is a not completely
> > initialized workqueue. Making things worse, the problem "goes away"
> > if I add some debug log into process_one_work(), meaning there may
> > be a heisenbug.
>
> cracked it by moving around an early return error. populate_cache()
> macro has multiple statements with no do while (0) around it. The
> c->scache.waysize condition in populate_cache_leaves then only
> conditionalises the first statement in the macro and in absence of l2
> (or l3 for that matter) it'll continue to write beyond the end of the
> array allocated in detect_cache_attributes(). Badness ensues.
>

Ouch. Yes, after you mention it, the problem is easy to see.

> The SMP calls in arch/mips/kernel/cacheinfo.c file are pretty redundant
> too since all the cache info is read from the cpu info structures.
>
> I'll write a patch. Thanks for reporting Guenter!
>

Thank you for tracking it down!

Guenter
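
The pitfall James describes (a multi-statement macro used as the body of an
unbraced if) can be reproduced outside the kernel. What follows is a minimal
user-space sketch, not the code from arch/mips/kernel/cacheinfo.c: struct leaf,
POPULATE_UNSAFE, POPULATE_SAFE, fill_unsafe() and fill_safe() are hypothetical
names made up for illustration, and the caller is assumed to pass a buffer with
at least two entries so the sketch itself never writes out of bounds.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cache leaf descriptor. */
struct leaf {
	int level;
	int type;
};

/* Unsafe: expands to three separate statements. After "if (cond)"
 * only the first statement is conditional. */
#define POPULATE_UNSAFE(leaf, lvl, typ)	\
	(leaf)->level = (lvl);		\
	(leaf)->type = (typ);		\
	(leaf)++

/* Safe: do { } while (0) makes the expansion a single statement. */
#define POPULATE_SAFE(leaf, lvl, typ)	\
do {					\
	(leaf)->level = (lvl);		\
	(leaf)->type = (typ);		\
	(leaf)++;			\
} while (0)

/* Returns how many entries the pointer was advanced past.
 * buf must have room for at least two entries. */
static size_t fill_unsafe(struct leaf *buf, int has_l2)
{
	struct leaf *p = buf;

	POPULATE_UNSAFE(p, 1, 0);		/* L1 is always present */
	if (has_l2)
		POPULATE_UNSAFE(p, 2, 0);	/* type store and p++ run regardless */

	return (size_t)(p - buf);
}

static size_t fill_safe(struct leaf *buf, int has_l2)
{
	struct leaf *p = buf;

	POPULATE_SAFE(p, 1, 0);
	if (has_l2)
		POPULATE_SAFE(p, 2, 0);		/* whole block is conditional */

	return (size_t)(p - buf);
}
```

With has_l2 == 0, fill_unsafe() still advances past two entries while
fill_safe() advances past one. If the array had been sized for one entry, as
in the case described above, the extra store would land beyond its end. Recent
GCC versions should flag the unsafe form via -Wmultistatement-macros.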