On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker <guillaume.tucker@xxxxxxxxxxxxx> wrote:
>
> On 01/03/2019 20:41, Andrew Morton wrote:
> > On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> >
> >>>>> Michal had asked if the free space accounting fix up addressed this
> >>>>> boot regression? I was awaiting word on that.
> >>>>
> >>>> hm, does bot@xxxxxxxxxxxx actually read emails? Let's try info@ as well..
> >>
> >> bot@xxxxxxxxxxxx is not a person, it's a send-only account for
> >> automated reports. So no, it doesn't read emails.
> >>
> >> I guess the tricky point here is that the authors of the commits
> >> found by bisections may not always have the hardware needed to
> >> reproduce the problem. So it needs to be dealt with on a
> >> case-by-case basis: sometimes they do have the hardware,
> >> sometimes someone else on the list or on CC does, and sometimes
> >> it's better for the people who have access to the test lab which
> >> ran the KernelCI test to deal with it.
> >>
> >> This case seems to fall into the last category. As I have access
> >> to the Collabora lab, I can do some quick checks to confirm
> >> whether the proposed patch does fix the issue. I hadn't realised
> >> that someone was waiting for this to happen, especially as the
> >> BeagleBone Black is a very common platform. Sorry about that,
> >> I'll take a look today.
> >>
> >> It may be a nice feature to be able to give access to the
> >> KernelCI test infrastructure to anyone who wants to debug an
> >> issue reported by KernelCI or verify a fix, so they won't need to
> >> have the hardware locally. Something to think about for the
> >> future.
> >
> > Thanks, that all sounds good.
> >
> >>>> Is it possible to determine whether this regression is still present in
> >>>> current linux-next?
> >>
> >> I'll try to re-apply the patch that caused the issue, then see if
> >> the suggested change fixes it. As far as the current linux-next
> >> master branch is concerned, KernelCI boot tests are passing fine
> >> on that platform.
> >
> > They would, because I dropped
> > mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
> > now have shuffling disabled.
> >
> > Is it possible to add the below to linux-next and try again?
>
> I've actually already done that, and essentially the issue can
> still be reproduced by applying that patch. See this branch:
>
> https://gitlab.collabora.com/gtucker/linux/commits/next-20190301-beaglebone-black-debug
>
> next-20190301 boots fine but the head fails, using
> multi_v7_defconfig + SMP=n in both cases and
> SHUFFLE_PAGE_ALLOCATOR=y enabled in the 2nd case as a result
> of the change in the default value.
>
> The change suggested by Michal Hocko on Feb 15th has now been
> applied in linux-next; it's part of this commit, but as
> explained above it does not actually resolve the boot failure:
>
>   98cf198ee8ce mm: move buddy list manipulations into helpers
>
> I can send more details on Monday and do a bit of debugging to
> help narrow down the problem. Please let me know if
> there's anything in particular that would seem to be worth
> trying.
>

Thanks for taking a look!

Some questions when you get a chance:

Is there an early-printk facility that can be turned on to see how far
we get in the boot?

Do any of the QEMU machine types [1] approximate this board? I.e., so I
might be able to debug independently.

Were there any boot *successes* on ARM with shuffling enabled? I.e.,
clues about what's different about the specific memory setup for
beagle-bone-black.

Thanks for the help!

[1]: https://wiki.qemu.org/Documentation/Platforms/ARM