On Fri, 13 Nov 2020 at 17:25, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Fri, 13 Nov 2020 at 17:15, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >
> > On Fri, 13 Nov 2020 at 16:58, Russell King - ARM Linux admin
> > <linux@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Nov 13, 2020 at 03:43:27PM +0000, Guillaume Tucker wrote:
> > > > On 13/11/2020 10:35, Ard Biesheuvel wrote:
> > > > > On Fri, 13 Nov 2020 at 11:31, Guillaume Tucker
> > > > > <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> > > > >>
> > > > >> Hi Ard,
> > > > >>
> > > > >> Please see the bisection report below about a boot failure on
> > > > >> RPi-2b.
> > > > >>
> > > > >> Reports aren't automatically sent to the public while we're
> > > > >> trialing new bisection features on kernelci.org but this one
> > > > >> looks valid.
> > > > >>
> > > > >> There's nothing in the serial console log, probably because it's
> > > > >> crashing too early during boot. I'm not sure if other platforms
> > > > >> on kernelci.org were hit by this in the same way, but there
> > > > >> doesn't seem to be any.
> > > > >>
> > > > >> The same regression can be seen on rmk's for-next branch as well
> > > > >> as in linux-next. It happens with both bcm2835_defconfig and
> > > > >> multi_v7_defconfig.
> > > > >>
> > > > >> Some more details can be found here:
> > > > >>
> > > > >> https://kernelci.org/test/case/id/5fae44823818ee918adb8864/
> > > > >>
> > > > >> If this looks like a real issue but you don't have a platform at
> > > > >> hand to reproduce it, please let us know if you would like the
> > > > >> KernelCI test to be re-run with earlyprintk or some debug config
> > > > >> turned on, or if you have a fix to try.
> > > > >>
> > > > >> Best wishes,
> > > > >> Guillaume
> > > > >>
> > > > >
> > > > > Hello Guillaume,
> > > > >
> > > > > That patch did have an issue, but it was already fixed by
> > > > >
> > > > > https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9020/1
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=fc2933c133744305236793025b00c2f7d258b687
> > > > >
> > > > > Could you please double check whether cherry-picking that on top of
> > > > > the first bad commit fixes the problem?
> > > >
> > > > Sadly this doesn't appear to be fixing the issue. I've
> > > > cherry-picked your patch on top of the commit found by the
> > > > bisection but it still didn't boot, here's the git log
> > > >
> > > > cbb9656e83ca ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
> > > > 7a1be318f579 ARM: 9012/1: move device tree mapping out of linear region
> > > > e9a2f8b599d0 ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
> > > > 3650b228f83a Linux 5.10-rc1
> > > >
> > > > Test log: https://people.collabora.com/~gtucker/lava/boot/rpi-2-b/v5.10-rc1-3-gcbb9656e83ca/
> > > >
> > > > There's no output so it's hard to tell what is going on, but
> > > > reverting the bad commit does make the board boot (that's
> > > > what "revert: PASS" means in the bisect report). So it's
> > > > unlikely that there is another issue causing the boot failure.
> > >
> > > These silent boot failures are precisely what the DEBUG_LL stuff (and
> > > early_printk) is supposed to help with - getting the kernel messages
> > > out when there is an oops before the serial console is initialised.
> > >
> >
> > If this is indeed related to the FDT mapping, I would assume
> > earlycon=... to be usable here.
> >
> > I will try to reproduce this on a RPi3 but I don't have a RPi2 at
> > hand, unfortunately.
> >
> > Would you mind having a quick try whether you can reproduce this on
> > QEMU, using the raspi2 machine model? If so, that would be a *lot*
> > easier to diagnose.
>
> Also, please have a go with 'earlycon=pl011,0x3f201000' added to the
> kernel command line.

I cannot reproduce this - I don't have the exact same hardware, but
for booting the kernel, I think RPi2 and RPi3 should be sufficiently
similar, and I can boot on RPi3 using a u-boot built for rpi2 using
your provided dtb for RPi2.

What puzzles me is that u-boot reports itself as

U-Boot 2016.03-rc1-00131-g39af3d8-dirty
RPI Model B+ (0x10)

which is the ARMv6 model not the ARMv7, but then the kernel reports

CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c53c7d

So even though I am perfectly willing to accept that there is
something wrong with the patch in question that needs to be fixed,
trying to reproduce this using an ancient rc1 u-boot with local
changes that identifies the platform incorrectly may be asking a bit
much.

Also, I did manage to get earlycon working with those zImages you
provided, so please give that a go. And if you have any contacts that
could lend me a RPi2, that would be very helpful (e.g., the BayLibre
office is down the road from where I live).
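
For reference, the bracketed value in the kernel's CPU line is the
Main ID Register (MIDR), and decoding it shows why that line
contradicts the "Model B+" banner. Below is a minimal sketch, assuming
the standard ARM MIDR field layout. The names decode_midr and
PART_NAMES are illustrative only and do not come from the kernel or
u-boot sources; the part table lists just the three cores relevant to
RPi boards.

```python
# Sketch: split an ARM MIDR value into its architectural fields.
# decode_midr and PART_NAMES are hypothetical helper names.

# Primary part numbers for ARM Ltd. (implementer 0x41) cores on RPi boards.
PART_NAMES = {
    0xB76: "ARM1176",     # BCM2835 (ARMv6)
    0xC07: "Cortex-A7",   # BCM2836 (ARMv7)
    0xD03: "Cortex-A53",  # BCM2837 (ARMv8)
}


def decode_midr(midr: int) -> dict:
    """Return the MIDR fields as a dictionary of integers."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # 0x41 is ARM Ltd.
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,   # 0xF: see the CPUID registers
        "part": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }


if __name__ == "__main__":
    fields = decode_midr(0x410FC075)
    name = PART_NAMES.get(fields["part"], "unknown")
    print(f"part=0x{fields['part']:03X} ({name}), revision={fields['revision']}")
```

Part number 0xC07 is a Cortex-A7, which matches the "revision 5
(ARMv7)" text in the kernel log and not the ARMv6 core of a Model B+.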