* Guenter Roeck <linux@xxxxxxxxxxxx> [220519 17:42]:
> On 5/19/22 07:35, Liam Howlett wrote:
> > * Guenter Roeck <linux@xxxxxxxxxxxx> [220517 10:32]:
> >
> > ...
> > >
> > > Another bisect result, boot failures with nommu targets (arm:mps2-an385,
> > > m68k:mcf5208evb). Bisect log is the same for both.
> > ...
> > > # first bad commit: [bd773a78705fb58eeadd80e5b31739df4c83c559] nommu: remove uses of VMA linked list
> >
> > I cannot reproduce this on my side, even with that specific commit. Can
> > you point me to the failure log, config file, etc? Do you still see
> > this with the fixes I've sent recently?
> >
>
> This was in linux-next; most recently with next-20220517.
> I don't know if that was up-to-date with your patches.
> The problem seems to be memory allocation failures.
> A sample log is at
> https://kerneltests.org/builders/qemu-m68k-next/builds/1065/steps/qemubuildcommand/logs/stdio
> The log history at
> https://kerneltests.org/builders/qemu-m68k-next?numbuilds=30
> will give you a variety of logs.
>
> The configuration is derived from m5208evb_defconfig, with initrd
> and command line embedded in the image. You can see the detailed
> configuration updates at
> https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/run-qemu-m68k.sh
>
> Qemu command line is
>
> qemu-system-m68k -M mcf5208evb -kernel vmlinux \
>     -cpu m5208 -no-reboot -nographic -monitor none \
>     -append "rdinit=/sbin/init console=ttyS0,115200"
>
> with initrd from
> https://github.com/groeck/linux-build-test/blob/master/rootfs/m68k/rootfs-5208.cpio.gz
>
> I use qemu v6.2, but any recent qemu version should work.

I have qemu 7.0, which seems to change the default memory size from 32MB
to 128MB. This can be seen in your log here:

Memory: 27928K/32768K available (2827K kernel code, 160K rwdata, 432K rodata, 1016K init, 66K bss, 4840K reserved, 0K cma-reserved)

With 128MB the kernel boots. With 64MB it also boots. 32MB fails with
an OOM.

Looking into it more, I see that the OOM is caused by a contiguous page
allocation of 1MB (order 7 at 8K pages). This can be seen in the log as
well:

Running sysctl: echo: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL), nodemask=(null)
...
nommu: Allocation of length 884736 from process 63 (echo) failed

The last log message above comes from the code path that uses
alloc_pages_exact().

I don't see why my 256 byte nodes (an order 0 allocation yields 32
nodes) would fragment the memory beyond use on boot. I have checked for
some sort of massive leak by adding a static node count to the code and
have only ever hit ~12 nodes. Consulting the OOM log from the above
link again:

DMA: 0*8kB 1*16kB (U) 9*32kB (U) 7*64kB (U) 21*128kB (U) 7*256kB (U) 6*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 8304kB

So about 8MB is free, but nothing is left at 1MB or above. To get to
the point of breaking up a 1MB block, we'd need an obscene number of
nodes.

Furthermore, the OOM on boot does not always happen, although most
boots do hit it. When boot succeeds without an OOM, I checked slabinfo
and see that maple_node has 32 active objects, which is one order 0
allocation. It is worth noting that slabinfo is lazy about updating
the number of active objects, so the real count is most likely lower
than this value.

Does anyone have any idea why nommu would be getting this fragmented?

Thanks,
Liam
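
For anyone who wants to check the numbers quoted from the log, here is a rough userspace sketch of the arithmetic. It is not kernel code: the helper names order_for() and parse_free_list() are made up for illustration, and the 8K page size is an assumption taken from the "order 7 at 8K pages" figure above.

```python
import re

PAGE_SIZE = 8192  # assumption: 8K pages, per the "order 7 at 8K pages" figure

def order_for(length, page_size=PAGE_SIZE):
    """Smallest buddy order whose block (page_size << order) covers length bytes."""
    pages = -(-length // page_size)          # ceiling division
    return max(0, (pages - 1).bit_length())  # round the page count up to a power of two

def parse_free_list(line):
    """Map block size in kB to free-block count from a 'COUNT*SIZEkB' free-list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

# Free-list line copied from the OOM log quoted above.
LOG = ("DMA: 0*8kB 1*16kB (U) 9*32kB (U) 7*64kB (U) 21*128kB (U) "
       "7*256kB (U) 6*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 8304kB")

order = order_for(884736)                    # length from the nommu failure message
needed_kb = (PAGE_SIZE << order) // 1024     # size of one block of that order, in kB
free = parse_free_list(LOG)
total_kb = sum(size * count for size, count in free.items())
satisfiable = any(count > 0 and size >= needed_kb
                  for size, count in free.items())

print(order, needed_kb, total_kb, satisfiable)  # 7 1024 8304 False
```

The 884736 byte request is 108 pages, which rounds up to an order 7 (1024kB) block. The free list adds up to 8304kB, but its largest non-empty bucket is 512kB, so the request cannot be satisfied even though most of the free memory is unused.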