* Mark Brown <broonie@xxxxxxxxxx> [230110 17:52]:
> On Thu, Jan 05, 2023 at 07:15:44PM +0000, Liam Howlett wrote:
>
> > This patch set does two things: 1. Clean up, including removal of
> > __vma_adjust() and 2. Extends the VMA iterator API to provide type
> > safety to the VMA operations using the maple tree, as requested by Linus
> > [1].
>
> This series *appears* to be causing some fun issues in -next for the
> past couple of days or so. The initial failures were seen by KernelCI
> on several platforms (I've mostly been trying various arm64 things, at
> least 32 bit ARM is also affected). The initial symptom seen is that a
> go binary called skipgen that gets invoked as part of the testing
> silently faults, tweaking things so that we get as far as running the
> arm64 selftests results in much more useful output with various things
> failing with actual error messages such as:
>
>   ./fake_sigreturn_bad_magic: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
>   ./sve-test: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
>
> I'm fairly sure we're not actually running out of memory, there's no OOM
> killer activity, the amount of memory the system has appears to make no
> difference and just replacing the kernel with a mainline build runs as
> expected.

Thanks for the detailed analysis. This series has been dropped from
mm-unstable and, I expect, will be out of linux-next by tomorrow. I will
retest my series against a larger number of platforms before sending out
the next revision.

>
> You can see the full run that produced the above errors at:
>
>    https://lava.sirena.org.uk/scheduler/job/88257
>
> which also embeds links to all the binaries used, exact commands run and
> so on. The failing binaries all appear to be execed from within a
> testsuite, though it's not *all* binaries execed from within tests (eg,
> vec-syscfg execs things and seems happy).
>
> This has taken out a bunch of testsuites in KernelCI (and probably other
> CI systems using test-definitions, though I didn't check).
>
> I tried to bisect this but otherwise haven't made any effort to look at
> the failure. The bisect sadly got lost in this series since a lot of
> the series either fails to build with:
>
> /home/broonie/git/bisect/mm/madvise.c: In function 'madvise_update_vma':
> /home/broonie/git/bisect/mm/madvise.c:165:25: error: implicit declaration of function '__split_vma'; did you mean 'split_vma'? [-Werror=implicit-function-declaration]
>   165 |                 error = __split_vma(mm, vma, start, 1);
>       |                         ^~~~~~~~~~~
>       |                         split_vma

Thanks. This was reported to me before and I had a fix in mm-unstable.
I'll squash this into the series for v3.

>
> or fails to boot with something along the lines of:
>
> <6>[ 6.054380] Freeing initrd memory: 86880K
> <1>[ 6.087945] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
> <1>[ 6.088231] Mem abort info:
> <1>[ 6.088340]   ESR = 0x0000000096000004
> <1>[ 6.088504]   EC = 0x25: DABT (current EL), IL = 32 bits
> <1>[ 6.088671]   SET = 0, FnV = 0
> <1>[ 6.088802]   EA = 0, S1PTW = 0
> <1>[ 6.088929]   FSC = 0x04: level 0 translation fault
> <1>[ 6.089099] Data abort info:
> <1>[ 6.089210]   ISV = 0, ISS = 0x00000004
> <1>[ 6.089347]   CM = 0, WnR = 0
> <1>[ 6.089486] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043e33000
> <1>[ 6.089692] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000
> <0>[ 6.090566] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> <4>[ 6.090866] Modules linked in:
> <4>[ 6.091167] CPU: 0 PID: 42 Comm: modprobe Not tainted 6.2.0-rc1-00190-g505c59767243 #13
> <4>[ 6.091478] Hardware name: linux,dummy-virt (DT)
> <4>[ 6.091784] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> <4>[ 6.092048] pc : mas_wr_walk+0x60/0x2d0
> <4>[ 6.092622] lr : mas_wr_store_entry.isra.0+0x80/0x4a0
> <4>[ 6.092798] sp : ffff80000821bb10
> <4>[ 6.092926] x29: ffff80000821bb10 x28: ffff000003fa4480 x27: 0000000200100073
> <4>[ 6.093206] x26: ffff000003fa41b0 x25: ffff000003fa43f0 x24: 0000000000000002
> <4>[ 6.093445] x23: 0000000ffffae021 x22: 0000000000000000 x21: ffff000002a74440
> <4>[ 6.093685] x20: ffff000003fa4480 x19: ffff80000821bc48 x18: 0000000000000000
> <4>[ 6.093933] x17: 0000000000000000 x16: ffff000002b8da00 x15: ffff80000821bc48
> <4>[ 6.094169] x14: 0000ffffae022fff x13: ffffffffffffffff x12: ffff000002b8da0c
> <4>[ 6.094427] x11: ffff80000821bb68 x10: ffffd75265462458 x9 : ffff80000821bc48
> <4>[ 6.094685] x8 : ffff80000821bbb8 x7 : ffff80000821bc48 x6 : ffffffffffffffff
> <4>[ 6.094922] x5 : 000000000000000e x4 : 000000000000000e x3 : 0000000000000000
> <4>[ 6.095167] x2 : 0000000000000008 x1 : 000000000000000f x0 : ffff80000821bb68
> <4>[ 6.095499] Call trace:
> <4>[ 6.095685]  mas_wr_walk+0x60/0x2d0
> <4>[ 6.095936]  mas_store_prealloc+0x50/0xa0
> <4>[ 6.096097]  mmap_region+0x520/0x784
> <4>[ 6.096232]  do_mmap+0x3b0/0x52c
> <4>[ 6.096347]  vm_mmap_pgoff+0xe4/0x10c
> <4>[ 6.096480]  ksys_mmap_pgoff+0x4c/0x204
> <4>[ 6.096621]  __arm64_sys_mmap+0x30/0x44
> <4>[ 6.096754]  invoke_syscall+0x48/0x114
> <4>[ 6.096900]  el0_svc_common.constprop.0+0x44/0xec
> <4>[ 6.097052]  do_el0_svc+0x38/0xb0
> <4>[ 6.097183]  el0_svc+0x2c/0x84
> <4>[ 6.097287]  el0t_64_sync_handler+0xf4/0x120
> <4>[ 6.097457]  el0t_64_sync+0x190/0x194
> <0>[ 6.097835] Code: 39402021 51000425 92401ca4 12001ca5 (f8647844)
> <4>[ 6.098294] ---[ end trace 0000000000000000 ]---
>
> (not always exactly the same backtrace, but the mas_wr_walk() was always
> there.)

Thanks. This was also reported and a fix had landed in mm-unstable as
well.
>
> The specific set of commits in next-20230110 where bisect got lost was:
>
> 505c59767243 madvise: use vmi iterator for __split_vma() and vma_merge()
> 1cfdd2a44d6b mmap: pass through vmi iterator to __split_vma()
> 7d718fd9873c sched: convert to vma iterator
> 2f94851ec717 mmap: use vmi version of vma_merge()
> 7e2dd18353a3 task_mmu: convert to vma iterator
> 756841b468f5 mm/mremap: use vmi version of vma_merge()
> aaba4ba837fa mempolicy: convert to vma iterator
> 8193673ee5d8 coredump: convert to vma iterator
> d4f7ebf41a44 mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
> 4b02758dc3c5 mlock: convert mlock to vma iterator
> fd367dac089e include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
> 3a72a0174748 mm/damon: stop using vma_mas_store() for maple tree store
> dd51a3ca1096 mm: change mprotect_fixup to vma iterator
> b9e4eabb8f40 mmap: convert __vma_adjust() to use vma iterator
> c6fc05242a09 userfaultfd: use vma iterator
> b9000fd4c5a6 mmap-convert-__vma_adjust-to-use-vma-iterator-fix
> bdfb333b0b2a ipc/shm: use the vma iterator for munmap calls
> 3128296746a1 mm: pass through vma iterator to __vma_adjust()
> 80c8eed1721e mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
> 311129a7971c mmap: convert vma_expand() to use vma iterator
> 69e9b6c8a525 madvise: use split_vma() instead of __split_vma()
> 751f0a6713a9 mm: remove unnecessary write to vma iterator in __vma_adjust()
> a7f83eb601ef mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
> 39fd6622223e mm: pass vma iterator through to __vma_adjust()
> ...

I appreciate you running through the bisect and bringing this to my
attention. I will do a better job of emailing linux-next the fixes,
which I obviously overlooked.