On 02.05.22 02:14, Liam Howlett wrote:
> * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> [220428 21:16]:
> > On Fri, 29 Apr 2022 00:38:50 +0000 Liam Howlett <liam.howlett@xxxxxxxxxx> wrote:
> >
> > > > mm/mmap.c: In function 'do_brk_flags':
> > > > mm/mmap.c:2908:17: error: implicit declaration of function 'khugepaged_enter_vma_merge'; did you mean 'khugepaged_enter_vma'?
> > > >
> > > > It appears that this is later fixed, but it hurts bisectability
> > > > (and prevents me from finding the actual build failure in linux-next
> > > > when trying to build corenet64_smp_defconfig).
> > >
> > > Yeah, that khugepaged_enter_vma_merge was renamed in another patch set.
> > > Andrew made the correction but kept the patch as it was. I think the
> > > suggested change is right.. if you read the commit that introduced
> > > khugepaged_enter_vma(), it seems right at least.
> >
> > Things are a bit crazy lately. Merge issues with mapletree, merge
> > issues with mglru on mapletree, me doing a bunch of retooling to start
> > publishing/merging via git, mapletree runtime issues, etc.
> >
> > I've dropped the mapletree patches again. Please scoop up all known
> > fixes and redo against the (non-rebasing) mm-stable branch at
> > git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> Okay, sounds good.
>
> I have been porting my patches over and hit a bit of a snag. It looked
> like my patches were not booting on the s390 - but not all the time. So
> I reverted back to mm-stable (059342d1dd4e) and found that also failed
> to boot sometimes on my qemu setup. When it fails, it's ~4-5 sec into
> booting. The last thing I see is:
>
> "[ 4.668916] Spectre V2 mitigation: execute trampolines"
>
> I've bisected back to commit e553f62f10d9 (mm, page_alloc: fix
> build_zonerefs_node())
>
> With this commit, I am unable to boot one out of three times. When
> using the previous commit I was not able to get it to hang after trying
> 10+ times. This is a qemu s390 install with KASAN on and I see no error
> messages. I think it's likely it is this patch, but not guaranteed.

This sounds like a race condition during the setup of memory zones. I
could imagine my patch is triggering this problem, but it should not be
the real root cause.

I'm no expert regarding zone setup, but I think it might help to print
some zone data when the problem occurs. I have no real idea which data
is needed, but maybe someone else can help here.

The following diff should recognize the problematic case (it might show
false positives, though):

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c1..23f029f39985 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6132,6 +6132,9 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
 		if (populated_zone(zone)) {
+			if (!managed_zone(zone)) {
+				/* Print some data regarding the zone. */
+			}
 			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
 		}


Juergen
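
As an illustration of what the placeholder comment in the diff could print, here is a rough userspace sketch, not kernel code. The names struct zone_model and report_unmanaged_zone() are made up for this example; only the field names are meant to mirror struct zone, and the two predicates are simplified stand-ins for populated_zone() and managed_zone(). In the kernel the same values would presumably be emitted with pr_info() inside the new if block, reading the managed count via zone_managed_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the struct zone fields of interest. */
struct zone_model {
	const char *name;		/* e.g. "DMA", "Normal" */
	int node;			/* NUMA node id */
	unsigned long zone_start_pfn;	/* first page frame of the zone */
	unsigned long spanned_pages;	/* total span, including holes */
	unsigned long present_pages;	/* pages physically present */
	unsigned long managed_pages;	/* pages handed to the page allocator */
};

/* Simplified model: a zone is populated if it has present pages. */
static bool populated_zone(const struct zone_model *zone)
{
	return zone->present_pages != 0;
}

/* Simplified model: a zone is managed if the allocator owns pages in it. */
static bool managed_zone(const struct zone_model *zone)
{
	return zone->managed_pages != 0;
}

/*
 * Write a one-line description of the zone into buf if it is populated
 * but not managed, which is the case the diff wants to flag.
 * Returns true if a line was written; otherwise buf is left empty.
 */
static bool report_unmanaged_zone(const struct zone_model *zone,
				  char *buf, size_t len)
{
	assert(zone && buf && len > 0);

	buf[0] = '\0';
	if (!populated_zone(zone) || managed_zone(zone))
		return false;

	snprintf(buf, len,
		 "node %d zone %s: start_pfn=%lu spanned=%lu present=%lu managed=%lu",
		 zone->node, zone->name, zone->zone_start_pfn,
		 zone->spanned_pages, zone->present_pages,
		 zone->managed_pages);
	return true;
}
```

Logging the present and managed counts side by side should make it visible whether a zone is picked up by build_zonerefs_node() before any of its pages have been released to the allocator, which is the window a race during zone setup would have to hit.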