On 1/22/24 2:59 PM, Ryan Roberts wrote:
>>>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
>>>
>>> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
>>> it's a side-effect of marking the hugetlbs as hwpoisoned? (just a guess based on
>>> the test name!). Once a page is marked poisoned, is there a way to un-poison it?
>>> If not, I suspect that's why it wasn't part of the standard test script in the
>>> first place.
>> hugetlb-read-hwpoison failed, probably because the kernel fix for the test
>> hasn't been merged yet. The other tests (uffd-stress) aren't
>> failing on my end or on CI [1][2]
>
> To be clear, hugetlb-read-hwpoison isn't failing for me, it's just causing the
> subsequent uffd-stress tests to fail. Both of those subsequent tests are
> allocating hugetlbs so my guess is that since this test is marking some hugetlbs
> as poisoned, there are no longer enough for the subsequent tests.
>
>>
>> [1] https://lava.collabora.dev/scheduler/job/12577207#L3677
>> [2] https://lava.collabora.dev/scheduler/job/12577229#L4027
>>
>> Maybe it's a configuration issue which is exposed now. Not sure. Maybe
>> hugetlb-read-hwpoison is changing some configuration and not restoring it.
>
> Well yes - it's marking some hugetlb pages as HWPOISONED.
>
>> Maybe your system has fewer hugetlb pages.
>
> Yes, probably. What is hugetlb-read-hwpoison's requirement for size and number of
> hugetlb pages? The run_vmtests.sh script allocates the required number of
> default-sized hugetlb pages before running any tests (I guess this value should
> be increased for hugetlb-read-hwpoison's requirements?).
>
> Additionally, our CI preallocates non-default sizes from the kernel command line
> at boot. Happy to increase these if you can tell me what the new requirement is:

I'm not sure of the exact number of hugetlb pages these tests require. But I
specify hugepages=1000 and the tests work for me. I've sent v2 [1].
Would it be possible to run your CI on that and share results before we
merge that one?

[1] https://lore.kernel.org/all/20240123073615.920324-1-usama.anjum@xxxxxxxxxxxxx

>
> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>
> Thanks,
> Ryan
>

--
BR,
Muhammad Usama Anjum
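
For comparing the CI's boot-time pools against hugepages=1000, here is a rough sketch that tallies how many pages of each size a command line preallocates. It assumes each hugepages= value applies to the most recent hugepagesz= or default_hugepagesz= before it, and that values are either a plain count or a comma-separated list of node:count pairs. The names tally_hugepages, parse_size and the default_size argument are made up for this illustration and are not part of run_vmtests.sh or the kernel.

```python
import re
from collections import defaultdict

# Multipliers for the size suffixes used on the command line (assumed K/M/G only).
UNIT = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# The CI command line fragment quoted in this thread.
CI_CMDLINE = (
    "hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2 "
    "default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2"
)


def parse_size(text):
    """Convert a size such as '2M' or '64K' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * UNIT[match.group(2)]


def tally_hugepages(cmdline, default_size=None):
    """Return {page_size_in_bytes: page_count}, summed over all nodes.

    default_size (hypothetical helper argument) is used for a hugepages=
    parameter that appears before any size parameter.
    """
    pools = defaultdict(int)
    size = parse_size(default_size) if default_size else None
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in ("hugepagesz", "default_hugepagesz"):
            size = parse_size(value)
        elif key == "hugepages":
            if size is None:
                raise ValueError("hugepages= seen before any page size")
            for part in value.split(","):
                # Accept "node:count" as well as a plain "count".
                pools[size] += int(part.rpartition(":")[2])
    return dict(pools)


if __name__ == "__main__":
    for size, count in sorted(tally_hugepages(CI_CMDLINE).items()):
        print(f"{size >> 10} KiB pages: {count} ({(size * count) >> 10} KiB total)")
```

On the command line quoted above this gives 128 default-sized (2M) pages across the two nodes, and 4 pages each for 64K, 32M and 1G. That is well below the 1000 default-sized pages I use locally.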