Re: [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21.08.24 10:44, David Hildenbrand wrote:
On 21.08.24 06:44, kernel test robot wrote:


Hello,

kernel test robot noticed "ltp.move_pages04.fail" on:

commit: b28dd7507f2dd7923325eab6ea1f291416dcc396 ("mm/migrate: convert add_page_for_migration() from follow_page() to folio_walk")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

[test failed on linux-next/master bb1b0acdcd66e0d8eedee3570d249e076b89ab32]

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240817
with following parameters:

	test: numa/move_pages04



compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202408211026.636ade1a-oliver.sang@xxxxxxxxx



Running tests.......
<<<test_start>>>
tag=move_pages04 stime=1724192393
cmdline="move_pages04"
contacts=""
analysis=exit
<<<test_output>>>
move_pages04    1  TFAIL  :  move_pages04.c:142: status[1] is ENOENT, expected EFAULT

This change is to be expected, and I touched on that in the patch
description. I am rather surprised that we have a test for that
handling, especially because it changed already (see below).

The man page says:

	-EFAULT: This is a zero page or the memory area is not mapped
		 by the process.

"memory area not mapped" to me translates to "there is no mmap()", not
"there is no page mapped". And it says:

	-ENOENT: The page is not present.


It's not really specifies what happens when "The memory area is mapped
by the process, but no page is faulted in.".

And the old handling was even inconsistent: to achieve the old behavior,
we abused FOLL_DUMP, which triggers in mm/gup.c:no_page_table():

(1) is_vm_hugetlb_page() *and* a hugetlb page is in the pagecache?
      Return -EFAULT. Otherwise return NULL -> -ENOENT.
(2) vma_is_anonymous() || !vma->vm_ops->fault ? Return -EFAULT.
      Otherwise return NULL -> -ENOENT.

So, if nothing is mapped, for things like shmem we would always return
"-ENOENT", for anonymous memory always "-EFAULT", and for hugetlb
"-ENOENT" or "-EFAULT", depending on the page cache state. Inconsistent,
and that handling is only in place because we abused FOLL_DUMP.

(there are other issues in the old implementation: on PMD migration
entries we would likely have returned -EFAULT in some cases where we
should have returned -ENOENT ...)

While writing folio_walk, I temporarily had a version that would return
error codes instead of NULL to indicate "there is something, but we
cannot return  it" and "there is nothing", but it didn't feel right. And
I'm not really interested in revisiting that :)

----

Staring at the test,  I realized the that behavior *changed* already,
because we wanted to fix the "zero page" and started abusing FOLL_DUMP,
but ended up changing the behavior for unpopulated (nothing mapped)
memory as well:

   * NAME
   *	move_pages04.c
   *
   * DESCRIPTION
   *      Failure when page does not exit.
   *
   * ALGORITHM
   *
   *      1. Pass zero page (allocated, but not written to) as one of the
   *         page addresses to move_pages().
   *      2. Check if the corresponding status is set to:
   *         -ENOENT for kernels < 4.3
   *         -EFAULT for kernels >= 4.3 [1]
   *
   * [1]
   * d899844e9c98 "mm: fix status code which move_pages() returns for
zero page"
   *

Likely test is *wrong*, because it claims to test the "zero page" but it
just passes "unpopulated" memory.

Let me dig deeper into the test.

Okay, and it even looks like the test caught the unintended change for "unpopulated memory", but instead we decided to change the test to expect the other return code ... because there was some confusion about "zero page".

Long story short: the test needs to be fixed.

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux