Re: [linus:master] [mm] efa7df3e3b: kernel_BUG_at_include/linux/page_ref.h

David Hildenbrand <david@xxxxxxxxxx> · Fri, 31 May 2024 20:38:09 +0200

On 31.05.24 20:30, Yang Shi wrote:
On Fri, May 31, 2024 at 11:24 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 31.05.24 20:13, Yang Shi wrote:
On Fri, May 31, 2024 at 11:07 AM Yang Shi <shy828301@xxxxxxxxx> wrote:

On Fri, May 31, 2024 at 10:46 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 31.05.24 18:50, Yang Shi wrote:
On Fri, May 31, 2024 at 1:24 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:



Hello,

kernel test robot noticed "kernel_BUG_at_include/linux/page_ref.h" on:

commit: efa7df3e3bb5da8e6abbe37727417f32a37fba47 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc]
[test failed on linux-next/master 6dc544b66971c7f9909ff038b62149105272d26a]

in testcase: trinity
version: trinity-x86_64-6a17c218-1_20240527
with following parameters:

           runtime: 300s
           group: group-00
           nr_groups: 5



compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


we noticed the issue does not always happen. 34 times out of 50 runs as below.
the parent is clean.

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274
---------------- ---------------------------
          fail:runs  %reproduction    fail:runs
              |             |             |
              :50          68%          34:50    dmesg.Kernel_panic-not_syncing:Fatal_exception
              :50          68%          34:50    dmesg.RIP:try_get_folio
              :50          68%          34:50    dmesg.invalid_opcode:#[##]
              :50          68%          34:50    dmesg.kernel_BUG_at_include/linux/page_ref.h



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202405311534.86cd4043-lkp@xxxxxxxxx


[  275.267158][ T4335] ------------[ cut here ]------------
[  275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
[  275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
[  275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
[  275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.271159][ T4335] Code: c3 cc cc cc cc 44 89 e6 48 89 df e8 e4 54 11 00 eb ae 90 0f 0b 90 31 db eb d5 9c 58 0f 1f 40 00 f6 c4 02 0f 84 46 ff ff ff 90 <0f> 0b 48 c7 c6 a0 54 d2 87 48 89 df e8 a9 e9 ff ff 90 0f 0b be 04

If I read this BUG correctly, it is:

VM_BUG_ON(!in_atomic() && !irqs_disabled());


Yes, that seems to be the one.

try_grab_folio() actually assumes it is in an atomic context (irq
disabled or preempt disabled) for this call path. This is achieved by
disabling irq in gup fast or calling it in rcu critical section in
page cache lookup path

try_grab_folio()->try_get_folio()->folio_ref_try_add_rcu()

Is called (mm-unstable) from:

(1) gup_fast function, here IRQs are disable
(2) gup_hugepte(), possibly problematic
(3) memfd_pin_folios(), possibly problematic
(4) __get_user_pages(), likely problematic

(1) should be fine.

(2) is possibly problematic on the !fast path. If so, due to commit
       a12083d721d7 ("mm/gup: handle hugepd for follow_page()") ? CCing Peter.

(3) is possibly wrong. CCing Vivek.

(4) is what we hit here


And try_grab_folio() is used when the folio is a large folio. The


We come via process_vm_rw()->pin_user_pages_remote()->__get_user_pages()->try_grab_folio()

That code was added in

commit 57edfcfd3419b4799353d8cbd6ce49da075cfdbd
Author: Peter Xu <peterx@xxxxxxxxxx>
Date:   Wed Jun 28 17:53:07 2023 -0400

       mm/gup: accelerate thp gup even for "pages != NULL"

       The acceleration of THP was done with ctx.page_mask, however it'll be
       ignored if **pages is non-NULL.


Likely the try_grab_folio() in __get_user_pages() is wrong?

As documented, we already hold a refcount. Likely we should better do a
folio_ref_add() and sanity check the refcount.

Yes, a plain folio_ref_add() seems ok for these cases.

In addition, the comment of folio_try_get_rcu() says, which is just a
wrapper of folio_ref_try_add_rcu():

You can also use this function if you're holding a lock that prevents
pages being frozen & removed; eg the i_pages lock for the page cache
or the mmap_lock or page table lock for page tables.  In this case, it
will always succeed, and you could have used a plain folio_get(), but
it's sometimes more convenient to have a common function called from
both locked and RCU-protected contexts.

So IIUC we can use the plain folio_get() at least for
process_vm_readv/writev since mmap_lock is held in this path.



In essence, I think: try_grab_folio() should only be called from GUP-fast where
IRQs are disabled.

Yes, I agree. Just the fast path should need to call try_grab_folio().

try_grab_folio() also handles FOLL_PIN and FOLL_GET, so we may just
keep calling it and add a flag to try_grab_folio, just like:

if flag is true
      folio_ref_add()
else
      try_get_folio()


try_grab_page() is what we use on the GUP-slow path. We'd likely want a
folio variant of that.

We might want to call that gup_try_grab_folio() and rename the other one
to gup_fast_try_grab_folio().

Won't we duplicate the most code with two versions try_grab_folio()?

I meant something like:

try_grab_folio(struct page *page, int refs, unsigned int flags, bool fast)
{
     if fast
         try_get_folio()
     else
         folio_ref_add()
}


That's insufficient to handle FOLL_PIN. Likely we should do this:

diff --git a/mm/gup.c b/mm/gup.c
index 231711efa390d..fea93a64bf235 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -203,8 +203,8 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 }
 
 /**
- * try_grab_page() - elevate a page's refcount by a flag-dependent amount
- * @page:    pointer to page to be grabbed
+ * try_grab_folio() - elevate a folios's refcount by a flag-dependent amount
+ * @folio:   pointer to folio to be grabbed
  * @flags:   gup flags: these are the FOLL_* flag values.
  *
  * This might not do anything at all, depending on the flags argument.
@@ -216,16 +216,16 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
  * time. Cases: please see the try_grab_folio() documentation, with
  * "refs=1".
  *
+ * Must not be called from GUP-fast: the folio must not get freed concurrently.
+ *
  * Return: 0 for success, or if no action was required (if neither FOLL_PIN
  * nor FOLL_GET was set, nothing is done). A negative error code for failure:
  *
  *   -ENOMEM           FOLL_GET or FOLL_PIN was set, but the page could not
  *                     be grabbed.
  */
-int __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_page(struct folio *folio, unsigned int flags)
 {
-       struct folio *folio = page_folio(page);
-
        if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
                return -ENOMEM;
 
@@ -239,7 +239,7 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
                 * Don't take a pin on the zero page - it's not going anywhere
                 * and it is used in a *lot* of places.
                 */
-               if (is_zero_page(page))
+               if (is_zero_folio(folio))
                        return 0;
 
                /*
@@ -260,6 +260,11 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
        return 0;
 }
 
+int __must_check try_grab_page(struct page *page, unsigned int flags)
+{
+       return gup_try_grab_folio(page_folio(page), flags);
+}
+
 /**
  * unpin_user_page() - release a dma-pinned page
  * @page:            pointer to page to be released


Then, fix the callers and rename the other one to gup_fast_*.


--
Cheers,

David / dhildenb