Re: [linus:master] [mm] efa7df3e3b: kernel_BUG_at_include/linux/page_ref.h

David Hildenbrand <david@xxxxxxxxxx> · Fri, 31 May 2024 20:24:29 +0200

On 31.05.24 20:13, Yang Shi wrote:
On Fri, May 31, 2024 at 11:07 AM Yang Shi <shy828301@xxxxxxxxx> wrote:

On Fri, May 31, 2024 at 10:46 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 31.05.24 18:50, Yang Shi wrote:
On Fri, May 31, 2024 at 1:24 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:

Hello,

kernel test robot noticed "kernel_BUG_at_include/linux/page_ref.h" on:

commit: efa7df3e3bb5da8e6abbe37727417f32a37fba47 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc]
[test failed on linux-next/master 6dc544b66971c7f9909ff038b62149105272d26a]

in testcase: trinity
version: trinity-x86_64-6a17c218-1_20240527
with following parameters:

          runtime: 300s
          group: group-00
          nr_groups: 5

compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

we noticed the issue does not always happen. 34 times out of 50 runs as below.
the parent is clean.

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274
---------------- ---------------------------
         fail:runs  %reproduction    fail:runs
             |             |             |
             :50          68%          34:50    dmesg.Kernel_panic-not_syncing:Fatal_exception
             :50          68%          34:50    dmesg.RIP:try_get_folio
             :50          68%          34:50    dmesg.invalid_opcode:#[##]
             :50          68%          34:50    dmesg.kernel_BUG_at_include/linux/page_ref.h

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202405311534.86cd4043-lkp@xxxxxxxxx

[  275.267158][ T4335] ------------[ cut here ]------------
[  275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
[  275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
[  275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
[  275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.271159][ T4335] Code: c3 cc cc cc cc 44 89 e6 48 89 df e8 e4 54 11 00 eb ae 90 0f 0b 90 31 db eb d5 9c 58 0f 1f 40 00 f6 c4 02 0f 84 46 ff ff ff 90 <0f> 0b 48 c7 c6 a0 54 d2 87 48 89 df e8 a9 e9 ff ff 90 0f 0b be 04

If I read this BUG correctly, it is:

VM_BUG_ON(!in_atomic() && !irqs_disabled());

Yes, that seems to be the one.

try_grab_folio() actually assumes it is in an atomic context (irq
disabled or preempt disabled) for this call path. This is achieved by
disabling irq in gup fast or calling it in rcu critical section in
page cache lookup path

try_grab_folio()->try_get_folio()->folio_ref_try_add_rcu()

Is called (mm-unstable) from:

(1) gup_fast function, here IRQs are disable
(2) gup_hugepte(), possibly problematic
(3) memfd_pin_folios(), possibly problematic
(4) __get_user_pages(), likely problematic

(1) should be fine.

(2) is possibly problematic on the !fast path. If so, due to commit
      a12083d721d7 ("mm/gup: handle hugepd for follow_page()") ? CCing Peter.

(3) is possibly wrong. CCing Vivek.

(4) is what we hit here

And try_grab_folio() is used when the folio is a large folio. The

We come via process_vm_rw()->pin_user_pages_remote()->__get_user_pages()->try_grab_folio()

That code was added in

commit 57edfcfd3419b4799353d8cbd6ce49da075cfdbd
Author: Peter Xu <peterx@xxxxxxxxxx>
Date:   Wed Jun 28 17:53:07 2023 -0400

      mm/gup: accelerate thp gup even for "pages != NULL"

      The acceleration of THP was done with ctx.page_mask, however it'll be
      ignored if **pages is non-NULL.

Likely the try_grab_folio() in __get_user_pages() is wrong?

As documented, we already hold a refcount. Likely we should better do a
folio_ref_add() and sanity check the refcount.

Yes, a plain folio_ref_add() seems ok for these cases.

In addition, the comment of folio_try_get_rcu() says, which is just a
wrapper of folio_ref_try_add_rcu():

You can also use this function if you're holding a lock that prevents
pages being frozen & removed; eg the i_pages lock for the page cache
or the mmap_lock or page table lock for page tables.  In this case, it
will always succeed, and you could have used a plain folio_get(), but
it's sometimes more convenient to have a common function called from
both locked and RCU-protected contexts.

So IIUC we can use the plain folio_get() at least for
process_vm_readv/writev since mmap_lock is held in this path.

In essence, I think: try_grab_folio() should only be called from GUP-fast where
IRQs are disabled.

Yes, I agree. Just the fast path should need to call try_grab_folio().

try_grab_folio() also handles FOLL_PIN and FOLL_GET, so we may just
keep calling it and add a flag to try_grab_folio, just like:

if flag is true
     folio_ref_add()
else
     try_get_folio()

try_grab_page() is what we use on the GUP-slow path. We'd likely want a 
folio variant of that.

We might want to call that gup_try_grab_folio() and rename the other one 
to gup_fast_try_grab_folio().

Or something like that :)

--
Cheers,

David / dhildenb