Re: [linus:master] [mm] efa7df3e3b: kernel_BUG_at_include/linux/page_ref.h

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03.06.24 22:44, Yang Shi wrote:
On Mon, Jun 3, 2024 at 1:38 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

try_get_folio() is all about grabbing a folio that might get freed
concurrently. That's why it calls folio_ref_try_add_rcu() and does
complicated stuff.

IMHO we can define it.. e.g. try_get_page() wasn't defined as so.

If we want to be crystal clear on that and if we think that's what we want,
again I would suggest we rename it differently from try_get_page() to avoid
future misuses, then add documents. We may want to also even assert the

Yes, absolutely.

rcu/irq implications in try_get_folio() at entrance, then that'll be
detected even without TINY_RCU config.


On !CONFIG_TINY_RCU, it performs a folio_ref_add_unless(). That's
essentially a atomic_add_unless(), which in the worst case ends up being a
cmpxchg loop.


Stating that we should be using try_get_folio() in paths where we are sure
the folio refcount is not 0 is the same as using folio_try_get() where
folio_get() would be sufficient.

The VM_BUG_ON in folio_ref_try_add_rcu() really tells us here that we are
using a function in the wrong context, although in our case, it is safe to
use (there is now BUG). Which is true, because we know we have a folio
reference and can simply use a simple folio_ref_add().

Again, just like we have folio_get() and folio_try_get(), we should
distinguish in GUP whether we are adding more reference to a folio (and
effectively do what folio_get() would), or whether we are actually grabbing
a folio that could be freed concurrently (what folio_try_get() would do).

Yes we can.  Again, IMHO it's a matter of whether it will worth it.

Note that even with SMP and even if we keep this code, the
atomic_add_unless only affects gup slow on THP only, and even with that
overhead it is much faster than before when that path was introduced.. and
per my previous experience we don't care too much there, really.

So it's literally only three paths that are relevant here on the "unless"
overhead:

    - gup slow on THP (I just added it; used to be even slower..)

    - vivik's new path

    - hugepd (which may be gone for good in a few months..)

IMHO none of them has perf concerns.  The real perf concern paths is
gup-fast when pgtable entry existed, but that must use atomic_add_unless()
anyway.  Even gup-slow !thp case won't regress as that uses try_get_page().

My point is primarily that we should be clear that the one thing is
GUP-fast, and the other is for GUP-slow.

Sooner or later we'll see more code that uses try_grab_page() to be
converted to folios, and people might naturally use try_grab_folio(),
just like we did with Vivik's code.

And I don't think we'll want to make GUP-slow in general using
try_grab_folio() in the future.

So ...


So again, IMHO the easist way to fix this WARN is we drop the TINY_RCU bit,
if nobody worries on UP perf.

I don't have a strong opinion, if any of us really worry about above three
use cases on "unless" overhead, and think it worthwhile to add the code to
support it, I won't object. But to me it's adding pain with no real benefit
we could ever measure, and adding complexity to backport too since we'll
need a fix for as old as 6.6.

... for the sake of fixing this WARN, I don't primarily care. Adjusting
the TINY_RCU feels wrong because I suspect somebody had good reasons to
do it like that, and it actually reported something valuable (using the
wrong function for the job).

I think this is the major concern about what fix we should do. If that
tiny rcu optimization still makes sense and is useful, we'd better
keep it. But I can't tell. Leaving it as is may be safer.

Willy moved that code in 020853b6f5e and I think it dates back to e286781d5f2e.

That contained:

+       /*
+        * Preempt must be disabled here - we rely on rcu_read_lock doing
+        * this for us.
+        *
+        * Pagecache won't be truncated from interrupt context, so if we have
+        * found a page in the radix tree here, we have pinned its refcount by
+        * disabling preempt, and hence no need for the "speculative get" that
+        * SMP requires.
+        */
+       VM_BUG_ON(page_count(page) == 0);
+       atomic_inc(&page->_count);
+


--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux