On 09.12.24 13:33, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 11:56 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>> On 09.12.24 11:25, Mateusz Guzik wrote:
>>> On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>>>> On 07.12.24 09:29, Mateusz Guzik wrote:
>>>>> Explicitly pre-checking the count adds nothing as atomic_add_unless
>>>>> starts with doing the same thing. iow no functional changes.
>>>> I recall that we added that check because with the hugetlb vmemmap
>>>> optimization, some of the tail pages we don't ever expect to be modified
>>>> (because they are fake-duplicated) might be mapped R/O.
>>>>
>>>> If the arch implementation of atomic_add_unless() were to trigger an
>>>> unconditional write fault, we'd be in trouble. That would likely only be
>>>> the case if the arch provides a dedicated instruction.
>>>>
>>>> atomic_add_unless()->raw_atomic_add_unless()
>>>>
>>>> Nobody currently defines arch_atomic_add_unless().
>>>>
>>>> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
>>>> on some architectures.
>>>>
>>>> I scanned some of the inline asm, and I think most of them perform a
>>>> check first.
>>> Huh.
>>>
>>> Some arch triggering a write fault despite not changing the value is
>>> not something I thought about. Sounds pretty broken to me if any arch
>>> were to do it, but then stranger things have happened.
>> Yeah, it really depends on what the architecture defines. For example,
>> on s390x for "COMPARE AND SWAP" the spec states something like
>>
>> [snip]
> Well, in this context you need to do the initial load to even know what
> to CAS with, unless you want to blindly do it hoping to get lucky,
> which I'm assuming no arch is doing.
>
> Granted, if there was an architecture which had an actual "cas unless
> the value is x" then this would not hold, but I don't know of any.
> [such an extension would be most welcome fwiw]

Apparently, we prepared for that via arch_atomic_add_unless(), which has no users.
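
For reference, the generated wrappers would route through that hook if an
architecture ever provided it, and the current generic fallback does the
load-and-check before it even attempts the cmpxchg. Roughly (a simplified
sketch of the generated code in include/linux/atomic/atomic-arch-fallback.h,
not the literal kernel source):

static __always_inline int
raw_atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
#if defined(arch_atomic_fetch_add_unless)
	return arch_atomic_fetch_add_unless(v, a, u);
#else
	int c = raw_atomic_read(v);

	do {
		/* No store is attempted if the value already is u. */
		if (unlikely(c == u))
			break;
	} while (!raw_atomic_try_cmpxchg(v, &c, c + a));

	return c;
#endif
}

static __always_inline bool
raw_atomic_add_unless(atomic_t *v, int a, int u)
{
#if defined(arch_atomic_add_unless)
	/* A hypothetical "cas unless the value is x" instruction would hook in here. */
	return arch_atomic_add_unless(v, a, u);
#else
	return raw_atomic_fetch_add_unless(v, a, u) != u;
#endif
}

So with the generic code the pre-check really is redundant; the open question
is only whether some arch-specific implementation could ever issue a write
for the value == u case.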

> Assuming you indeed want the patch after all, can you sort out adding
> a comment to atomic_add_unless yourself? ;) I presume you know the
> right people and whatnot, so this would cut down on back and forth.
>
> That is to say, I think this thread just about exhausted the time
> warranted by this patch. No hard feelz if it gets dropped, but then I
> do strongly suggest adding a justification for the extra load.

Maybe it's sufficient for now to simply do your change with a comment:

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d6..1efc992ad5687 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 	rcu_read_lock();
 	/* avoid writing to the vmemmap area being remapped */
-	if (!page_is_fake_head(page) && page_ref_count(page) != u)
+	if (!page_is_fake_head(page))
+		/*
+		 * atomic_add_unless() will currently never modify the value
+		 * if it already is u. If that ever changes, we'd have to have
+		 * a separate check here, such that we won't be writing to
+		 * write-protected vmemmap areas.
+		 */
 		ret = atomic_add_unless(&page->_refcount, nr, u);
 	rcu_read_unlock();

It would bail out during testing ... hopefully, such that we can detect any such change.
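
If atomic_add_unless() ever did start writing for the value == u case, the
separate check the comment talks about would essentially be the line this
patch removes. Purely illustrative sketch:

	/* Hypothetical re-added pre-check: never write when the count is u. */
	if (!page_is_fake_head(page) && page_ref_count(page) != u)
		ret = atomic_add_unless(&page->_refcount, nr, u);

That way the write-protected vmemmap would never see a store.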
--
Cheers,
David / dhildenb