On 01.03.24 22:47, Matthew Wilcox (Oracle) wrote:
Oscar and I have been exchanging a bit of email recently about the
bug reported here:
https://lore.kernel.org/all/ZXNhGsX32y19a2Xv@xxxxxxxxxxxxxxxxxxxx
I've come to the conclusion that folio_test_hugetlb() is just too fragile
as it can give both false positives and false negatives, as well as
resulting in the above bug. With this patch series, it becomes a lot
more robust. In the memory-failure case, we always hold the hugetlb_lock
so it's perfectly reliable. In the compaction caase, it's unreliable, but
the failures are acceptable and we recheck after taking the hugetlb_lock.
The cost of this reliability is that we now consume the word I recently
freed in folio->page[1]. I think this is acceptable; we've still gained
a completely reliable folio_test_hugetlb() (which we didn't have before
I started messing around with the folio dtors). Non-hugetlb users
can use large_id as a pointer to something else entirely, or even as a
non-pointer, as long as they can guarantee it can't conflict (ie don't
use it as a bitfield).
That probably means that we have to always set the lowest bit to use it
for something else, or use another bit.
I was wondering if
a) We could move that to another subpage. In hugetlb folios we have
plenty of space for such things. I guess we'd have be able to detect the
folio size without holding a reference, to make sure we can touch
another subpage.
b) We could overload _nr_pages_mapped. We'd effectively have to steal
one bit from _nr_pages_mapped to make this work.
Maybe what works is using the existing mechanism (hugetlb flag), and
then storing the pointer in __nr_pages_mapped.
So depending on the hugetlb flag, we can interpret __nr_pages_mapped
either as the pointer or as the old variant.
Mostly only folio_large_is_mapped() would need care for now, to ignore
_nr_pages_mapped if the hugetlb flag is set.
--
Cheers,
David / dhildenb