Re: [PATCH 0/5] Remove some races around folio_test_hugetlb

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01.03.24 22:47, Matthew Wilcox (Oracle) wrote:
Oscar and I have been exchanging a bit of email recently about the
bug reported here:
https://lore.kernel.org/all/ZXNhGsX32y19a2Xv@xxxxxxxxxxxxxxxxxxxx

I've come to the conclusion that folio_test_hugetlb() is just too fragile
as it can give both false positives and false negatives, as well as
resulting in the above bug.  With this patch series, it becomes a lot
more robust.  In the memory-failure case, we always hold the hugetlb_lock
so it's perfectly reliable.  In the compaction caase, it's unreliable, but
the failures are acceptable and we recheck after taking the hugetlb_lock.

The cost of this reliability is that we now consume the word I recently
freed in folio->page[1].  I think this is acceptable; we've still gained
a completely reliable folio_test_hugetlb() (which we didn't have before
I started messing around with the folio dtors).  Non-hugetlb users
can use large_id as a pointer to something else entirely, or even as a
non-pointer, as long as they can guarantee it can't conflict (ie don't
use it as a bitfield).

That probably means that we have to always set the lowest bit to use it for something else, or use another bit.

I was wondering if

a) We could move that to another subpage. In hugetlb folios we have plenty of space for such things. I guess we'd have be able to detect the folio size without holding a reference, to make sure we can touch another subpage.

b) We could overload _nr_pages_mapped. We'd effectively have to steal one bit from _nr_pages_mapped to make this work.


Maybe what works is using the existing mechanism (hugetlb flag), and then storing the pointer in __nr_pages_mapped.

So depending on the hugetlb flag, we can interpret __nr_pages_mapped either as the pointer or as the old variant.

Mostly only folio_large_is_mapped() would need care for now, to ignore _nr_pages_mapped if the hugetlb flag is set.

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux