Re: [PATCH 0/5] Remove some races around folio_test_hugetlb

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07.03.24 22:14, Matthew Wilcox wrote:
On Thu, Mar 07, 2024 at 10:20:15AM +0100, David Hildenbrand wrote:
IOW:

word	page0			page1
0	flags			flags
1	lru.next		head
2	lru.prev		entire_mapcount + gap
3	mapping			nr_pages_mapped + gap / hugetlb_id
4	index			pincount + nr_pages
5	private			unused
6	mapcount+refcount	mapcount+refcount(0)
7	memcg_data		-

or on 32-bit

word	page0			page1
0	flags			flags
1	lru.next		head
2	lru.prev		entire_mapcount
3	mapping			nr_pages_mapped / hugetlb_id
4	index			pincount
5	private			unused
6	mapcount		mapcount
7	refcount		refcount
8	memcg_data		-
9+	virtual? last_cpupid? whatever

How about this layout?

@@ -350,8 +350,13 @@ struct folio {
                          unsigned long _head_1;
                          unsigned long _folio_avail;
          /* public: */
-                       atomic_t _entire_mapcount;
-                       atomic_t _nr_pages_mapped;
+                       union {
+                               unsigned long _hugetlb_id;
+                               struct {
+                                       atomic_t _entire_mapcount;
+                                       atomic_t _nr_pages_mapped;
+                               };
+                       };
                          atomic_t _pincount;
   #ifdef CONFIG_64BIT
                          unsigned int _folio_nr_pages;

That keeps _folio_avail as, well, available.  It puts _hugetlb_id in
the same bits as ->mapping.  It continues to leave ->private unused
on 64-bit.  I think this does everything we want?

_entire_mapcount is (still) used for hugetlb folios.

Oh, duh, of course it is.  I thought we used page[0].mapcount for them,
but we don't and shouldn't.  I suppose we could use a magic value for
page[0].mapcount to indicate hugetlb, but that'd make page_mapcount()
more complex.

With the total mapcount in place, I was thinking about renaming it to
"_pmd_mapcount" and stop using it for hugetlb folios, just like we'd not be
using _nr_pages_mapped for hugetlb folios.

[I also thought about moving the _pmd_mapcount to another subpage, where
we'd also have a _pud_mapcount in the future; but again, stuff for the
future]

Until then, wouldn't _hugetlb_id be problematic here? [I could move
_entire_mapcount/_pmd_mapcount later I guess]

New idea then, how about simply:

                         unsigned long _flags_1;
                         unsigned long _head_1;
-                       unsigned long _folio_avail;
+                       unsigned long _hugetlb_id;

We have to check the various other users of struct page to see what we
might conflict with.

Well, compared to the current version there is no change for me: the space I wanted to use for the total_mapcount is gone and I'll have to start being creative :)

Free space in subpage1: "The LORD gave and the LORD has taken away"

We seem to be running again into space issues in subpage1 (also, more relevant now that we will support order-1 folios ...). This kind of wastage+over-complicated layout (again :( ) just to handle hugetlb oddities for free pages -- refcount 0 -- feels very wrong.


Stupid idea: Do we really have to identify (possibly free) hugetlb folios that way from lockless pfnwalkers without a folio reference?

I mean, with a folio reference held it's all working as expected.

Couldn't we just make hugetlb store them in some kind of xarray that we can walk (using RCU?), indexed by start PFN we want to test?

So if we find the current hugetlb flag to be set on the lockless path, simply check that xarray. It's all super racy either way, we can get a free+split+reuse anytime.

Or is that completely flawed?

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux