On 07.03.24 22:14, Matthew Wilcox wrote:
On Thu, Mar 07, 2024 at 10:20:15AM +0100, David Hildenbrand wrote:
IOW:
word page0 page1
0 flags flags
1 lru.next head
2 lru.prev entire_mapcount + gap
3 mapping nr_pages_mapped + gap / hugetlb_id
4 index pincount + nr_pages
5 private unused
6 mapcount+refcount mapcount+refcount(0)
7 memcg_data -
or on 32-bit
word page0 page1
0 flags flags
1 lru.next head
2 lru.prev entire_mapcount
3 mapping nr_pages_mapped / hugetlb_id
4 index pincount
5 private unused
6 mapcount mapcount
7 refcount refcount
8 memcg_data -
9+ virtual? last_cpupid? whatever
How about this layout?
@@ -350,8 +350,13 @@ struct folio {
unsigned long _head_1;
unsigned long _folio_avail;
/* public: */
- atomic_t _entire_mapcount;
- atomic_t _nr_pages_mapped;
+ union {
+ unsigned long _hugetlb_id;
+ struct {
+ atomic_t _entire_mapcount;
+ atomic_t _nr_pages_mapped;
+ };
+ };
atomic_t _pincount;
#ifdef CONFIG_64BIT
unsigned int _folio_nr_pages;
That keeps _folio_avail as, well, available. It puts _hugetlb_id in
the same bits as ->mapping. It continues to leave ->private unused
on 64-bit. I think this does everything we want?
_entire_mapcount is (still) used for hugetlb folios.
Oh, duh, of course it is. I thought we used page[0].mapcount for them,
but we don't and shouldn't. I suppose we could use a magic value for
page[0].mapcount to indicate hugetlb, but that'd make page_mapcount()
more complex.
With the total mapcount in place, I was thinking about renaming it to
"_pmd_mapcount" and stop using it for hugetlb folios, just like we'd not be
using _nr_pages_mapped for hugetlb folios.
[I also thought about moving the _pmd_mapcount to another subpage, where
we'd also have a _pud_mapcount in the future; but again, stuff for the
future]
Until then, wouldn't _hugetlb_id be problematic here? [I could move
_entire_mapcount/_pmd_mapcount later I guess]
New idea then, how about simply:
unsigned long _flags_1;
unsigned long _head_1;
- unsigned long _folio_avail;
+ unsigned long _hugetlb_id;
We have to check the various other users of struct page to see what we
might conflict with.
Well, compared to the current version there is no change for me: the
space I wanted to use for the total_mapcount is gone and I'll have to
start being creative :)
Free space in subpage1: "The LORD gave and the LORD has taken away"
We seem to be running again into space issues in subpage1 (also, more
relevant now that we will support order-1 folios ...). This kind of
wastage+over-complicated layout (again :( ) just to handle hugetlb
oddities for free pages -- refcount 0 -- feels very wrong.
Stupid idea: Do we really have to identify (possibly free) hugetlb
folios that way from lockless pfnwalkers without a folio reference?
I mean, with a folio reference held it's all working as expected.
Couldn't we just make hugetlb store them in some kind of xarray that we
can walk (using RCU?), indexed by start PFN we want to test?
So if we find the current hugetlb flag to be set on the lockless path,
simply check that xarray. It's all super racy either way, we can get a
free+split+reuse anytime.
Or is that completely flawed?
--
Cheers,
David / dhildenb