On Sun, Jan 21, 2024 at 5:18 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Sun, Jan 21, 2024 at 06:39:26PM -0500, Pasha Tatashin wrote:
> > On Wed, May 10, 2023 at 12:28 PM Kent Overstreet
> > <kent.overstreet@xxxxxxxxx> wrote:
> > > Hasn't been addressed yet, but we were just talking about moving the
> > > codetag pointer from page_ext to page last night for memory overhead
> > > reasons.
> > >
> > > The disadvantage then is that the memory overhead doesn't go down if you
> > > disable memory allocation profiling at boot time...
> > >
> > > But perhaps the performance overhead is low enough now that this is not
> > > something we expect to be doing as much?
> > >
> > > Choices, choices...
> >
> > I would like to participate in this discussion, specifically to
>
> Umm, this is a discussion proposal for last year, not this. I don't
> remember if a followup discussion has been proposed for this year?

My bad. I should submit a proposal for followup discussion for this
year. Will do that this coming week.

>
> > 2. Reducing the memory overhead by not using page_ext pointer, but
> > instead use n-bits in the page->flags.
> >
> > The number of buckets is actually not that large, there is no need to
> > keep 8-byte pointer in page_ext, it could be an idx in an array of a
> > specific size. There could be buckets that contain several stacks.
>
> There are a lot of people using "n bits in page->flags" and I don't
> have a good feeling for how many we really have left. MGLRU uses a
> variable number of bits. There's PG_arch_2 and PG_arch_3. There's
> PG_uncached. There's PG_young and PG_idle. And of course we have
> NUMA node (10 bits?), section (?), zone (3 bits?) I count 28 bits
> allocated with all the CONFIG enabled, then 13 for node+zone, so it
> certainly seems like there's a lot free on 64-bit, but it'd be
> nice to have it written out properly.
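
As a rough illustration of that bit budget, here is a minimal userspace
sketch. It only restates the figures quoted above (28 flag bits, 13 for
node+zone, a 64-bit word) and ignores section bits, which are left as "?"
in the count, so the result is an upper bound at best. The helper names
and the placement of the index directly above the flag bits are
hypothetical; this is not the kernel's actual page->flags layout.

```c
#include <stdint.h>

/* Figures quoted in the thread; real values depend on CONFIG options. */
enum {
	WORD_BITS      = 64,	/* unsigned long on a 64-bit build */
	FLAG_BITS      = 28,	/* "28 bits allocated with all the CONFIG enabled" */
	NODE_ZONE_BITS = 13,	/* "13 for node+zone" (10 + 3) */
};

/* Bits left once flag bits and node+zone bits are taken (section bits ignored). */
static inline unsigned int spare_bits(void)
{
	return WORD_BITS - FLAG_BITS - NODE_ZONE_BITS;
}

/* Number of buckets an n-bit index can address (n must be below 64). */
static inline uint64_t buckets_for_bits(unsigned int n)
{
	return (uint64_t)1 << n;
}

/* Hypothetical placement: the index sits directly above the flag bits. */
#define TAG_SHIFT FLAG_BITS

/* Store an n-bit index in the flags word without touching other bits. */
static inline uint64_t tag_set(uint64_t flags, unsigned int n, uint64_t idx)
{
	uint64_t mask = (buckets_for_bits(n) - 1) << TAG_SHIFT;

	return (flags & ~mask) | ((idx << TAG_SHIFT) & mask);
}

/* Read the n-bit index back out of the flags word. */
static inline uint64_t tag_get(uint64_t flags, unsigned int n)
{
	return (flags >> TAG_SHIFT) & (buckets_for_bits(n) - 1);
}
```

With these numbers spare_bits() comes out to 23, and a 16-bit index
would address 65536 buckets, so an index instead of an 8-byte pointer
looks plausible on 64-bit, subject to the section bits and the variable
MGLRU usage mentioned above.
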
>
> Related, what do we think is going to happen with page_ext in a memdesc
> world (also what's going to happen with the kmsan goop in struct page?)
>
> I see page_idle_ops, page_owner_ops and page_table_check_ops.
> page_idle_ops only uses the 8 byte flags. page_owner_ops uses an extra
> 64 bytes (!). page_table_check uses an extra 8 bytes.
>
> page_idle looks to be for folios only. page_table_check seems like
> it should be folded into pgdesc. page_owner maybe gets added to every
> allocation rather than every page (but that's going to be interesting
> for memdescs which don't normally need an allocation).
>
> That seems to imply that we can get rid of page_ext entirely, which will
> be nice. I don't understand kmsan well enough to understand what to
> do about it. If it's per-allocation, we can handle it like page_owner.
> If it really is per-page, we can make it an ifdef in struct page itself.
> I think it's OK to grow struct page for such a rarely used debugging
> option.
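
To put the page_ext sizes quoted above in perspective, here is a small
back-of-the-envelope sketch. The byte counts are the ones from the
message (8 for the flags, 64 extra for page_owner, 8 extra for
page_table_check); the function names are hypothetical, and the 4 KiB
page size in the usage note is an assumption for illustration only.

```c
#include <stdint.h>

/* Sizes quoted in the thread for each page_ext user. */
enum {
	PAGE_EXT_FLAGS_BYTES   = 8,	/* base flags word, all page_idle needs */
	PAGE_OWNER_BYTES       = 64,	/* extra for page_owner */
	PAGE_TABLE_CHECK_BYTES = 8,	/* extra for page_table_check */
};

/* Per-page page_ext footprint with all three users enabled. */
static inline uint64_t page_ext_bytes_per_page(void)
{
	return PAGE_EXT_FLAGS_BYTES + PAGE_OWNER_BYTES + PAGE_TABLE_CHECK_BYTES;
}

/* Total page_ext bytes for mem_bytes of RAM at the given page size. */
static inline uint64_t page_ext_total_bytes(uint64_t mem_bytes,
					    uint64_t page_size)
{
	return (mem_bytes / page_size) * page_ext_bytes_per_page();
}
```

Assuming 4 KiB pages, page_ext_total_bytes(1ULL << 30, 4096) gives
20971520, i.e. 80 bytes per page and about 20 MiB of page_ext per GiB of
RAM when all three users are enabled, most of it from page_owner.
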