On Fri, Aug 12, 2022 at 02:34:53PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 12, 2022 at 01:16:39PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Aug 11, 2022 at 10:31:21PM +0100, Matthew Wilcox wrote:
> > > ==============================
> > > State Of The Page, August 2022
> > > ==============================
> > >
> > > I thought I'd write down where we are with struct page and where
> > > we're going, just to make sure we're all (still?) pulling in a
> > > similar direction.
> > >
> > > Destination
> > > ===========
> > >
> > > For some users, the size of struct page is simply too large.  At 64
> > > bytes per 4KiB page, memmap occupies 1.6% of memory.  If we can get
> > > struct page down to an 8 byte tagged pointer, it will be 0.2% of
> > > memory, which is an acceptable overhead.
> >
> > Right. This is attractive. But it brings cost of indirection.
>
> It does, but it also crams 8 pages into a single cacheline instead of
> occupying one cacheline per page.

If you really need info about these pages and have to dereference their
memdescs, it is likely to be 9 cache lines scattered across memory
instead of 8 cache lines next to each other in the same page.

And it's going to be two cachelines instead of one if we need info about
a single page. I think that is the most common case.

Initially, I thought we could offset the cost by caching memdescs
instead of struct page/folio, e.g. have the page cache store memdescs
directly. But that would require memdesc_to_pfn(), which is not
possible unless we store the pfn explicitly in the memdesc.

I don't want to be a buzzkill, and I like the idea a lot, but
abstractions are often costly. Getting it upstream without noticeable
performance regressions is going to be a challenge.

> > It can be especially painful for physical memory scanning. I guess we
> > can derive some info from memdesc type itself, like if it can be
> > movable. But still looks like an expensive change.
>
> I just don't think of physical memory scanning as something we do
> often, or in a performance-sensitive path.  I'm OK with slowing down
> kcompactd if it makes walking the LRU list faster.
>
> > Do you have any estimation on how much CPU time we will pay to reduce
> > memory (and cache) overhead? RAM size tend to grow faster than IPC.
> > We need to make sure it is the right direction.
>
> I don't.  I've heard colourful metaphors from the hyperscale crowd
> about how many more VMs they could sell, usually in terms of putting
> pallets of money in the parking lot and setting them on fire.  But IPC
> isn't the right metric either, CPU performance is all about cache
> misses these days.

As I said above, I don't expect the new scheme to be cache-friendly
either.

-- 
 Kiryl Shutsemau / Kirill A. Shutemov
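
P.S. To make the indirection cost above concrete, here is a minimal
sketch of what an 8-byte tagged-pointer struct page could look like.
This is only my illustration of the idea, not the actual proposed
layout; the type names, the helpers and the 4-bit tag are all
assumptions:

/*
 * Hypothetical sketch: struct page shrinks to a single tagged pointer.
 * The low bits encode which kind of memdesc this page belongs to; the
 * remaining bits point at the type-specific descriptor.
 */
struct page {
	unsigned long memdesc;
};

#define MEMDESC_TYPE_MASK	0xfUL

enum memdesc_type {
	MEMDESC_FOLIO	= 1,
	MEMDESC_SLAB	= 2,
	/* ... */
};

static inline enum memdesc_type memdesc_type(const struct page *page)
{
	return (enum memdesc_type)(page->memdesc & MEMDESC_TYPE_MASK);
}

/*
 * Miss #1: the struct page itself (eight of them share a cacheline).
 * Miss #2: the descriptor it points to, somewhere else in memory.
 */
static inline void *memdesc_ptr(const struct page *page)
{
	return (void *)(page->memdesc & ~MEMDESC_TYPE_MASK);
}

/*
 * And memdesc_to_pfn() is the problem case from above: nothing in the
 * memdesc points back at the page, so the pfn would have to be stored
 * explicitly, e.g.:
 */
struct folio_memdesc {
	unsigned long pfn;	/* the cost of the back-pointer */
	/* flags, refcount, mapping, ... */
};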