On Mon, May 17, 2021 at 03:18:38PM -0700, Linus Torvalds wrote: > On Sun, May 16, 2021 at 7:42 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > > > Ah, if you just put one dummy word in front, then dma_addr_t overlaps with > > page->mapping, which used to be fine, but now we can map network queues > > to userspace, page->mapping has to be NULL. > > Well, that's a problem in itself. We shouldn't have those kinds of > "oh, it uses one field from one union, and another field from > another". No, I agree. I wasn't aware of that particular problem until recently, and adjusting the contents of struct page always makes me a little nervous, so I haven't done it yet. > At least that "low bit of the first word" thing is documented, but > this kind of "oh, it uses dma_addr from one union and mapping from > another" really needs to go away. Because that's just not acceptable. > > So that network stack thing should just make the mapping thing explicit then. One of the pending patches for next merge window has this: struct { /* page_pool used by netstack */ - /** - * @dma_addr: might require a 64-bit value on - * 32-bit architectures. - */ + unsigned long pp_magic; + struct page_pool *pp; + unsigned long _pp_mapping_pad; unsigned long dma_addr[2]; }; (people are still bikeshedding the comments, etc, but fundamentally this is the new struct layout). It's not the networking stack that uses page->mapping, it's if somebody decides to do something like call get_user_pages() on a mapped page_pool page, which then calls page_mapping() ... if that doesn't return NULL, then things go wrong. Andrey Ryabinin found the path: : Some implementation of the flush_dcache_page(), also set_page_dirty() : can be called on userspace-mapped vmalloc pages during unmap - : zap_pte_range() -> set_page_dirty() > Possibly by making mapping/index be the first fields (for both the > page cache and the page_pool thing) and then have the LRU list and the > dma_addr be after that? Unfortunately, we also use the bottom two bits of ->mapping for PageAnon / PageKSM / PageMovable. We could move ->mapping to the fifth word of the union, where it's less in the way. > > While I've got you on the subject of compound_head ... have you had a look > > at the folio work? It decreases the number of calls to compound_head() > > by about 25%, as well as shrinking the (compiled size) of the kernel and > > laying the groundwork for supporting things like 32kB anonymous pages > > and adaptive page sizes in the page cache. Andrew's a bit nervous of > > it, probably because it's such a large change. > > I guess I need to take a look. Mind sending me another pointer to the series? This is probably a good place to start: https://lore.kernel.org/lkml/20210505150628.111735-10-willy@xxxxxxxxxxxxx/ or if you'd rather look at a git tree, https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio