Re: [PATCH] mm: fix struct page layout on 32-bit systems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 17, 2021 at 03:18:38PM -0700, Linus Torvalds wrote:
> On Sun, May 16, 2021 at 7:42 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > Ah, if you just put one dummy word in front, then dma_addr_t overlaps with
> > page->mapping, which used to be fine, but now we can map network queues
> > to userspace, page->mapping has to be NULL.
> 
> Well, that's a problem in itself. We shouldn't have those kinds of
> "oh, it uses one field from one union, and another field from
> another".

No, I agree.  I wasn't aware of that particular problem until recently,
and adjusting the contents of struct page always makes me a little
nervous, so I haven't done it yet.

> At least that "low bit of the first word" thing is documented, but
> this kind of "oh, it uses dma_addr from one union and mapping from
> another" really needs to go away. Because that's just not acceptable.
> 
> So that network stack thing should just make the mapping thing explicit then.

One of the pending patches for next merge window has this:

                struct {        /* page_pool used by netstack */
-                       /**
-                        * @dma_addr: might require a 64-bit value on
-                        * 32-bit architectures.
-                        */
+                       unsigned long pp_magic;
+                       struct page_pool *pp;
+                       unsigned long _pp_mapping_pad;
                        unsigned long dma_addr[2];
                };

(people are still bikeshedding the comments, etc, but fundamentally this
is the new struct layout).  It's not the networking stack that uses
page->mapping, it's if somebody decides to do something like call
get_user_pages() on a mapped page_pool page, which then calls
page_mapping() ... if that doesn't return NULL, then things go wrong.

Andrey Ryabinin found the path:
: Some implementation of the flush_dcache_page(), also set_page_dirty()
: can be called on userspace-mapped vmalloc pages during unmap -
: zap_pte_range() -> set_page_dirty()

> Possibly by making mapping/index be the first fields (for both the
> page cache and the page_pool thing) and then have the LRU list and the
> dma_addr be after that?

Unfortunately, we also use the bottom two bits of ->mapping for PageAnon
/ PageKSM / PageMovable.  We could move ->mapping to the fifth word of
the union, where it's less in the way.

> > While I've got you on the subject of compound_head ... have you had a look
> > at the folio work?  It decreases the number of calls to compound_head()
> > by about 25%, as well as shrinking the (compiled size) of the kernel and
> > laying the groundwork for supporting things like 32kB anonymous pages
> > and adaptive page sizes in the page cache.  Andrew's a bit nervous of
> > it, probably because it's such a large change.
> 
> I guess I need to take a look. Mind sending me another pointer to the series?

This is probably a good place to start:
https://lore.kernel.org/lkml/20210505150628.111735-10-willy@xxxxxxxxxxxxx/
or if you'd rather look at a git tree,
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux