Re: New layout for struct page

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Dec 10, 2017 at 10:37:53PM -0800, Matthew Wilcox wrote:
> On Thu, Dec 07, 2017 at 05:31:39PM -0800, Matthew Wilcox wrote:
> > Dave Hansen and I talked about this a while ago.  I was trying to
> > understand something in the slab allocator today and thought I'd have
> > another crack at it.  I also documented my understanding of what the
> > rules are for using struct page.
> 
> I kept going with this and ended up with something that's maybe more
> interesting -- a new layout for struct page.
> 
> Advantages:
>  - Simpler struct definitions
>  - Compound pages may now be allocated of order 1 (currently, tail pages
>    1 and 2 both contain information).

That's neat. Except it doesn't work. See below. :-/

>  - page_deferred_list is now really defined in the struct instead of only
>    in comments and code.
>  - page_deferred_list doesn't conflict with tail2->index, which would
>    cause problems putting it in the page cache.  Actually, I don't see
>    how shmem_add_to_page_cache of a transhuge page doesn't provoke a
>    BUG in filemap_fault()?  VM_BUG_ON_PAGE(page->index != offset, page)
>    ought to trigger.

filemap_fault() doesn't see THP yet. shmem/tmpfs uses own ->fault handler.

> Disadvantages
>  - If adding a new variation to struct page, harder to tell where refcount
>    and compound_head land in your struct.

Yeah, that's a bummer.

It was tricky to find right spot for compound_head. And it would be even
more harder if we had struct page from proposed format.

>  - Need to remember that 'flags' is defined in the top level 'struct page'
>    and not in any of the layouts.
>    - Can do a variant of this with flags explicitly in each layout if
>      preferred.
>  - Need to explicitly define padding in layouts.
> 
> I haven't changed any code yet.  I wanted to get feedback from Christoph
> and Kirill before going further.
> 
> The new layout keeps struct page the same size as it is currently.  Mostly
> the only things that have changed are compound pages.  slab has not changed
> layout at all.
> 
> In the two tables below, the first column is the starting byte of the
> named element.  The next three columns are after the patch, and the last
> two are before the patch.  The annotation (1) means this field only has
> that meaning in the first tail page; the other fields are used in all
> tail pages.  The head page of a compound page uses all the fields the
> same way as a non-compound page.
> 
> ---+------------+------------------------------------+-------------------+
>  B | slab       | page cache | tail pages            | old tail          |
> ---+------------+------------------------------------+-------------------+
>  0 |                flags                            |                   |
>  4 |                  "                              |                   |
>  8 | s_mem      |          index                     | compound_mapcount |
> 12 | "          |            "                       | --                |
> 16 | freelist   | mapping    | dtor / order (1)      |                   |
> 20 | "          | "          | --                    |                   |
> 24 | counters   | mapcount   | compound_mapcount (1) | --                |

Sorry, this is not going to work: we need mapcount in all subpages of THP
as they can be mapped with PTE individually. So in first tail pages we
need find a spot form both compound_mapcount and mapcount.

> 28 | "          | refcount   | --                    | --                |
> 32 | next       | lru        | compound_head         | compound_head     |
> 36 | "          | "          | "                     | "                 |
> 40 | pages      | "          | deferred_list (1)     | dtor              |
> 44 | pobjects   | "          | "                     | order             |
> 48 | slab_cache | private    | "                     | --                |
> 52 | "          | "          | "                     | --                |
> ---+------------+------------+-----------------------+-------------------+
> 
> ---+------------+--------------------------------+
>  B | slab       | page cache | compound tail     |
> ---+------------+--------------------------------+
>  0 |                flags                        |
>  4 | s_mem      |          index                 |
>  8 | freelist   | mapping    | dtor/ order       |
> 12 | counters   | mapcount   | compound_mapcount |
> 16 | --         | refcount   | --                |
> 20 | next       | lru        | compound_head     |
> 24 | pg/pobj    | "          | deferred_list     |
> 28 | slab_cache | private    | "                 |
> ---+------------+------------+-------------------+

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux