Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 18, 2021 at 10:28 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>
> On 5/5/21 11:36 PM, Joao Martins wrote:
> > On 5/5/21 11:20 PM, Dan Williams wrote:
> >> On Wed, May 5, 2021 at 12:50 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
> >>> On 5/5/21 7:44 PM, Dan Williams wrote:
> >>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
> >>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >>>>> index b46f63dcaed3..bb28d82dda5e 100644
> >>>>> --- a/include/linux/memremap.h
> >>>>> +++ b/include/linux/memremap.h
> >>>>> @@ -114,6 +114,7 @@ struct dev_pagemap {
> >>>>>         struct completion done;
> >>>>>         enum memory_type type;
> >>>>>         unsigned int flags;
> >>>>> +       unsigned long align;
> >>>>
> >>>> I think this wants some kernel-doc above to indicate that non-zero
> >>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
> >>>> means "use non-compound base pages".
>
> [...]
>
> >>>> The non-zero value must be
> >>>> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
> >>>> Hmm, maybe it should be an
> >>>> enum:
> >>>>
> >>>> enum devmap_geometry {
> >>>>     DEVMAP_PTE,
> >>>>     DEVMAP_PMD,
> >>>>     DEVMAP_PUD,
> >>>> }
> >>>>
> >>> I suppose a converter between devmap_geometry and page_size would be needed too? And maybe
> >>> the whole dax/nvdimm align values change meanwhile (as a followup improvement)?
> >>
> >> I think it is ok for dax/nvdimm to continue to maintain their align
> >> value because it should be ok to have 4MB align if the device really
> >> wanted. However, when it goes to map that alignment with
> >> memremap_pages() it can pick a mode. For example, it's already the
> >> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
> >> they're already separate concepts that can stay separate.
> >>
> > Gotcha.
>
> I am reconsidering part of the above. In general, yes, the meaning of devmap @align
> represents a slightly different variation of the device @align i.e. how the metadata is
> laid out **but** regardless of what kind of page table entries we use vmemmap.
>
> By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax already
> validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE and PUD_SIZE)
> 2) the geometry of metadata is very much tied to the value we pick to @align at namespace
> provisioning -- not the "align" we might use at mmap() perhaps that's what you referred
> above? -- and 3) the value of geometry actually derives from dax device @align because we
> will need to create compound pages representing a page size of @align value.
>
> Using your example above: you're saying that dax->align == 1G is mapped with DEVMAP_PTEs,
> in reality the vmemmap is populated with PMDs/PUDs page tables (depending on what archs
> decide to do at vmemmap_populate()) and uses base pages as its metadata regardless of what
> device @align. In reality what we want to convey in @geometry is not page table sizes, but
> just the page size used for the vmemmap of the dax device.

Good point, the names "PTE, PMD, PUD" imply the hardware mapping size,
not the software compound page size.

> Additionally, limiting its
> value might not be desirable... if tomorrow Linux for some arch supports dax/nvdimm
> devices with 4M align or 64K align, the value of @geometry will have to reflect the 4M to
> create compound pages of order 10 for the said vmemmap.
>
> I am going to wait until you finish reviewing the remaining four patches of this series,
> but maybe this is a simple misnomer (s/align/geometry/) with a comment but without
> DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and enum a
> setter/getter to audit its value? Thoughts?

I do see what you mean about the confusion DEVMAP_{PTE,PMD,PUD}
introduces, but I still think the device-dax align and the
organization of the 'struct page' metadata are distinct concepts. So
I'm happy with any color of the bikeshed as long as the 2 concepts are
distinct. How about calling it  "compound_page_order"? Open to other
ideas...




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux