Re: [PATCH v1 10/11] device-dax: compound pagemap support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/2/21 1:36 AM, Dan Williams wrote:
> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>
>> dax devices are created with a fixed @align (huge page size) which
>> is enforced through as well at mmap() of the device. Faults,
>> consequently happen too at the specified @align specified at the
>> creation, and those don't change through out dax device lifetime.
>> MCEs poisons a whole dax huge page, as well as splits occurring at
>> at the configured page size.
> 
> This paragraph last...
> 
/me nods

>>
>> Use the newly added compound pagemap facility which maps the
>> assigned dax ranges as compound pages at a page size of @align.
>> Currently, this means, that region/namespace bootstrap would take
>> considerably less, given that you would initialize considerably less
>> pages.
> 
> This paragraph should go first...
> 
/me nods

>>
>> On setups with 128G NVDIMMs the initialization with DRAM stored struct pages
>> improves from ~268-358 ms to ~78-100 ms with 2M pages, and to less than
>> a 1msec with 1G pages.
> 
> This paragraph second...
> 
/me nods

> 
> The reason for this ordering is to have increasingly more detail as
> the changelog is read so that people that don't care about the details
> can get the main theme immediately, and others that wonder why
> device-dax is able to support this can read deeper.
> 
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> ---
>>  drivers/dax/device.c | 58 ++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 45 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
>> index db92573c94e8..e3dcc4ad1727 100644
>> --- a/drivers/dax/device.c
>> +++ b/drivers/dax/device.c
>> @@ -192,6 +192,43 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
>>  }
>>  #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>
>> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
>> +                            unsigned long fault_size,
>> +                            struct address_space *f_mapping)
>> +{
>> +       unsigned long i;
>> +       pgoff_t pgoff;
>> +
>> +       pgoff = linear_page_index(vmf->vma, vmf->address
>> +                       & ~(fault_size - 1));
> 
> I know you are just copying this style from whomever wrote it this way
> originally, but that person (me) was wrong this should be:
> 
> pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size));
> 
> ...you might do a lead-in cleanup patch before this one.
> 
Yeap, will do.

> 
>> +
>> +       for (i = 0; i < fault_size / PAGE_SIZE; i++) {
>> +               struct page *page;
>> +
>> +               page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
>> +               if (page->mapping)
>> +                       continue;
>> +               page->mapping = f_mapping;
>> +               page->index = pgoff + i;
>> +       }
>> +}
>> +
>> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn,
>> +                                unsigned long fault_size,
>> +                                struct address_space *f_mapping)
>> +{
>> +       struct page *head;
>> +
>> +       head = pfn_to_page(pfn_t_to_pfn(pfn));
>> +       head = compound_head(head);
>> +       if (head->mapping)
>> +               return;
>> +
>> +       head->mapping = f_mapping;
>> +       head->index = linear_page_index(vmf->vma, vmf->address
>> +                       & ~(fault_size - 1));
>> +}
>> +
>>  static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>>                 enum page_entry_size pe_size)
>>  {
>> @@ -225,8 +262,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>>         }
>>
>>         if (rc == VM_FAULT_NOPAGE) {
>> -               unsigned long i;
>> -               pgoff_t pgoff;
>> +               struct dev_pagemap *pgmap = pfn_t_to_page(pfn)->pgmap;
> 
> The device should already know its pagemap...
> 
> There is a distinction in dev_dax_probe() for "static" vs "dynamic"
> pgmap, but once the pgmap is allocated it should be fine to assign it
> back to dev_dax->pgmap in the "dynamic" case. That could be a lead-in
> patch to make dev_dax->pgmap always valid.
> 
I suppose you mean to always set dev_dax->pgmap at the end of the
'if (!pgmap)' in dev_dax_probe() after we allocate the pgmap.

I will make this a separate cleanup patch as you suggested.

>>
>>                 /*
>>                  * In the device-dax case the only possibility for a
>> @@ -234,17 +270,10 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>>                  * mapped. No need to consider the zero page, or racing
>>                  * conflicting mappings.
>>                  */
>> -               pgoff = linear_page_index(vmf->vma, vmf->address
>> -                               & ~(fault_size - 1));
>> -               for (i = 0; i < fault_size / PAGE_SIZE; i++) {
>> -                       struct page *page;
>> -
>> -                       page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
>> -                       if (page->mapping)
>> -                               continue;
>> -                       page->mapping = filp->f_mapping;
>> -                       page->index = pgoff + i;
>> -               }
>> +               if (pgmap->align > PAGE_SIZE)
>> +                       set_compound_mapping(vmf, pfn, fault_size, filp->f_mapping);
>> +               else
>> +                       set_page_mapping(vmf, pfn, fault_size, filp->f_mapping);
>>         }
>>         dax_read_unlock(id);
>>
>> @@ -426,6 +455,9 @@ int dev_dax_probe(struct dev_dax *dev_dax)
>>         }
>>
>>         pgmap->type = MEMORY_DEVICE_GENERIC;
>> +       if (dev_dax->align > PAGE_SIZE)
>> +               pgmap->align = dev_dax->align;
> 
> Just needs updates for whatever renames you do for the "compound
> geometry" terminology rather than subtle side effects of "align".
> 
> Other than that, looks good to me.
> 
OK, will do.

Thanks!




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux