Re: How to use huge pages in drivers?

Matthew Wilcox wrote on Thu, Sep 05, 2019:
> On Thu, Sep 05, 2019 at 05:44:00PM +0200, Dominique Martinet wrote:
> > Question though - is it ok to insert small pages if the huge_fault
> > handler is called with PE_SIZE_PMD ?
> > (I think the pte insertion will automatically create the pmd, but would
> > be good to confirm)
> 
> No, you need to return VM_FAULT_FALLBACK, at which point the generic code
> will create a PMD for you and then call your ->fault handler which can
> insert PTEs.

Hmm, that's a shame actually.
There is a rather costly round-trip between Linux and McKernel to
determine which page size the remote side uses for this virtual address
and to get the corresponding physical address, so when we get the fault
we basically do not know yet whether this will be a PMD or a PTE.

I'd rather avoid having to do one round-trip at the PMD stage, get told
this is a PTE, temporarily give up and wait to be called again with
PE_SIZE_PTE and do a second round-trip in this case.
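To make the cost concrete, here is a small userspace model of the strict
protocol described above. It is not kernel code: the enums only echo the
names of the kernel's PE_SIZE_* and VM_FAULT_* values, remote_query() is a
hypothetical stand-in for the Linux/McKernel round-trip, and the generic
path is simplified to "try PMD, then fall back to ->fault".

```c
#include <assert.h>

/* Userspace model only: local stand-ins for enum page_entry_size and the
 * VM_FAULT_NOPAGE / VM_FAULT_FALLBACK return values. */
enum pe_size { PE_SIZE_PTE, PE_SIZE_PMD, PE_SIZE_PUD };
enum fault_ret { FAULT_NOPAGE, FAULT_FALLBACK };

/* Hypothetical remote state for one faulting address. */
struct remote {
	enum pe_size backing; /* page size used on the McKernel side */
	int round_trips;      /* Linux <-> McKernel queries made so far */
};

/* Stand-in for the costly query: returns the remote page size. */
static enum pe_size remote_query(struct remote *r)
{
	r->round_trips++;
	return r->backing;
}

/* Strict protocol: a PMD-sized fault may only install a PMD. */
static enum fault_ret model_huge_fault(struct remote *r, enum pe_size size)
{
	if (size == PE_SIZE_PUD)
		return FAULT_FALLBACK;  /* assumed never to happen: no query */
	if (remote_query(r) != PE_SIZE_PMD)
		return FAULT_FALLBACK;  /* the answer is thrown away */
	return FAULT_NOPAGE;            /* a PMD would be inserted here */
}

/* ->fault: has to ask again, since nothing was kept from the PMD call. */
static enum fault_ret model_fault(struct remote *r)
{
	remote_query(r);
	return FAULT_NOPAGE;            /* a PTE would be inserted here */
}

/* Simplified generic path; returns the number of round-trips used. */
static int handle_fault(struct remote *r)
{
	assert(r->backing == PE_SIZE_PTE || r->backing == PE_SIZE_PMD);
	r->round_trips = 0;
	if (model_huge_fault(r, PE_SIZE_PMD) == FAULT_FALLBACK)
		model_fault(r);
	return r->round_trips;
}
```

In this model a 2MB-backed address costs one query and a 4K-backed one
costs two, which is the second round-trip described above.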
I didn't see any field in struct vm_fault that I could piggy-back on to
remember something from the previous call, and I'm pretty sure it would
be a bad idea to use the vma's vm_private_data here, because there could
be multiple faults in parallel on other threads.


Looking at vmf_insert_pfn(), it will allocate the pmd through insert_pfn's
get_locked_pte(), so it does end up working (I never return a page: on
success we always return VM_FAULT_NOPAGE, so I do not see the harm in
doing it early if we can).

Following the code in __handle_mm_fault, assuming the pmd fault had
returned fallback, I do not see any harm here either: the pmd has
actually already been allocated at this point (at the pmd-level fault),
it is just set to none.
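The shortcut can be sketched with the same kind of userspace model (again
not kernel code; remote_query() and the entry bookkeeping are
hypothetical). When the PMD-sized handler learns that the address is
4K-backed it maps the PTE itself, which in the real driver would be the
vmf_insert_pfn() call discussed above, and returns NOPAGE, so ->fault is
never reached and only one query is made.

```c
#include <assert.h>

/* Userspace model only: names echo the kernel's but are defined locally. */
enum pe_size { PE_SIZE_PTE, PE_SIZE_PMD };
enum fault_ret { FAULT_NOPAGE, FAULT_FALLBACK };
enum installed { ENTRY_NONE, ENTRY_PTE, ENTRY_PMD };

struct remote {
	enum pe_size backing;  /* hypothetical: page size used on McKernel */
	int round_trips;       /* Linux <-> McKernel queries made */
	enum installed entry;  /* what the handler ended up mapping */
};

static enum pe_size remote_query(struct remote *r)
{
	r->round_trips++;
	return r->backing;
}

/* PMD-sized handler with the shortcut: never returns FALLBACK for a
 * 4K-backed address, it installs the single PTE directly instead. */
static enum fault_ret model_huge_fault_shortcut(struct remote *r)
{
	if (remote_query(r) == PE_SIZE_PMD)
		r->entry = ENTRY_PMD;
	else
		r->entry = ENTRY_PTE; /* page table allocated on demand */
	return FAULT_NOPAGE;
}

/* Returns the number of round-trips used for one fault. */
static int handle_fault_shortcut(struct remote *r)
{
	assert(r->backing == PE_SIZE_PTE || r->backing == PE_SIZE_PMD);
	r->round_trips = 0;
	r->entry = ENTRY_NONE;
	model_huge_fault_shortcut(r);
	return r->round_trips;
}
```

As said above, this relies on current implementation details of the
fault path rather than on the documented protocol.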

Not exactly pretty, though, and very definitely no guarantee it'll keep
working... I'll stick a comment saying what we should do at least :P

> It works the same way from PUDs to PMDs by the way, in case you ever
> have a 1GB mapping ;-)

Yes, already returning fallback in this case - but I'm just assuming
that won't happen so no round-trip here :)
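Whichever level is being tried, a huge entry is only possible if the
aligned block around the faulting address fits inside the vma and the
physical address has the same offset within that block. A userspace
sketch of that PUD -> PMD -> PTE cascade, assuming x86-64 sizes (2MB PMD,
1GB PUD); pick_level() and can_map() are hypothetical helpers, and each
level that fails the check corresponds to returning VM_FAULT_FALLBACK at
that size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 sizes. */
#define PMD_SIZE_2M (UINT64_C(1) << 21)
#define PUD_SIZE_1G (UINT64_C(1) << 30)

enum level { LVL_PTE, LVL_PMD, LVL_PUD };

/* Can one entry of 'size' bytes cover addr? The aligned block must lie in
 * [vm_start, vm_end) and phys (backing addr) must share addr's offset. */
static bool can_map(uint64_t size, uint64_t addr, uint64_t vm_start,
		    uint64_t vm_end, uint64_t phys)
{
	uint64_t mask = size - 1;
	uint64_t start = addr & ~mask;

	return start >= vm_start && start + size <= vm_end &&
	       (phys & mask) == (addr & mask);
}

/* Largest level usable for this fault, given the remote page size. */
static enum level pick_level(uint64_t remote_page_size, uint64_t addr,
			     uint64_t vm_start, uint64_t vm_end, uint64_t phys)
{
	assert(addr >= vm_start && addr < vm_end);
	if (remote_page_size >= PUD_SIZE_1G &&
	    can_map(PUD_SIZE_1G, addr, vm_start, vm_end, phys))
		return LVL_PUD;
	if (remote_page_size >= PMD_SIZE_2M &&
	    can_map(PMD_SIZE_2M, addr, vm_start, vm_end, phys))
		return LVL_PMD;
	return LVL_PTE;
}
```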


> > Now that I've set it as dax I think it actually makes sense as in
> > "there's memory here that points to something linux no longer manages
> > directly, just let it be" and we might benefit from the other exceptions
> > dax has, I'll need to look at what this implies in more detail...
> 
> I think that should be fine, but I don't really know RHEL 7.3 all that
> well ;-)

Good enough for me, tests will tell me what I broke :)


> No problem ... these APIs are relatively new and not necessarily all
> that intuitive.

Looking at a recent vanilla Linux in the evenings and at RHEL's kernel at
work didn't help on my side (there are some fun differences, like the
VM_HUGE_FAULT flag in the vma; now that I know it was added for ABI
compatibility it does make sense: on an older module the function pointer
could just have been left uninitialized, and thus be non-NULL yet not
valid).
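A userspace model of why such a flag helps: the dispatcher only trusts the
huge_fault pointer when the vma carries the opt-in flag, so a stale or
garbage pointer in an older module's ops is never called.
MODEL_VM_HUGE_FAULT and the struct layouts below are made up for
illustration; they are not RHEL's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opt-in flag; not the real RHEL value. */
#define MODEL_VM_HUGE_FAULT 0x1UL

struct model_vm_ops {
	int (*fault)(void);
	int (*huge_fault)(void); /* may be junk in an old module's ops */
};

struct model_vma {
	unsigned long flags;
	const struct model_vm_ops *ops;
};

static int my_fault(void) { return 1; }      /* marks "->fault used" */
static int my_huge_fault(void) { return 2; } /* marks "->huge_fault used" */

/* Only follow ->huge_fault when the driver opted in via the flag. */
static int model_dispatch(const struct model_vma *vma)
{
	assert(vma->ops != NULL && vma->ops->fault != NULL);
	if ((vma->flags & MODEL_VM_HUGE_FAULT) && vma->ops->huge_fault)
		return vma->ops->huge_fault();
	return vma->ops->fault();
}
```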

Pointing me at huge_fault() again definitely did help.


Thanks,
-- 
Dominique



