Re: [RFC 1/7] mm: Add new vma flag VM_LOCAL_CPU

On Tue, Mar 13, 2018 at 7:56 PM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> On Tue, Mar 13, 2018 at 07:15:46PM +0200, Boaz Harrosh wrote:
>> On a call to mmap an mmap provider (like an FS) can put
>> this flag on vma->vm_flags.
>>
>> This tells the kernel that the vma will be used from a single
>> core only, and therefore invalidation of its PTE(s) does not
>> need CPU-wide scheduling.
>>
>> The motivation for this flag is the ZUFS project, where we want
>> to optimally map user-application buffers into a user-mode server,
>> execute the operation, and efficiently unmap.
>
> I've been looking at something similar, and I prefer my approach,
> although I'm not nearly as far along with my implementation as you are.
>
> My approach is also to add a vm_flags bit, tentatively called VM_NOTLB.
> The page fault handler refuses to insert any TLB entries into the process
> address space.  But follow_page_mask() will return the appropriate struct
> page for it.  This should be enough for O_DIRECT accesses to work as
> you'll get the appropriate scatterlists built.
>
> I suspect Boaz has already done a lot of thinking about this and doesn't
> need the explanation, but here's how it looks for anyone following along
> at home:
>
> Process A calls read().
> Kernel allocates a page cache page for it and calls the filesystem through
>   ->readpages (or ->readpage).
> Filesystem calls the managing process to get the data for that page.
> Managing process draws a pentagram and summons Beelzebub (or runs Perl;
>   whichever you find more scary).
> Managing process notifies the filesystem that the page is now full of data.
> Filesystem marks the page as being Uptodate and unlocks it.
> Process A, which was waiting on the page lock, wakes up and copies the
>   data from the page cache into userspace.  read() is complete.
>
> What we're concerned about here is what to do after the managing process
> tells the kernel that the read is complete.  Clearly allowing the managing
> process continued access to the page is Bad as the page may be freed by the
> page cache and then reused for something else.  Doing a TLB shootdown is
> expensive.  So Boaz's approach is to have the process promise that it won't
> have any other thread look at it.  My approach is to never allow the page
> to have load/store access from userspace; it can only be passed to other
> system calls.

This all seems to revolve around the fact that the userspace fs server
process needs to copy something into the userspace client's buffer, right?

Instead of playing with memory mappings, why not just tell the kernel
*what* to copy?

While in theory not as generic, I don't see any real limitations (you
don't actually need the current contents of the buffer in the read
case, and vice versa in the write case).

And we already have an interface for this: splice(2).  What am I
missing?  What's the killer argument in favor of the above messing
with the TLB etc., instead of just letting the kernel do the dirty
work?

Thanks,
Miklos
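
For anyone following along, here is a minimal userspace sketch of the
"tell the kernel *what* to copy" idea using splice(2).  It is not ZUFS or
FUSE code: two temporary files stand in for the server's data source and
the client's destination, and the helper names splice_copy() and
splice_demo() are made up for illustration.  The only point is that the
bytes move between descriptors through a pipe, without the calling process
ever mapping or touching them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to len bytes from in_fd to out_fd via a
 * pipe.  The payload never lands in a buffer owned by this process.
 * Returns the number of bytes moved, or -1 on error.
 */
static ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (pipe(p) < 0)
        return -1;

    while ((size_t)total < len) {
        /* Step 1: source -> pipe (at most one pipe's worth per call). */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len - (size_t)total,
                           SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;              /* end of input */

        /* Step 2: pipe -> destination, until the pipe is drained. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (m <= 0) {
                total = -1;
                goto out;
            }
            n -= m;
            total += m;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}

/* Round trip through two temp files; returns 0 if the bytes match. */
static int splice_demo(void)
{
    char src[] = "/tmp/splice_src_XXXXXX";
    char dst[] = "/tmp/splice_dst_XXXXXX";
    const char msg[] = "payload produced by the fs server";
    char back[sizeof(msg)] = {0};
    int in_fd = mkstemp(src);
    int out_fd = mkstemp(dst);
    int ok = -1;

    if (in_fd < 0 || out_fd < 0)
        goto done;
    if (write(in_fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        goto done;
    if (lseek(in_fd, 0, SEEK_SET) < 0)
        goto done;
    if (splice_copy(in_fd, out_fd, sizeof(msg)) != (ssize_t)sizeof(msg))
        goto done;
    if (pread(out_fd, back, sizeof(back), 0) != (ssize_t)sizeof(back))
        goto done;
    ok = memcmp(msg, back, sizeof(msg)) == 0 ? 0 : -1;
done:
    if (in_fd >= 0) {
        close(in_fd);
        unlink(src);
    }
    if (out_fd >= 0) {
        close(out_fd);
        unlink(dst);
    }
    return ok;
}
```

Calling splice_demo() from main() should return 0 on a Linux system whose
/tmp filesystem supports splice.  Whether the kernel really avoids copying
the pages depends on the filesystems involved; SPLICE_F_MOVE is only a
hint.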


