Re: [LSF/MM/BPF TOPIC] Generic page write protection

On 1/21/20 6:32 PM, jglisse@xxxxxxxxxx wrote:
> From: Jérôme Glisse <jglisse@xxxxxxxxxx>
> 
> 
> Provide a generic way to write protect pages (à la KSM) to enable new mm
> optimizations:

Hi Jerome, 

I am very interested in this feature and discussion. Thanks for posting
this topic.


>     - KSM (kernel samepage merging) to deduplicate pages (for
>       file-backed pages too, not only anonymous memory like today)
>     - NUMA page duplication (read-only duplication) across multiple
>       physical pages. For instance, shared library code could have
>       a copy on each NUMA node. Or, for devices like GPUs/FPGAs,
>       read-only duplication of memory inside the local device memory.


And also, for the benefit of non-GPU-centric folks, let me add that
something like this is required in order to do GPU atomic operations
to system memory, in support of OpenCL Compute (as opposed to Graphics)
atomic ops.

GPUs can use both read duplication and atomics to great effect. It's 
something we've wanted for a while now.

A bit more below:


>     ...
> 
> Note that this write protection is intended to be breakable at any time,
> within a reasonable delay (like KSM today), so that we never block anything
> that needs to write to the page for longer than necessary.
> 
> 
> The goal is to provide a mechanism that works for both anonymous and
> file-backed memory. For this we need a pointer inside struct page.
> For anonymous memory KSM uses the anon_vma field, which corresponds
> to the mapping field for file-backed pages.
> 
> So to allow generic write protection for file-backed pages we need to
> avoid relying on the struct page mapping field in the various kernel code
> paths that use it today.
> 
> The page->mapping field is used in 5 different ways:
>  [1]- Functions operating on a file: we can get the mapping from the file
>       (the issue here is that we might need to pass the file down the
>       call stack).
> 
>  [2]- Core/arch mm functions: these do not care about the file (if they
>       do, it means they are vma related and we can get the mapping
>       from the vma). These functions only want to be able to walk all
>       the ptes pointing to the page (for instance memory compaction, memory
>       reclaim, ...). We can provide the exact same functionality for
>       write protected pages (like KSM does today).
> 
>  [3]- Block layer when I/O fails. This depends on the fs; for instance, for
>       filesystems that use buffer_head we can update buffer_head to store the
>       mapping instead of the block_device, as we can get the block_device
>       from the mapping but not the mapping from the block_device.
> 
>       So solving this is mostly filesystem specific, but I have not seen
>       any fs that could not be updated properly so that the block layer can
>       report I/O failures without relying on page->mapping.
> 
>  [4]- Debugging (mostly procfs/sysfs files to dump memory states). These
>       do not need the mapping per se; we just need to report page states
>       (and thus write protection information if the page is write protected).
> 
>  [5]- GUP (get user pages): if something calls GUP in write mode then we
>       need to break write protection (like KSM today). GUPed pages should
>       not be write protected, as we do not know what the GUPer is doing
>       with the page.
> 

Yes, this is a reasonable constraint. It's a lot harder to make the page
globally write-protected against *everything* (physically-addressed pages
from a non-CPU device included), and providing write protection at the
virtual address level is not quite as difficult. And it will still provide
most of what we'd want.

If a programmer sets up memory to get gup-pinned, and also wants to do
OpenCL atomics to it, we're going to have to say that's just not supported
this year. But it's still a major new capability and the constraint is
not hard to explain.


thanks,
-- 
John Hubbard
NVIDIA

> 
> Most of the patchset deals with [1], [2] and [3] ([4] and [5] are mostly
> trivial).
> 
> For [1] we only need to pass down the mapping to all fs and vfs callback
> functions (this is mostly achieved with coccinelle). Roughly speaking the
> patches are generated with the following pseudo code:
> 
> add_mapping_parameter(func)
> {
>     function_add_parameter(func, mapping);
> 
>     for_each_function_calling (caller, func) {
>         calling_add_parameter(caller, func, mapping);
> 
>         if (function_parameters_contains(caller, mapping|file))
>             continue;
> 
>         add_mapping_parameter(caller);
>     }
> }
> 
> passdown_mapping()
> {
>     for_each_function_in_fs (func, fs_functions) {
>         if (!function_body_contains(func, page->mapping))
>             continue;
> 
>         if (function_parameters_contains(func, mapping|file))
>             continue;
> 
>         add_mapping_parameter(func);
>     }
> }
> 
> For [2] KSM is generalized and extended so that both anonymous and
> file-backed pages can be handled by a common write protected page case.
> 
> For [3] it depends on the filesystem (filesystems that use buffer_head
> are easily handled by storing the mapping in the buffer_head struct).
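
To illustrate why storing the mapping is enough for the buffer_head case,
here is a rough sketch with toy structs. The host, i_sb and s_bdev field
names follow the kernel, but every struct here is a stripped-down stand-in,
and toy_buffer_head, b_mapping and toy_bh_bdev() are hypothetical names, not
what the patchset uses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields needed here. */
struct block_device { int id; };
struct super_block { struct block_device *s_bdev; };
struct inode { struct super_block *i_sb; };
struct address_space { struct inode *host; };

/* Hypothetical buffer_head variant that stores the mapping, not the bdev. */
struct toy_buffer_head {
    struct address_space *b_mapping;
};

/* The block device is still reachable, by walking down from the mapping. */
static struct block_device *toy_bh_bdev(const struct toy_buffer_head *bh)
{
    assert(bh != NULL && bh->b_mapping != NULL);
    return bh->b_mapping->host->i_sb->s_bdev;
}
```

The reverse walk is not possible: a block_device alone does not say which
mapping a given buffer belongs to, which is the asymmetry described in [3].
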
> 
> 
> To avoid any regression risk, the page->mapping field is left intact as
> today for non write protected pages. This means that if you do not use the
> page write protection mechanism then it cannot regress. This is achieved
> by using a helper function that takes the mapping from the context
> (current function parameter, see above for how functions are updated) and
> the struct page. If the page is not write protected then it uses the
> mapping from the struct page (just like today). The only difference
> between before and after the patchset is that all fs functions that
> need the mapping for a page now also get it as a parameter, but only
> use the parameter mapping pointer if the page is write protected.
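
As a rough sketch of the helper described above, assuming a simple
per-page flag: the write_protected field and the name page_mapping_or_ctx()
are hypothetical, and the structs are toy stand-ins rather than the kernel
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; not the real definitions. */
struct address_space { int id; };

struct page {
    bool write_protected;           /* hypothetical flag for this sketch */
    struct address_space *mapping;  /* trusted only when not write protected */
};

/*
 * Hypothetical helper: choose which mapping to use for a page.
 * A page that is not write protected keeps today's behaviour and uses
 * page->mapping; a write protected page uses the mapping passed down
 * from the caller's context instead.
 */
static struct address_space *
page_mapping_or_ctx(const struct page *page, struct address_space *ctx_mapping)
{
    assert(page != NULL);

    if (page->write_protected)
        return ctx_mapping;
    return page->mapping;
}
```

The point of this shape is that the context mapping is ignored on the
common path, so code that never write protects a page sees exactly the
value it would have read from page->mapping before.
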
> 
> Note also that even once confidence is high that we always pass the
> correct mapping down each call stack, it does not mean we will be able
> to get rid of the struct page mapping field.
> 
> I posted a patchset before [*1] and I intend to post an updated patchset
> before LSF/MM/BPF. I also talked about this at LSF/MM 2018. I still
> believe this will be a topic that warrants a discussion with FS/MM and
> block device folks.
> 
> 
> [*1] https://lwn.net/Articles/751050/
>      https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
> [*2] https://lwn.net/Articles/752564/
> 
> 
> To: lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-block@xxxxxxxxxxxxxxx
> Cc: linux-mm@xxxxxxxxx
> 
> 




