Adding: lsf-pc

On 1/21/20 6:32 PM, jglisse@xxxxxxxxxx wrote:
> From: Jérôme Glisse <jglisse@xxxxxxxxxx>
>
>
> Provide a generic way to write-protect pages (à la KSM) to enable new mm
> optimizations:
>     - KSM (Kernel Samepage Merging) to deduplicate pages (for file-backed
>       pages too, not only anonymous memory as today)
>     - NUMA page duplication (read-only duplication) into multiple
>       physical pages. For instance, shared library code could have
>       a copy on each NUMA node. Or, for devices such as GPUs/FPGAs,
>       memory could be duplicated read-only inside local device memory.
>     ...
>
> Note that this write protection is intended to be breakable at any time,
> within a reasonable delay (as KSM does today), so that we never block
> anything that needs to write to the page for longer than necessary.
>
>
> The goal is to provide a mechanism that works for both anonymous and
> file-backed memory. For this we need a pointer inside struct page.
> For anonymous memory, KSM uses the anon_vma field, which corresponds
> to the mapping field for file-backed pages.
>
> So to allow generic write protection for file-backed pages, we need to
> avoid relying on the struct page mapping field in the various kernel code
> paths that use it today.
>
> The page->mapping field is used in 5 different ways:
>  [1]- Functions operating on a file: we can get the mapping from the file
>       (the issue here is that we might need to pass the file down the
>       call stack).
>
>  [2]- Core/arch mm functions: those do not care about the file (if they
>       do, then they are vma related and we can get the mapping from the
>       vma). Those functions only want to be able to walk all the PTEs
>       pointing to the page (for instance memory compaction, memory
>       reclaim, ...). We can provide the exact same functionality for
>       write-protected pages (as KSM does today).
>
>  [3]- Block layer when I/O fails.
>       This depends on the fs. For instance, for an fs that uses
>       buffer_head, we can update buffer_head to store the mapping instead
>       of the block_device, as we can get the block_device from the
>       mapping but not the mapping from the block_device.
>
>       So solving this is mostly filesystem specific, but I have not seen
>       any fs that could not be updated properly so that the block layer
>       can report I/O failures without relying on page->mapping.
>
>  [4]- Debugging (mostly procfs/sysfs files to dump memory states). Those
>       do not need the mapping per se; we just need to report page states
>       (and thus write protection information if the page is
>       write-protected).
>
>  [5]- GUP (get_user_pages): if something calls GUP in write mode, then we
>       need to break write protection (as KSM does today). GUPed pages
>       should not be write-protected, as we do not know what the GUPer is
>       doing with the page.
>
>
> Most of the patchset deals with [1], [2] and [3] ([4] and [5] are mostly
> trivial).
>
> For [1] we only need to pass the mapping down to all fs and vfs callback
> functions (this is mostly achieved with coccinelle). Roughly speaking, the
> patches are generated with the following pseudo code:
>
> add_mapping_parameter(func)
> {
>     function_add_parameter(func, mapping);
>
>     for_each_function_calling (caller, func) {
>         calling_add_parameter(caller, func, mapping);
>
>         if (function_parameters_contains(caller, mapping|file))
>             continue;
>
>         add_mapping_parameter(caller);
>     }
> }
>
> passdown_mapping()
> {
>     for_each_function_in_fs (func, fs_functions) {
>         if (!function_body_contains(func, page->mapping))
>             continue;
>
>         if (function_parameters_contains(func, mapping|file))
>             continue;
>
>         add_mapping_parameter(func);
>     }
> }
>
> For [2], KSM is generalized and extended so that both anonymous and
> file-backed pages can be handled by a common write-protected page case.
>
> For [3], it depends on the filesystem (an fs that uses buffer_head is
> easily handled by storing the mapping in the buffer_head struct).
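To make the propagation rule of the pass-down pseudo code concrete, here is a minimal C model of it run over a toy call graph rather than real kernel source. Everything in it is an illustrative assumption: the function names, the adjacency-matrix call graph, and the single has_mapping_param flag standing in for "has a mapping or file parameter". It only records which functions would gain the parameter and how many call sites would be rewritten; it does not edit any code the way the actual coccinelle scripts do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy call graph; every name here is hypothetical. */
enum func_id {
    VFS_READ,      /* already takes a struct file * */
    VFS_WRITE,     /* already takes a struct file * */
    FS_READPAGE,   /* dereferences page->mapping */
    FS_WRITEPAGE,  /* calls fs_helper, no mapping/file parameter */
    FS_HELPER,     /* dereferences page->mapping */
    FS_STATFS,     /* never touches page->mapping */
    NFUNC
};

static const char *const func_name[NFUNC] = {
    "vfs_read", "vfs_write", "fs_readpage",
    "fs_writepage", "fs_helper", "fs_statfs",
};

static bool calls[NFUNC][NFUNC];       /* calls[a][b]: a calls b */
static bool uses_page_mapping[NFUNC];  /* body reads page->mapping */
static bool has_mapping_param[NFUNC];  /* takes a mapping or a file */
static int callsites_rewritten;

/* Mirrors add_mapping_parameter() from the pseudo code. */
static void add_mapping_parameter(int func)
{
    assert(func >= 0 && func < NFUNC);
    has_mapping_param[func] = true;        /* function_add_parameter() */

    for (int caller = 0; caller < NFUNC; caller++) {
        if (!calls[caller][func])
            continue;
        callsites_rewritten++;             /* calling_add_parameter() */
        if (has_mapping_param[caller])
            continue;                      /* caller can already supply it */
        add_mapping_parameter(caller);     /* propagate up the call graph */
    }
}

/* Mirrors passdown_mapping(): seed from every function using page->mapping. */
static void passdown_mapping(void)
{
    for (int func = 0; func < NFUNC; func++) {
        if (!uses_page_mapping[func] || has_mapping_param[func])
            continue;
        add_mapping_parameter(func);
    }
}

static void build_example_graph(void)
{
    calls[VFS_READ][FS_READPAGE] = true;
    calls[VFS_WRITE][FS_WRITEPAGE] = true;
    calls[FS_WRITEPAGE][FS_HELPER] = true;

    uses_page_mapping[FS_READPAGE] = true;
    uses_page_mapping[FS_HELPER] = true;

    has_mapping_param[VFS_READ] = true;
    has_mapping_param[VFS_WRITE] = true;
}

static void print_result(void)
{
    for (int f = 0; f < NFUNC; f++)
        printf("%-13s mapping/file param: %s\n", func_name[f],
               has_mapping_param[f] ? "yes" : "no");
}
```

Calling build_example_graph() and then passdown_mapping() marks fs_readpage, fs_helper and fs_writepage as needing the new parameter, stops at vfs_read and vfs_write because they can already derive the mapping from their struct file, and leaves fs_statfs alone (three call sites rewritten in total). Setting the flag before visiting callers means each function is recursed into at most once, which keeps the walk finite even on cyclic call graphs.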
>
>
> To avoid any regression risk, the page->mapping field is left intact, as
> today, for pages that are not write-protected. This means that if you do
> not use the page write protection mechanism, it cannot regress. This is
> achieved by using a helper function that takes the mapping from the
> context (a parameter of the current function; see above for how functions
> are updated) and the struct page. If the page is not write-protected, it
> uses the mapping from the struct page (just like today). The only
> difference between before and after the patchset is that all fs functions
> that need the mapping for a page now also get it as a parameter, but only
> use that parameter's mapping pointer if the page is write-protected.
>
> Note also that I do not believe that, even once confidence is high that we
> always pass the correct mapping down each call stack, we will be able to
> get rid of the struct page mapping field.
>
> I posted a patchset before [*1] and I intend to post an updated patchset
> before LSF/MM/BPF. I also talked about this at LSF/MM 2018 [*2]. I still
> believe this is a topic that warrants a discussion with FS/MM and
> block device folks.
>
>
> [*1] https://lwn.net/Articles/751050/
>      https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
> [*2] https://lwn.net/Articles/752564/
>
>
> To: lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-block@xxxxxxxxxxxxxxx
> Cc: linux-mm@xxxxxxxxx
>

thanks,
--
John Hubbard
NVIDIA
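The regression-avoidance helper described in the proposal, which keeps using page->mapping unless the page is write-protected, can be sketched in C as follows. This is a minimal illustration, not the patchset's code: page_mapping_ctx(), the write_protected flag and fs_example_op() are hypothetical names, and the struct page here is a two-field stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct address_space { const char *name; };

/* Hypothetical, trimmed struct page. Real kernels encode this state
 * differently; write_protected is made up for this sketch. */
struct page {
    struct address_space *mapping;   /* owner when not write-protected */
    bool write_protected;            /* hypothetical flag */
};

/* Hypothetical helper: returns the mapping an fs function should use.
 * For ordinary pages it returns page->mapping exactly as today. Only for
 * write-protected pages, whose mapping field may hold something else,
 * does it trust the mapping passed down the call stack. */
static struct address_space *page_mapping_ctx(struct page *page,
                                              struct address_space *mapping)
{
    if (!page->write_protected)
        return page->mapping;
    assert(mapping != NULL);         /* caller must have passed it down */
    return mapping;
}

/* Hypothetical fs callback after the pass-down conversion: it now takes
 * the mapping as a parameter but resolves it through the helper. */
static const char *fs_example_op(struct address_space *mapping,
                                 struct page *page)
{
    return page_mapping_ctx(page, mapping)->name;
}
```

For an ordinary page the helper ignores the passed-down pointer entirely, even if it is NULL, which is why code paths that never encounter a write-protected page keep their current behavior.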