From: Jérôme Glisse <jglisse@xxxxxxxxxx>

Provide a generic way to write protect pages (à la KSM) to enable new mm optimizations:
  - KSM (Kernel Samepage Merging) to deduplicate pages (for file-backed pages too, not only anonymous memory as today)
  - NUMA page duplication (read-only duplication) into multiple different physical pages. For instance, shared library code could have a copy on each NUMA node. Or, in cases like GPU/FPGA, memory could be duplicated read-only inside the local device memory.
  ...

Note that this write protection is intended to be breakable at any time, within a reasonable delay (as with KSM today), so that we never block anything that needs to write to the page for longer than necessary.

The goal is to provide a mechanism that works for both anonymous and file-backed memory. For this we need a pointer inside struct page. For anonymous memory, KSM uses the anon_vma field, which corresponds to the mapping field for file-backed pages. So to allow generic write protection for file-backed pages, we need to avoid relying on the struct page mapping field in the various kernel code paths that use it today.

The page->mapping field is used in 5 different ways:

[1]- Functions operating on a file: we can get the mapping from the file (the issue here is that we might need to pass the file down the call stack).

[2]- Core/arch mm functions: those do not care about the file (if they do, then they are vma related and we can get the mapping from the vma). Those functions only want to be able to walk all the ptes pointing to the page (for instance memory compaction, memory reclaim, ...). We can provide the exact same functionality for write protected pages (as KSM does today).

[3]- Block layer when I/O fails. This depends on the fs. For instance, for an fs which uses buffer_head, we can update buffer_head to store the mapping instead of the block_device, as we can get the block_device from the mapping but not the mapping from the block_device.
So solving this is mostly filesystem specific, but I have not seen any fs that could not be updated properly so that the block layer can report I/O failures without relying on page->mapping.

[4]- Debugging (mostly procfs/sysfs files to dump memory states). Those do not need the mapping per se; we just need to report page states (and thus write protection information if the page is write protected).

[5]- GUP (get user pages): if something calls GUP in write mode, then we need to break write protection (as KSM does today). A GUPed page should not be write protected, as we do not know what the GUP caller is doing with the page.

Most of the patchset deals with [1], [2] and [3] ([4] and [5] are mostly trivial).

For [1] we only need to pass the mapping down to all fs and vfs callback functions (this is mostly achieved with coccinelle). Roughly speaking, the patches are generated with the following pseudo code:

    add_mapping_parameter(func)
    {
        function_add_parameter(func, mapping);

        for_each_function_calling (caller, func) {
            calling_add_parameter(caller, func, mapping);

            if (function_parameters_contains(caller, mapping|file))
                continue;

            add_mapping_parameter(caller);
        }
    }

    passdown_mapping()
    {
        for_each_function_in_fs (func, fs_functions) {
            if (!function_body_contains(func, page->mapping))
                continue;

            if (function_parameters_contains(func, mapping|file))
                continue;

            add_mapping_parameter(func);
        }
    }

For [2], KSM is generalized and extended so that both anonymous and file-backed pages can be handled by a common write protected page case.

For [3] it depends on the filesystem (filesystems which use buffer_head are easily handled by storing the mapping in the buffer_head struct).

To avoid any regression risk, the page->mapping field is left intact, as today, for non write protected pages. This means that if you do not use the page write protection mechanism, then it cannot regress.
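To illustrate the no-regression property described above, here is a minimal userspace sketch of such a mapping helper. It is an illustration under stated assumptions, not code from the patchset: struct page and struct address_space are toy stand-ins rather than the kernel definitions, and PAGE_FLAG_WPROTECT, page_is_wprotected() and page_mapping_ctx() are hypothetical names. What it shows is that an unprotected page keeps returning page->mapping exactly as today, so the mapping passed down the call stack is only consulted for write protected pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's address_space (NOT the real definition). */
struct address_space {
	const char *name;
};

/* Hypothetical flag bit marking a page as generically write protected. */
#define PAGE_FLAG_WPROTECT (1UL << 0)

/* Toy stand-in for struct page (NOT the real definition). */
struct page {
	unsigned long flags;
	/*
	 * For a normal page this points to the owning address_space.
	 * For a write protected page the field is repurposed (as KSM
	 * repurposes it for anon_vma) and must not be read as a mapping.
	 */
	void *mapping;
};

/* Hypothetical test for the write protection flag. */
static bool page_is_wprotected(const struct page *page)
{
	return (page->flags & PAGE_FLAG_WPROTECT) != 0;
}

/*
 * Hypothetical helper: return the mapping an fs callback should use.
 * Unprotected pages use page->mapping just like today; only write
 * protected pages fall back to the mapping passed down by the caller.
 */
static struct address_space *page_mapping_ctx(const struct page *page,
					      struct address_space *ctx)
{
	assert(page != NULL);
	if (!page_is_wprotected(page))
		return page->mapping;
	return ctx;
}
```

For example, an fs callback that already received the mapping as a parameter would call page_mapping_ctx(page, mapping) wherever it used to read page->mapping directly.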
Non-regression is achieved by using a helper function that takes the mapping from the context (the current function parameter; see above for how functions are updated) and the struct page. If the page is not write protected, then it uses the mapping from the struct page (just like today). The only difference before and after the patchset is that all fs functions that need the mapping for a page now also get it as a parameter, but only use the parameter's mapping pointer if the page is write protected.

Note also that, even once confidence is high that we always pass down the correct mapping along each call stack, I do not believe this means we will be able to get rid of the struct page mapping field.

I posted the patchset before [*1] and I intend to post an updated patchset before LSF/MM/BPF. I also talked about this at LSF/MM 2018. I still believe this is a topic that warrants a discussion with FS/MM and block device folks.

[*1] https://lwn.net/Articles/751050/
     https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
[*2] https://lwn.net/Articles/752564/

To: lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: linux-block@xxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx