On Wed, Jan 22, 2020 at 12:28:39PM +0800, Gao Xiang wrote:
> Hi Jérôme,
>
> On Tue, Jan 21, 2020 at 06:32:22PM -0800, jglisse@xxxxxxxxxx wrote:
> > From: Jérôme Glisse <jglisse@xxxxxxxxxx>
> >
> >
>
> <snip>
>
> >
> > To avoid any regression risks the page->mapping field is left intact as
> > today for non write protect pages. This means that if you do not use the
> > page write protection mechanism then it can not regress. This is achieve
> > by using an helper function that take the mapping from the context
> > (current function parameter, see above on how function are updated) and
> > the struct page. If the page is not write protected then it uses the
> > mapping from the struct page (just like today). The only difference
> > between before and after the patchset is that all fs functions that do
> > need the mapping for a page now also do get it as a parameter but only
> > use the parameter mapping pointer if the page is write protected.
> >
> > Note also that i do not believe that once confidence is high that we
> > always passdown the correct mapping down each callstack, it does not
> > mean we will be able to get rid of the struct page mapping field.
>
> This feature is awesome and I might have some premature words here...
>
> In short, are you suggesting completely getting rid of all way to access
> mapping directly from struct page (other than by page->private or something
> else like calling trace)?

No, every access to page->mapping is replaced by:

    struct address_space *fs_page_mapping(struct page *page,
                                          struct address_space *mapping)
    {
        if (unlikely(!PageIsWriteProtected(page)))
            return page->mapping;
        return mapping;
    }

All functions that were doing a direct dereference are updated to use
this helper. If the function already has the mapping in its context then
it is easy (there are a lot of places like that, because you have the
file, inode or mapping available from the function context).
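To make the conversion concrete, here is a minimal userspace sketch of the
helper and of one converted caller. It is only an illustration under stated
assumptions: the structs are simplified stand-ins rather than the kernel
definitions, the write_protected flag is a hypothetical way to model
PageIsWriteProtected(), and sketch_page_host() is a made-up example
function, not one from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only, NOT the kernel structs. */
struct inode { unsigned long i_ino; };
struct address_space { struct inode *host; };
struct page {
    struct address_space *mapping;
    bool write_protected; /* assumption: models PageIsWriteProtected() */
};

static inline bool PageIsWriteProtected(const struct page *page)
{
    return page->write_protected;
}

/* Same logic as the helper in the mail, without the branch hint. */
static inline struct address_space *
fs_page_mapping(struct page *page, struct address_space *mapping)
{
    if (!PageIsWriteProtected(page))
        return page->mapping; /* unchanged behaviour, as today */
    return mapping;           /* mapping handed down by the caller */
}

/*
 * Hypothetical fs function. Before the conversion it would have been:
 *
 *     struct inode *sketch_page_host(struct page *page)
 *     {
 *         return page->mapping->host;
 *     }
 *
 * After the conversion it takes the mapping from its caller and goes
 * through the helper. The page is assumed not to be truncated here.
 */
static inline struct inode *
sketch_page_host(struct page *page, struct address_space *mapping)
{
    assert(page != NULL);
    return fs_page_mapping(page, mapping)->host;
}
```

For a page that is not write protected the mapping argument is ignored, so
the result is the same as the old direct dereference; only a write
protected page relies on the mapping passed down by the caller.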
If a function does not have a file, inode or mapping in its context then
a new mapping parameter is added to that function and all call sites are
updated (and this does recurse, ie if a call site does not have a file,
inode or mapping then a mapping parameter is added to it too ...).

This takes care of all fs code. The mm code is split between code that
deals with vmas, where we can get the mapping from the vma, and mm code
that just wants to walk all the CPU ptes pointing to the page. In this
latter case we just need to provide CPU pte walkers for write protected
pages (like KSM does today).

The block device code only needs the mapping on io error and there are
different strategies depending on the individual fs. fs using buffer_head
can easily be updated. For the others there are different solutions and
they can be updated one at a time with a tailored solution.

> I'm not sure if all cases can be handled without page->mapping easily (or
> handled effectively) since mapping field could also be used to indicate/judge
> truncated pages or some other filesystem specific states (okay, I think there
> could be some replacement, but it seems a huge project...)

I forgot to talk about truncate: all places that test for truncate are
updated to:

    bool fs_page_is_truncated(struct page *page,
                              struct address_space *mapping)
    {
        if (unlikely(!PageIsWriteProtected(page)))
            return !page->mapping || mapping != page->mapping;
        return wp_page_is_protected(page, mapping);
    }

Where wp_page_is_protected() will use the common write protect mm code
(look at mm/ksm.c as it will be mostly that) to determine if the page has
been truncated. Also, code doing truncation will have to special case
write protected pages, but that's easy enough.

> Currently, page->private is a per-page user-defined field, yet I don't think
> it could always be used as a pointer pointing to some structure. It can be
> simply used to store some unsigned long values for some kinds of filesystem
> pages as well...
For fs that use buffer_head i change the buffer_head struct to store the
mapping and not the block_device. For other fs it will depend on the
individual fs, but i am not changing page->private; i might only change
the struct that page->private points to for that specific fs.

>
> It might some ineffective to convert such above usage to individual per-page
> structure pointers --- from cacheline or extra memory overhead view...
>
> So I think at least there could be some another way to get its content
> source (inode or sub-inode granularity, a reverse way) effectively...
> by some field in struct page directly or indirectly...
>
> I agree that the usage of page->mapping field is complicated for now.
> I'm looking forward some unique way to mark the page type for a filesystem
> to use (inode or fs internal special pages) or even extend to analymous
> pages [1]. However, it seems a huge project to keep from some regression...

Note that page->mapping stays _untouched_ if the page is not write
protected, so there is no memory lookup overhead; the only overhead is
the extra branch to test if the page is write protected or not. So if you
do not use the write protection feature then you can not regress, ie
page->mapping is untouched and that's what gets used, like it is today.
So it can not regress unless i make a stupid mistake, but that's what
review is for ;)).

>
> I'm interested in related stuffs, some conclusion and I saw the article of
> LSF/MM 2018 although my English isn't good...
>
> If something wrong, please kindly point out...
>
> [1] https://lore.kernel.org/r/20191030172234.GA7018@hsiangkao-HP-ZHAN-66-Pro-G1

Missed that thread, thank you for the pointer, i have some reading to do :)

Cheers,
Jérôme Glisse