On Wed, Oct 07, 2020 at 11:09:16PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 07, 2020 at 01:54:19PM -0400, Jerome Glisse wrote:
> > > For other things (NUMA distribution), we can point to something which
> > > isn't a struct page and can be distiguished from a real struct page by a
> > > bit somewhere (I have ideas for at least three bits in struct page that
> > > could be used for this).  Then use a pointer in that data structure to
> > > point to the real page.  Or do NUMA distribution at the inode level.
> > > Have a way to get from (inode, node) to an address_space which contains
> > > just regular pages.
> >
> > How do you find all the copies ? KSM maintains a list for a reasons.
> > Same would be needed here because if you want to break the write prot
> > you need to find all the copy first. If you intend to walk page table
> > then how do you synchronize to avoid more copy to spawn while you
> > walk reverse mapping, we could lock the struct page i guess. Also how
> > do you walk device page table which are completely hidden from core mm.
>
> So ... why don't you put a PageKsm page in the page cache?  That way you
> can share code with the current KSM implementation.  You'd need
> something like this:

I do just that, but there is no need to change anything in the page
cache, so the code below is not necessary. What you need is a way to
find all the copies. On a write fault (or any other write access) you
get the mapping and offset from the fault, use them to look up the
fs-specific information, and break the sharing by replacing the KSM
page, for that mapping, with a new page set up from that fs-specific
information. Hence the filesystem code does not need to know anything;
it all happens in generic common code.

So the flow is:

  Same as before:
    1 - write fault (address, vma)
    2 - regular write fault handler -> find page in page cache

  New to common page fault code:
    3 - ksm check in write fault common code (same as ksm today
        for the anonymous page fault code path).
    4 - break ksm (address, vma) -> (file offset, mapping)
        4.a - use the mapping and file offset to look up the proper
              fs-specific information that was saved when the page
              was made ksm.
        4.b - allocate a new page and initialize it with that
              information (and the page content), then update the
              page cache and mappings, i.e. all the ptes that were
              pointing to the ksm page for that mapping at that
              offset now use the new page (like KSM for anonymous
              pages today).

  Resume regular code path:
    mkwrite /|| set pte ...

It is roughly the same for the write ioctl (other cases go through GUP,
which itself goes through the page fault code path). There is no need to
change the page cache in any way, only the common code paths that enable
writes to file-backed pages.

The fs-specific information is page->private, some of the flags
(page->flags) and page->index (file offset). Every time a page is
deduplicated, a copy of that information is saved in an alias struct
which you can get to from the shared KSM page (page->mapping is a
pointer to the ksm root struct, which has a pointer to the list of all
aliases).

>
> +++ b/mm/filemap.c
> @@ -1622,6 +1622,9 @@ struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
>                 lock_page(page);
>                 /* Has the page been truncated? */
>                 if (unlikely(page->mapping != mapping)) {
> +                       if (PageKsm(page)) {
> +                               ...
> +                       }
>                         unlock_page(page);
>                         put_page(page);
>                         goto repeat;
> @@ -1655,6 +1658,7 @@ struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
>   * * %FGP_WRITE - The page will be written
>   * * %FGP_NOFS - __GFP_FS will get cleared in gfp mask
>   * * %FGP_NOWAIT - Don't get blocked by page lock
> + * * %FGP_KSM - Return KSM pages
>   *
>   * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even
>   * if the %GFP flags specified for %FGP_CREAT are atomic.
> @@ -1687,6 +1691,11 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
>
>                 /* Has the page been truncated? */
>                 if (unlikely(page->mapping != mapping)) {
> +                       if (PageKsm(page) {
> +                               if (fgp_flags & FGP_KSM)
> +                                       return page;
> +                               ...
> +                       }
>                         unlock_page(page);
>                         put_page(page);
>                         goto repeat;
>
> I don't know what you want to do when you find a KSM page, so I just left
> an ellipsis.
>
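
To make the alias bookkeeping concrete, here is a minimal user-space
sketch in C of the lookup done in step 4.a. It is only a model, not
kernel code: the names ksm_root, ksm_alias, address_space_model and the
three helpers are hypothetical and chosen for illustration. The only
parts taken from the description are the saved fields (private, some
flags, index) and the idea that the shared page leads to a root that
holds a list of aliases. Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct address_space; hypothetical, for illustration only. */
struct address_space_model {
    int id;
};

/* One alias per (mapping, offset) that was folded into the shared page. */
struct ksm_alias {
    struct address_space_model *mapping; /* file the original page belonged to */
    unsigned long index;                 /* saved page->index (file offset) */
    unsigned long private_data;          /* saved page->private */
    unsigned long flags;                 /* saved subset of page->flags */
    struct ksm_alias *next;
};

/* What the shared page's page->mapping would point to in this model. */
struct ksm_root {
    struct ksm_alias *aliases;
};

/* Record the fs-specific state when a page is deduplicated. */
static void ksm_add_alias(struct ksm_root *root, struct ksm_alias *alias)
{
    assert(root != NULL && alias != NULL);
    alias->next = root->aliases;
    root->aliases = alias;
}

/* Step 4.a: from (mapping, file offset) find the saved state, or NULL. */
static struct ksm_alias *ksm_find_alias(const struct ksm_root *root,
                                        const struct address_space_model *mapping,
                                        unsigned long index)
{
    struct ksm_alias *a;

    for (a = root->aliases; a != NULL; a = a->next) {
        if (a->mapping == mapping && a->index == index)
            return a;
    }
    return NULL;
}

/* After step 4.b the alias is no longer shared: unlink it. Returns 1 if found. */
static int ksm_del_alias(struct ksm_root *root, struct ksm_alias *alias)
{
    struct ksm_alias **pp;

    for (pp = &root->aliases; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == alias) {
            *pp = alias->next;
            alias->next = NULL;
            return 1;
        }
    }
    return 0;
}
```

In this model a write fault on (mapping, index) would call
ksm_find_alias() to recover the saved private/flags/index, set up the
new page from them, and then call ksm_del_alias(). A linear list is
used only to keep the sketch short; it says nothing about what the real
structure should be.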