On Wed, Oct 07, 2020 at 07:33:16PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 07, 2020 at 01:54:19PM -0400, Jerome Glisse wrote:
> > On Wed, Oct 07, 2020 at 06:05:58PM +0100, Matthew Wilcox wrote:
> > > On Wed, Oct 07, 2020 at 10:48:35AM -0400, Jerome Glisse wrote:
> > > > On Wed, Oct 07, 2020 at 04:20:13AM +0100, Matthew Wilcox wrote:
> > > > > On Tue, Oct 06, 2020 at 09:05:49PM -0400, jglisse@xxxxxxxxxx wrote:
> > > For other things (NUMA distribution), we can point to something which [...]
> > > isn't a struct page and can be distinguished from a real struct page by a
> > > bit somewhere (I have ideas for at least three bits in struct page that
> > > could be used for this). Then use a pointer in that data structure to
> > > point to the real page. Or do NUMA distribution at the inode level.
> > > Have a way to get from (inode, node) to an address_space which contains
> > > just regular pages.
> >
> > How do you find all the copies? KSM maintains a list for a reason.
> > The same would be needed here, because if you want to break the write
> > protection you need to find all the copies first. If you intend to walk
> > the page tables, then how do you synchronize to avoid more copies being
> > spawned while you walk the reverse mapping? We could lock the struct
> > page, I guess. Also, how do you walk device page tables, which are
> > completely hidden from core mm?
>
> You have the inode and you iterate over each mapping, looking up the page
> that's in each mapping. Or you use the i_mmap tree to find the pages.

This would slow things down for everyone, as we would have to walk all
the mappings each time we try to write to a page. Also, we have a
mechanism for page writeback that avoids races between a thread trying
to write and writeback; we would need something similar here. Without
mediating this through struct page, I do not see how to keep this
reasonable from a performance point of view.

> > > I don't have time to work on all of these.
> > > If there's one that particularly interests you, let's dive deep into
> > > it and figure out how
> >
> > I care about KSM, duplicate NUMA copies (not only for CPU but also
> > device), and write protection or exclusive write access. In each case
> > you need a list of all the copies (for KSM, of the deduplicated page).
> > Having a special entry in the page cache does not sound like a good
> > option: in many code paths you would need to look up the page cache
> > again to find out if the page is in a special state. If you use a bit
> > flag in struct page, how do you get to the callback or to the
> > copy/alias? Walk all the page tables?
>
> Like I said, something that _looks_ like a struct page. At least looks
> enough like a struct page that you can pull a pointer out of the page
> cache and check the bit. But since it's not actually a struct page,
> you can use the rest of the data structure for pointers to things you
> want to track. Like the real struct page.

What I fear is the added cost: it means we need to do this lookup every
time to check, and we also need proper locking to avoid races. Adding an
ancillary struct and trying to keep everything synchronized seems harder
to me.

> > I do not see how I am doing violence to struct page :) The basis of
> > my approach is to pass down the mapping. We always have the mapping
> > at the top of the stack (either the syscall entry point on a file, or
> > through the vma when working on a virtual address).
>
> Yes, you explained all that in Utah. I wasn't impressed then, and I'm
> not impressed now.

Is this more a matter of taste, or is there something specific you do
not like?

Cheers,
Jérôme