On Thu, Apr 19, 2018 at 12:32:50PM +0200, Jan Kara wrote: > On Wed 18-04-18 11:54:30, Jerome Glisse wrote: [...] > > I am affraid truely generic write protection for metadata pages is bit > > out of scope of what i am doing. However the mechanism i am proposing > > can be extended for that too. Issue is that all place that want to write > > to those page need to be converted to something where write happens > > between write_begin and write_end section (mmap and CPU pte does give > > this implicitly through page fault, so does write syscall). Basicly > > there is a need to make sure that write and write protection can be > > ordered against one another without complex locking. > > I understand metadata pages are not interesting for your use case. However > from mm point of view these are page cache pages as any other. So maybe my > question should have been: How do we make sure this mechanism will not be > used for pages for which it cannot work? Oh that one is easy, the API take vma + addr or rather mm struct + addr (ie like KSM today kind of). I will change wording in v1 to almost generic write protection :) or process' page write protection (but this would not work for special pfn/vma so not generic their either). > > > > A write protected page has page->mapping pointing to a structure like > > > > struct rmap_item for KSM. So this structure has a list for each unique > > > > combination: > > > > struct write_protect { > > > > struct list_head *mappings; /* write_protect_mapping list */ > > > > ... > > > > }; > > > > > > > > struct write_protect_mapping { > > > > struct list_head list > > > > struct address_space *mapping; > > > > unsigned long offset; > > > > unsigned long private; > > > > ... > > > > }; > > > > > > Auch, the fact that we could share a page as data storage for several > > > inode+offset combinations that are not sharing underlying storage just > > > looks viciously twisted ;) But is it really that useful to warrant > > > complications? In particular I'm afraid that filesystems expect consistency > > > between their internal state (attached to page->private) and page state > > > (e.g. page->flags) and when there are multiple internal states attached to > > > the same page this could go easily wrong... > > > > So at first i want to limit to write protect (not KSM) thus page->flags > > will stay consistent (ie page is only ever associated with a single > > mapping). For KSM yes the page->flags can be problematic, however here > > we can assume that page is clean (and uptodate) and not under write > > back. So problematic flags for KSM: > > - private (page_has_buffers() or PagePrivate (nfs, metadata, ...)) > > - private_2 (FsCache) > > - mappedtodisk > > - swapcache > > - error > > > > Idea again would be to PageFlagsWithMapping(page, mapping) so that for > > non KSM write protected page you test the usual page->flags and for > > write protected page you find the flag value using mapping as lookup > > index. Usualy those flag are seldomly changed/accessed. Again the > > overhead (ignoring code size) would only be for page which are KSM. > > So maybe KSM will not make sense because perf overhead it has with > > page->flags access (i don't think so but i haven't tested this). > > Yeah, sure, page->flags could be dealt with in a similar way but at this > point I don't think it's worth it. And without page->flags I don't think > abstracting page->private makes much sense - or am I missing something why > you need page->private depend on the mapping? So what I wanted to suggest > is that we leave page->private as is currently and just concentrate on > page->mapping hacks... Well i wanted to go up to KSM or at least as close as possible to KSM for file back page. But i can focus on page->mapping first, do write protection with that and also do the per page wait queue for page lock. Which i believe are both nice features. This will also make the patchset smaller and easier to review (less scary). KSM can be done on top of that latter and i will be happy to help. I have a bunch of coccinelle patches for page->private, page->index and i can do some for page->flags. Cheers, Jérôme