---- On Friday, 2020-11-06 16:50:23 Jan Kara <jack@xxxxxxx> wrote ----
> On Fri 06-11-20 10:41:44, Chengguang Xu wrote:
> > ---- On Thursday, 2020-11-05 23:54:34 Jan Kara <jack@xxxxxxx> wrote ----
> > > On Thu 05-11-20 16:21:27, Amir Goldstein wrote:
> > > > On Thu, Nov 5, 2020 at 4:03 PM Jan Kara <jack@xxxxxxx> wrote:
> > > > >
> > > > > On Wed 04-11-20 19:54:03, Chengguang Xu wrote:
> > > > > > ---- On Tuesday, 2020-11-03 01:30:52 Jan Kara <jack@xxxxxxx> wrote ----
> > > > > > > On Sun 25-10-20 11:41:14, Chengguang Xu wrote:
> > > > > > > > Overlayfs cannot be notified when mmapped area gets dirty,
> > > > > > > > so we need to proactively mark inode dirty in ->mmap operation.
> > > > > > > >
> > > > > > > > Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxxxx>
> > > > > > > > ---
> > > > > > > >  fs/overlayfs/file.c | 4 ++++
> > > > > > > >  1 file changed, 4 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > > > > > index efccb7c1f9bc..cd6fcdfd81a9 100644
> > > > > > > > --- a/fs/overlayfs/file.c
> > > > > > > > +++ b/fs/overlayfs/file.c
> > > > > > > > @@ -486,6 +486,10 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > > > >  		/* Drop reference count from new vm_file value */
> > > > > > > >  		fput(realfile);
> > > > > > > >  	} else {
> > > > > > > > +		if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE) &&
> > > > > > > > +		    vma->vm_flags & (VM_WRITE|VM_MAYWRITE))
> > > > > > > > +			ovl_mark_inode_dirty(file_inode(file));
> > > > > > > > +
> > > > > > > >
> > > > > > > But does this work reliably? I mean once writeback runs, your inode (as
> > > > > > > well as upper inode) is cleaned. Then a page fault comes so file has dirty
> > > > > > > pages again and would need flushing but overlayfs inode stays clean? Am I
> > > > > > > missing something?
> > > > > > >
> > > > > >
> > > > > > Yeah, this is key point of this approach, in order to fix the issue I
> > > > > > explicitly set I_DIRTY_SYNC flag in ovl_mark_inode_dirty(), so what i
> > > > > > mean is during writeback we will call into ->write_inode() by this
> > > > > > flag(I_DIRTY_SYNC) and at that place we get chance to check mapping and
> > > > > > re-dirty overlay's inode. The code logic like below in ovl_write_inode().
> > > > > >
> > > > > >     if (mapping_writably_mapped(upper->i_mapping) ||
> > > > > >         mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
> > > > > >             iflag |= I_DIRTY_PAGES;
> > > > >
> > > > > OK, but suppose the upper mapping is clean at this moment (upper inode has
> > > > > been fully written out for whatever reason, but it is still mapped) so your
> > > > > overlayfs inode becomes clean as well. Then I don't see a mechanism which
> > > > > would make your overlayfs inode dirty again when a write to mmap happens,
> > > > > set_page_dirty() will end up marking upper inode with I_DIRTY_PAGES flag.
> > > > >
> > > > > Note that ovl_mmap() gets called only at mmap(2) syscall time but then
> > > > > pages get faulted in, dirtied, cleaned fully at discretion of the mm
> > > > > / writeback subsystem.
> > > > >
> > > >
> > > > Perhaps I will add some background.
> > > >
> > > > What I suggested was to maintain a "suspect list" in addition to
> > > > the dirty ovl inodes.
> > > >
> > > > ovl inode is added to the suspect list on mmap (writable) and removed
> > > > from the suspect list on release() flush() or on sync_fs() if real inode is no
> > > > longer writably mapped.
> > > >
> > > > There was another variant where ovl inode is added to suspect list on open
> > > > for write and removed from suspect list on release() flush() or sync_fs()
> > > > if real inode is not inode_is_open_for_write().
> > > >
> > > > In both cases the list will have inodes whose real is not dirty, but
> > > > in both cases the list shouldn't be terribly large to traverse on sync_fs().
> > > >
> > > > Chengguang tried to implement the idea without an actual list by
> > > > re-dirtying the "suspect" inodes on every write_inode(), but I personally have
> > > > no idea if his idea works.
> > > >
> > > > I think we can resort to using an actual suspect list if you say that it
> > > > cannot work like this?
> > >
> > > Yeah, the suspect list (i.e., additional list of inodes to check on sync)
> > > you describe should work fine.
> >
> > I think this solution still has the problem we have met in below thread[1]
> > The main problem is the state combination of clean overlayfs' inode && dirty upper inode.
>
> But I think the scheme Amir proposed and I detailed in my previous email
> should prevent that state. Because while the inode is mapped, it will be
> kept in the dirty list. So which scenario do you think would lead to clean
> overlayfs inode and dirty upper inode?

If keeping in the dirty list means making overlayfs inode dirty, then I think
we don't need extra list for that, vfs itself has writeback list and the
solution will be exactly the same as mine (re-dirty). Right?

>
> > [1] https://www.spinics.net/lists/linux-unionfs/msg07448.html
> >
> > > Also the "keep suspect inode dirty" idea
> > > of Chengguang could work fine but we'd have to use something like
> > > inode_is_open_for_write() or inode_is_writeably_mapped() (which would need
> > > to be implemented but it should be easy vma_interval_tree_foreach() walk
> > > checking each found VMA for vma->vm_flags & VM_WRITE) for checking whether
> > > inode should be redirtied or not.
> > >
> >
> > I'm curious that isn't it enough to check i_mmap_writable by
> > mapping_writably_mapped() ? Am I missing something?
>
> What is i_mmap_writeable? I've grepped the tree and didn't find anything
> like that...
> Maybe spelling mistake?

The reason I check this is I'm afraid of the permission change of vma by
mprotect(2).

Thanks,
Chengguang
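
The mprotect(2) concern above can be reproduced from userspace. The sketch below is only an illustration and is not part of the patch: it maps a file MAP_SHARED with PROT_READ only, upgrades the mapping to writable with mprotect(2), and dirties a page through it. The upgrade is permitted because the file descriptor is open for writing, so a check that looked only at VM_WRITE at mmap time would miss such a mapping, which is presumably why the patch also tests VM_MAYWRITE. The function name demo_mprotect_upgrade() and the temporary file path are hypothetical, chosen for this example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not kernel code): returns 0 if a MAP_SHARED
 * mapping created read-only could later be made writable with mprotect(2)
 * and a store through it reached the file, -1 on any failure.
 */
int demo_mprotect_upgrade(void)
{
	char path[] = "/tmp/mprotect-demo-XXXXXX";
	char buf[4] = { 0 };
	char *p;
	long page = sysconf(_SC_PAGESIZE);
	int ret = -1;
	int fd = mkstemp(path);		/* mkstemp() opens the file O_RDWR */

	if (fd < 0)
		return -1;
	unlink(path);			/* file lives on until fd is closed */

	if (page <= 0 || ftruncate(fd, page) < 0)
		goto out_close;

	/* Shared mapping that is read-only at mmap(2) time. */
	p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out_close;

	/* Permission upgrade after the fact; ->mmap() is not called again. */
	if (mprotect(p, page, PROT_READ | PROT_WRITE) < 0)
		goto out_unmap;

	/* Dirty the page through the mapping and flush it. */
	memcpy(p, "ovl", 3);
	if (msync(p, page, MS_SYNC) < 0)
		goto out_unmap;

	/* Read back through the file descriptor to confirm the write landed. */
	if (pread(fd, buf, 3, 0) == 3 && memcmp(buf, "ovl", 3) == 0)
		ret = 0;

out_unmap:
	munmap(p, page);
out_close:
	close(fd);
	return ret;
}
```

Calling demo_mprotect_upgrade() from a small main() on Linux should return 0 on a filesystem that supports shared file mappings; whether an overlayfs mount behaves the same way with this patch applied is exactly the question under discussion and is not something this sketch settles.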