 ---- On Thursday, 2020-11-05 23:54:34 Jan Kara <jack@xxxxxxx> wrote ----
 > On Thu 05-11-20 16:21:27, Amir Goldstein wrote:
 > > On Thu, Nov 5, 2020 at 4:03 PM Jan Kara <jack@xxxxxxx> wrote:
 > > >
 > > > On Wed 04-11-20 19:54:03, Chengguang Xu wrote:
 > > > >  ---- On Tuesday, 2020-11-03 01:30:52 Jan Kara <jack@xxxxxxx> wrote ----
 > > > >  > On Sun 25-10-20 11:41:14, Chengguang Xu wrote:
 > > > >  > > Overlayfs cannot be notified when mmapped area gets dirty,
 > > > >  > > so we need to proactively mark inode dirty in ->mmap operation.
 > > > >  > >
 > > > >  > > Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxxxx>
 > > > >  > > ---
 > > > >  > >  fs/overlayfs/file.c | 4 ++++
 > > > >  > >  1 file changed, 4 insertions(+)
 > > > >  > >
 > > > >  > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
 > > > >  > > index efccb7c1f9bc..cd6fcdfd81a9 100644
 > > > >  > > --- a/fs/overlayfs/file.c
 > > > >  > > +++ b/fs/overlayfs/file.c
 > > > >  > > @@ -486,6 +486,10 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 > > > >  > >  		/* Drop reference count from new vm_file value */
 > > > >  > >  		fput(realfile);
 > > > >  > >  	} else {
 > > > >  > > +		if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE) &&
 > > > >  > > +		    vma->vm_flags & (VM_WRITE|VM_MAYWRITE))
 > > > >  > > +			ovl_mark_inode_dirty(file_inode(file));
 > > > >  > > +
 > > > >  > >
 > > > >  > But does this work reliably? I mean once writeback runs, your inode (as
 > > > >  > well as upper inode) is cleaned. Then a page fault comes so file has dirty
 > > > >  > pages again and would need flushing but overlayfs inode stays clean? Am I
 > > > >  > missing something?
 > > > >  >
 > > > >
 > > > > Yeah, this is key point of this approach, in order to fix the issue I
 > > > > explicitly set I_DIRTY_SYNC flag in ovl_mark_inode_dirty(), so what i
 > > > > mean is during writeback we will call into ->write_inode() by this
 > > > > flag(I_DIRTY_SYNC) and at that place we get chance to check mapping and
 > > > > re-dirty overlay's inode. The code logic like below in ovl_write_inode().
 > > > >
 > > > >     if (mapping_writably_mapped(upper->i_mapping) ||
 > > > >         mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
 > > > >             iflag |= I_DIRTY_PAGES;
 > > >
 > > > OK, but suppose the upper mapping is clean at this moment (upper inode has
 > > > been fully written out for whatever reason, but it is still mapped) so your
 > > > overlayfs inode becomes clean as well. Then I don't see a mechanism which
 > > > would make your overlayfs inode dirty again when a write to mmap happens,
 > > > set_page_dirty() will end up marking upper inode with I_DIRTY_PAGES flag.
 > > >
 > > > Note that ovl_mmap() gets called only at mmap(2) syscall time but then
 > > > pages get faulted in, dirtied, cleaned fully at discretion of the mm
 > > > / writeback subsystem.
 > > >
 > >
 > > Perhaps I will add some background.
 > >
 > > What I suggested was to maintain a "suspect list" in addition to
 > > the dirty ovl inodes.
 > >
 > > ovl inode is added to the suspect list on mmap (writable) and removed
 > > from the suspect list on release() flush() or on sync_fs() if real inode is no
 > > longer writably mapped.
 > >
 > > There was another variant where ovl inode is added to suspect list on open
 > > for write and removed from suspect list on release() flush() or sync_fs()
 > > if real inode is not inode_is_open_for_write().
 > >
 > > In both cases the list will have inodes whose real is not dirty, but
 > > in both cases
 > > the list shouldn't be terribly large to traverse on sync_fs().
 > >
 > > Chengguang tried to implement the idea without an actual list by
 > > re-dirtying the "suspect" inodes on every write_inode(), but I personally have
 > > no idea if his idea works.
 > >
 > > I think we can resort to using an actual suspect list if you say that it
 > > cannot work like this?
 >
 > Yeah, the suspect list (i.e., additional list of inodes to check on sync)
 > you describe should work fine.
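
To make sure we mean the same thing by "suspect list", here is a minimal userspace sketch of the mmap variant as I read it. It is only a toy model, not overlayfs code: every toy_* name is hypothetical, the writable_maps counter stands in for "real inode is writably mapped", and only the sync_fs() removal path is modelled (release()/flush() are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel or overlayfs APIs. */
struct toy_inode {
	int writable_maps;	/* stand-in for "real inode is writably mapped" */
	bool on_suspect_list;
	struct toy_inode *next;	/* singly linked suspect list */
};

struct toy_sb {
	struct toy_inode *suspects;
};

/* mmap (writable): put the inode on the suspect list, at most once. */
static void toy_mmap_writable(struct toy_sb *sb, struct toy_inode *inode)
{
	assert(sb && inode);
	inode->writable_maps++;
	if (!inode->on_suspect_list) {
		inode->next = sb->suspects;
		sb->suspects = inode;
		inode->on_suspect_list = true;
	}
}

/* munmap: the inode stays on the list until the next sync_fs(). */
static void toy_munmap(struct toy_inode *inode)
{
	assert(inode && inode->writable_maps > 0);
	inode->writable_maps--;
}

/*
 * sync_fs(): visit every suspect (a real implementation would sync the
 * real inode at this point) and drop the ones that are no longer
 * writably mapped. Returns the number of inodes visited.
 */
static int toy_sync_fs(struct toy_sb *sb)
{
	struct toy_inode **link = &sb->suspects;
	int visited = 0;

	while (*link) {
		struct toy_inode *inode = *link;

		visited++;
		if (inode->writable_maps == 0) {
			*link = inode->next;	/* unlink from the list */
			inode->next = NULL;
			inode->on_suspect_list = false;
		} else {
			link = &inode->next;
		}
	}
	return visited;
}
```

In this model an unmapped inode is still visited by one more toy_sync_fs() before it leaves the list, so the sync-time cost is bounded by the number of inodes mapped writable since the previous sync.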

I think this solution still has the problem we met in the thread below [1].
The main problem is the state combination of a clean overlayfs inode and a
dirty upper inode.

[1] https://www.spinics.net/lists/linux-unionfs/msg07448.html

 > Also the "keep suspect inode dirty" idea
 > of Chengguang could work fine but we'd have to use something like
 > inode_is_open_for_write() or inode_is_writeably_mapped() (which would need
 > to be implemented but it should be easy vma_interval_tree_foreach() walk
 > checking each found VMA for vma->vm_flags & VM_WRITE) for checking whether
 > inode should be redirtied or not.
 >

I'm curious: isn't it enough to check i_mmap_writable via
mapping_writably_mapped()? Am I missing something?

Thanks,
Chengguang
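
For reference, a toy userspace sketch of the two checks being compared above: a per-mapping counter (in the spirit of i_mmap_writable) versus a walk over the VMAs testing the write flag. It is not kernel code. All toy_* names and TOY_* flags are hypothetical, and the rule that the counter is bumped for every shared mapping is an assumption of this model rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flags, standing in for VM_WRITE and VM_SHARED. */
#define TOY_VM_WRITE	0x1u	/* mapping is writable right now */
#define TOY_VM_SHARED	0x2u	/* shared mapping */

struct toy_vma {
	unsigned int flags;
	struct toy_vma *next;
};

struct toy_mapping {
	/* Stand-in for i_mmap_writable; ASSUMED to count shared mappings. */
	int shared_count;
	struct toy_vma *vmas;	/* stand-in for the i_mmap interval tree */
};

/* Record a new mapping of the file with the given flags. */
static void toy_map(struct toy_mapping *m, struct toy_vma *vma,
		    unsigned int flags)
{
	assert(m && vma);
	vma->flags = flags;
	vma->next = m->vmas;
	m->vmas = vma;
	if (flags & TOY_VM_SHARED)
		m->shared_count++;
}

/* Counter-based check, in the spirit of mapping_writably_mapped(). */
static bool toy_check_counter(const struct toy_mapping *m)
{
	return m->shared_count > 0;
}

/* Walk-based check: test each VMA for the write flag, as quoted above. */
static bool toy_check_walk(const struct toy_mapping *m)
{
	const struct toy_vma *vma;

	for (vma = m->vmas; vma; vma = vma->next)
		if (vma->flags & TOY_VM_WRITE)
			return true;
	return false;
}
```

Under that assumption the two checks differ only for a shared mapping that is not currently writable: the counter reports it, the walk does not. Whether that difference matters for redirtying the overlayfs inode is exactly the question above.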