Re: [RFC PATCH v2 5/8] ovl: mark overlayfs' inode dirty on shared writable mmap


 



 ---- On Friday, 2020-11-06 16:50:23, Jan Kara <jack@xxxxxxx> wrote ----
 > On Fri 06-11-20 10:41:44, Chengguang Xu wrote:
 > >  ---- On Thursday, 2020-11-05 23:54:34, Jan Kara <jack@xxxxxxx> wrote ----
 > >  > On Thu 05-11-20 16:21:27, Amir Goldstein wrote:
 > >  > > On Thu, Nov 5, 2020 at 4:03 PM Jan Kara <jack@xxxxxxx> wrote:
 > >  > > >
 > >  > > > On Wed 04-11-20 19:54:03, Chengguang Xu wrote:
 > >  > > > >  ---- On Tuesday, 2020-11-03 01:30:52, Jan Kara <jack@xxxxxxx> wrote ----
 > >  > > > >  > On Sun 25-10-20 11:41:14, Chengguang Xu wrote:
 > >  > > > >  > > Overlayfs cannot be notified when mmapped area gets dirty,
 > >  > > > >  > > so we need to proactively mark inode dirty in ->mmap operation.
 > >  > > > >  > >
 > >  > > > >  > > Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxxxx>
 > >  > > > >  > > ---
 > >  > > > >  > >  fs/overlayfs/file.c | 4 ++++
 > >  > > > >  > >  1 file changed, 4 insertions(+)
 > >  > > > >  > >
 > >  > > > >  > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
 > >  > > > >  > > index efccb7c1f9bc..cd6fcdfd81a9 100644
 > >  > > > >  > > --- a/fs/overlayfs/file.c
 > >  > > > >  > > +++ b/fs/overlayfs/file.c
 > >  > > > >  > > @@ -486,6 +486,10 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 > >  > > > >  > >          /* Drop reference count from new vm_file value */
 > >  > > > >  > >          fput(realfile);
 > >  > > > >  > >      } else {
 > >  > > > >  > > +        if (vma->vm_flags & (VM_SHARED|VM_MAYSHARE) &&
 > >  > > > >  > > +            vma->vm_flags & (VM_WRITE|VM_MAYWRITE))
 > >  > > > >  > > +            ovl_mark_inode_dirty(file_inode(file));
 > >  > > > >  > > +
 > >  > > > >  >
 > >  > > > >  > But does this work reliably? I mean once writeback runs, your inode (as
 > >  > > > >  > well as upper inode) is cleaned. Then a page fault comes so file has dirty
 > >  > > > >  > pages again and would need flushing but overlayfs inode stays clean? Am I
 > >  > > > >  > missing something?
 > >  > > > >  >
 > >  > > > >
 > >  > > > > Yeah, this is the key point of this approach. To fix the issue I
 > >  > > > > explicitly set the I_DIRTY_SYNC flag in ovl_mark_inode_dirty(), so
 > >  > > > > that during writeback we are called into ->write_inode() because of
 > >  > > > > this flag, and there we get a chance to check the mapping and
 > >  > > > > re-dirty overlay's inode. The logic in ovl_write_inode() is like below.
 > >  > > > >
 > >  > > > >     if (mapping_writably_mapped(upper->i_mapping) ||
 > >  > > > >          mapping_tagged(upper->i_mapping, PAGECACHE_TAG_WRITEBACK))
 > >  > > > >                  iflag |= I_DIRTY_PAGES;
 > >  > > >
 > >  > > > OK, but suppose the upper mapping is clean at this moment (upper inode has
 > >  > > > been fully written out for whatever reason, but it is still mapped) so your
 > >  > > > overlayfs inode becomes clean as well. Then I don't see a mechanism which
 > >  > > > would make your overlayfs inode dirty again when a write to mmap happens,
 > >  > > > set_page_dirty() will end up marking upper inode with I_DIRTY_PAGES flag.
 > >  > > >
 > >  > > > Note that ovl_mmap() gets called only at mmap(2) syscall time but then
 > >  > > > pages get faulted in, dirtied, cleaned fully at discretion of the mm
 > >  > > > / writeback subsystem.
 > >  > > >
 > >  > > 
 > >  > > Perhaps I will add some background.
 > >  > > 
 > >  > > What I suggested was to maintain a "suspect list" in addition to
 > >  > > the dirty ovl inodes.
 > >  > > 
 > >  > > ovl inode is added to the suspect list on mmap (writable) and removed
 > >  > > from the suspect list on release() flush() or on sync_fs() if real inode is no
 > >  > > longer writably mapped.
 > >  > > 
 > >  > > There was another variant where ovl inode is added to suspect list on open
 > >  > > for write and removed from suspect list on release() flush() or sync_fs()
 > >  > > if real inode is not inode_is_open_for_write().
 > >  > > 
 > >  > > In both cases the list will have inodes whose real is not dirty, but
 > >  > > in both cases
 > >  > > the list shouldn't be terribly large to traverse on sync_fs().
 > >  > > 
 > >  > > Chengguang tried to implement the idea without an actual list by
 > >  > > re-dirtying the "suspect" inodes on every write_inode(), but I personally have
 > >  > > no idea if his idea works.
 > >  > > 
 > >  > > I think we can resort to using an actual suspect list if you say that it
 > >  > > cannot work like this?
 > >  > 
 > >  > Yeah, the suspect list (i.e., additional list of inodes to check on sync)
 > >  > you describe should work fine. 
 > > 
 > > I think this solution still has the problem we met in the thread below [1].
 > > The main problem is the state combination of a clean overlayfs inode and a dirty upper inode.
 > 
 > But I think the scheme Amir proposed and I detailed in my previous email
 > should prevent that state. Because while the inode is mapped, it will be
 > kept in the dirty list. So which scenario do you think would lead to clean
 > overlayfs inode and dirty upper inode?

If keeping the inode in the dirty list means making the overlayfs inode dirty,
then I think we don't need an extra list for that: the vfs itself has a
writeback list, and the solution will be exactly the same as mine (re-dirty). Right?
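
To make the comparison concrete, here is a toy userspace model of the
re-dirty scheme. It is not kernel code: every toy_* name is hypothetical and
only stands in for the overlayfs inode, the upper inode and ->write_inode().
It assumes that writeback cleans both inodes and that a store through the
mapping dirties only the upper inode.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; these are not the real structures. */
struct toy_upper {
    int writable_maps;   /* models the count of shared writable mappings */
    bool dirty_pages;    /* models dirty pages on the upper inode */
};

struct toy_ovl_inode {
    struct toy_upper *upper;
    bool dirty;          /* models the overlayfs inode being on the writeback list */
};

/*
 * Model of ->write_inode(): writeback cleans both inodes, then the overlay
 * inode is re-dirtied for as long as the upper inode is writably mapped.
 */
void toy_write_inode(struct toy_ovl_inode *inode)
{
    inode->dirty = false;
    inode->upper->dirty_pages = false;   /* upper pages written out */
    if (inode->upper->writable_maps > 0)
        inode->dirty = true;             /* stay on the writeback list */
}

/* Model of a store through the mapping: only the upper inode is notified. */
void toy_mmap_store(struct toy_ovl_inode *inode)
{
    inode->upper->dirty_pages = true;
}

/* True for the state to avoid: clean overlay inode, dirty upper inode. */
bool toy_missed_by_sync(const struct toy_ovl_inode *inode)
{
    return !inode->dirty && inode->upper->dirty_pages;
}
```

In this model, with one writable mapping, toy_write_inode() followed by
toy_mmap_store() leaves the overlay inode dirty, so a later sync still visits
it. Once writable_maps drops to zero, the next toy_write_inode() lets the
overlay inode go clean.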


 > 
 > > [1] https://www.spinics.net/lists/linux-unionfs/msg07448.html
 > > 
 > >  > Also the "keep suspect inode dirty" idea
 > >  > of Chengguang could work fine but we'd have to use something like
 > >  > inode_is_open_for_write() or inode_is_writeably_mapped() (which would need
 > >  > to be implemented but it should be easy vma_interval_tree_foreach() walk
 > >  > checking each found VMA for vma->vm_flags & VM_WRITE) for checking whether
 > >  > inode should be redirtied or not.
 > >  > 
 > > 
 > > I'm curious: isn't it enough to check i_mmap_writable via
 > > mapping_writably_mapped()? Am I missing something?
 > 
 > What is i_mmap_writeable? I've grepped the tree and didn't find anything
 > like that...
 > 

Maybe a spelling mistake? The reason I check this is that I am worried about
a VMA's permissions being changed later by mprotect(2).
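
A minimal userspace sketch of that mprotect(2) case, for illustration only
(the function name write_after_mprotect() and the temporary file path are
made up). The file is mapped MAP_SHARED with PROT_READ, so the VMA has no
VM_WRITE at mmap time, and the mapping is made writable only afterwards. A
check that walks VMAs looking for VM_WRITE could therefore see nothing
writable at one moment and miss a later store, which is why a check that
does not depend on the current protection looks safer.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a file MAP_SHARED with PROT_READ only, upgrade the mapping with
 * mprotect(2), and store through it. Returns 0 if the store is visible
 * through pread(2) on the same file, -1 on any error.
 */
int write_after_mprotect(void)
{
    char path[] = "/tmp/mprotect-demo-XXXXXX";
    char buf[4] = {0};
    int ret = -1;
    long pagesz = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);   /* opened read-write, so the upgrade is allowed */

    if (fd < 0)
        return -1;
    unlink(path);             /* the file goes away when fd is closed */
    if (pagesz <= 0 || ftruncate(fd, pagesz) < 0)
        goto out;

    /* Read-only shared mapping: not writable at mmap(2) time. */
    char *p = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    /* Later permission change: the same mapping becomes writable. */
    if (mprotect(p, (size_t)pagesz, PROT_READ | PROT_WRITE) == 0) {
        memcpy(p, "ovl!", 4);
        if (msync(p, (size_t)pagesz, MS_SYNC) == 0 &&
            pread(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) &&
            memcmp(buf, "ovl!", sizeof(buf)) == 0)
            ret = 0;
    }
    munmap(p, (size_t)pagesz);
out:
    close(fd);
    return ret;
}
```

Calling write_after_mprotect() from a small main() and checking for a zero
return is enough to see that the store made after mprotect(2) reaches the
file.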


Thanks,
Chengguang



