答复: [PATCH] ovl: fsync after metadata copy-up via mount option "upsync=strict"

Lv Fei（吕飞） <feilv@xxxxxxxxxxxx> · Mon, 22 Jul 2024 07:37:25 +0000

> -----邮件原件-----
> 发件人: Amir Goldstein [mailto:amir73il@xxxxxxxxx] 
> 发送时间: 2024年7月20日 15:30
> 收件人: Lv Fei（吕飞） <feilv@xxxxxxxxxxxx>
> 抄送: miklos@xxxxxxxxxx; linux-unionfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Xu Lianghu（徐良虎） <lianghuxu@xxxxxxxxxxxx>
> 主题: Re: [PATCH] ovl: fsync after metadata copy-up via mount option "upsync=strict"
> 
> On Fri, Jul 19, 2024 at 12:55 PM Lv Fei（吕飞） <feilv@xxxxxxxxxxxx> wrote:
> >
> >
> > >
> > > -----邮件原件-----
> > > 发件人: Amir Goldstein [mailto:amir73il@xxxxxxxxx]
> > > 发送时间: 2024年7月19日 15:24
> > > 收件人: Lv Fei（吕飞） <feilv@xxxxxxxxxxxx>
> > > 抄送: miklos@xxxxxxxxxx; linux-unionfs@xxxxxxxxxxxxxxx; 
> > > linux-kernel@xxxxxxxxxxxxxxx; Xu Lianghu（徐良虎） > 
> > > <lianghuxu@xxxxxxxxxxxx>
> > > 主题: Re: [PATCH] ovl: fsync after metadata copy-up via mount option "upsync=strict"
> > >
> > > On Thu, Jul 18, 2024 at 6:43 AM Fei Lv <feilv@xxxxxxxxxxxx> wrote:
> > > >
> > > > If a directory only exist in low layer, create a new file in it 
> > > > trigger directory copy-up. Permission lost of the new directory in 
> > > > upper layer was observed during power-cut stress test.
> > >
> > > You should specify that this outcome happens on very specific upper fs (i.e. ubifs) which does not enforce ordering on storing of metadata changes.
> >
> > OK.
> >
> > >
> > > >
> > > > Fix by adding new mount opion "upsync=strict", make sure 
> > > > data/metadata of copied up directory written to disk before 
> > > > renaming from tmp to final destination.
> > > >
> > > > Signed-off-by: Fei Lv <feilv@xxxxxxxxxxxx>
> > > > ---
> > > > OPT_sync changed to OPT_upsync since mount option "sync" already used.
> > >
> > > I see. I don't like the name "upsync" so much, it has other meanings how about using the option "fsync"?
> >
> > OK.
> >
> > >
> > > Here is a suggested documentation (which should be accompanied to 
> > > any patch)
> >
> > OK.
> >
> > >
> > > diff --git a/Documentation/filesystems/overlayfs.rst
> > > b/Documentation/filesystems/overlayfs.rst
> > > index 165514401441..f8183ddf8c4d 100644
> > > --- a/Documentation/filesystems/overlayfs.rst
> > > +++ b/Documentation/filesystems/overlayfs.rst
> > > @@ -742,6 +742,42 @@ controlled by the "uuid" mount option, which supports these values:
> > >      mounted with "uuid=on".
> > >
> > >
> > > +Durability and copy up
> > > +----------------------
> > > +
> > > +The fsync(2) and fdatasync(2) system calls ensure that the metadata 
> > > +and data of a file, respectively, are safely written to the backing 
> > > +storage, which is expected to guarantee the existence of the information post system crash.
> > > +
> > > +Without the fdatasync(2) call, there is no guarantee that the 
> > > +observed data after a system crash will be either the old or the 
> > > +new data, but in practice, the observed data after crash is often the old or new data or a mix of both.
> > > +
> > > +When overlayfs file is modified for the first time, copy up will 
> > > +create a copy of the lower file and its parent directories in the upper layer.
> > > +In case of a system crash, if fdatasync(2) was not called after the 
> > > +modification, the upper file could end up with no data at all (i.e.
> > > +zeros), which would be an unusual outcome.  To avoid this 
> > > +experience, overlayfs calls fsync(2) on the upper file before completing the copy up with rename(2) to make the copy > up "atomic".
> > > +
> > > +Depending on the backing filesystem (e.g. ubifs), fsync(2) before
> > > +rename(2) may not be enough to provide the "atomic" copy up 
> > > +behavior and fsync(2) on the copied up parent directories is required as well.
> > > +
> > > +Overlayfs can be tuned to prefer performance or durability when 
> > > +storing to the underlying upper layer.  This is controlled by the 
> > > +"fsync" mount option, which supports these values:
> > > +
> > > +- "ordered": (default)
> > > +    Call fsync(2) on upper file before completion of copy up.
> > > +- "strict":
> > > +    Call fsync(2) on upper file and directories before completion of copy up.
> > > +- "volatile": [*]
> > > +    Prefer performance over durability (see `Volatile mount`_)
> > > +
> > > +[*] The mount option "volatile" is an alias to "fsync=volatile".
> > > +
> > > +
> > >  Volatile mount
> > >  --------------
> > >
> > > >
> > > >  fs/overlayfs/copy_up.c   | 21 +++++++++++++++++++++
> > > >  fs/overlayfs/ovl_entry.h | 20 ++++++++++++++++++--
> > > >  fs/overlayfs/params.c    | 33 +++++++++++++++++++++++++++++----
> > > >  fs/overlayfs/super.c     |  2 +-
> > > >  4 files changed, 69 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index
> > > > a5ef2005a2cc..b6f021ad7a43 100644
> > > > --- a/fs/overlayfs/copy_up.c
> > > > +++ b/fs/overlayfs/copy_up.c
> > > > @@ -243,6 +243,21 @@ static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen)
> > > >         return 0;
> > > >  }
> > > >
> > > > +static int ovl_copy_up_sync(struct path *path) {
> > > > +       struct file *new_file;
> > > > +       int err;
> > > > +
> > > > +       new_file = ovl_path_open(path, O_LARGEFILE | O_WRONLY);
> > >
> > > I don't think any of those O_ flags are needed for fsync.
> > > Can a directory be opened O_WRONLY???
> >
> > This function may be called with file or directory, shall I need to 
> > use different flags? Such as below:
> >
> > static int ovl_copy_up_sync(struct path *path, bool is_dir) {
> >         struct file *new_file;
> >         int flags;
> >         int err;
> >
> >         flags = is_dir ? (O_RDONLY | O_DIRECTORY) : (O_LARGEFILE | O_WRONLY);
> >         new_file = ovl_path_open(path, flags);
> >         if (IS_ERR(new_file))
> >                 return PTR_ERR(new_file);
> >
> >         err = vfs_fsync(new_file, 0);
>  >         fput(new_file);
> >
> >         return err;
> > }
> >
> 
> You do not need O_WRONLY nor O_LARGEFILE for fsync of a regular file just use O_RDONLY unconditionally.

OK.

> 
> > >
> > > > +       if (IS_ERR(new_file))
> > > > +               return PTR_ERR(new_file);
> > > > +
> > > > +       err = vfs_fsync(new_file, 0);
> > > > +       fput(new_file);
> > > > +
> > > > +       return err;
> > > > +}
> > > > +
> > > >  static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
> > > >                             struct file *new_file, loff_t len)  { 
> > > > @@
> > > > -701,6 +716,9 @@ static int ovl_copy_up_metadata(struct ovl_copy_up_ctx *c, struct dentry *temp)
> > > >                 err = ovl_set_attr(ofs, temp, &c->stat);
> > > >         inode_unlock(temp->d_inode);
> > > >
> > > > +       if (!err && ovl_should_sync_strict(ofs))
> > > > +               err = ovl_copy_up_sync(&upperpath);
> > > > +
> > > >         return err;
> > > >  }
> > > >
> > > > @@ -1104,6 +1122,9 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
> > > >         ovl_clear_flag(OVL_HAS_DIGEST, d_inode(c->dentry));
> > > >         ovl_clear_flag(OVL_VERIFIED_DIGEST, d_inode(c->dentry));
> > > >         ovl_set_upperdata(d_inode(c->dentry));
> > > > +
> > > > +       if (!err && ovl_should_sync_strict(ofs))
> > > > +               err = ovl_copy_up_sync(&upperpath);
> > >
> > > fsync was probably already called in ovl_copy_up_file() making this call redundant and fsync of the removal > of metacopy xattr does not add any safety.
> >
> > My original idea was that ovl_should_sync and ovl_should_sync_strict should not be enabled at the same time.
> 
> You have it wrong.
> 
> The ovl_should_sync() helper does not mean sync_mode==ordered, it means sync_mode!=volatile It literally means "should overlayfs respect fsync"
> and it is used in several places in the code.
> 
> So ovl_should_sync_strict() always implies ovl_should_sync().
> 
> > The reasons are as follows:
> > If bothe ovl_should_sync and ovl_should_sync_strict return ture for 
> > "fsync=strict", and power cut between ovl_copy_up_file and 
> > ovl_copy_up_metadata for file copy-up, seems this file may also lost permission?
> 
> fsync of file in ovl_copy_up_file() the file is either an O_TMPFILE or in the workdir. no risk involved even with ubifs.

Understand, changes not actually updated to destination file at this time.

> 
> >
> > >
> > > >  out_free:
> > > >         kfree(capability);
> > > >  out:
> > > > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h 
> > > > index
> > > > cb449ab310a7..4592e6f7dcf7 100644
> > > > --- a/fs/overlayfs/ovl_entry.h
> > > > +++ b/fs/overlayfs/ovl_entry.h
> > > > @@ -5,6 +5,12 @@
> > > >   * Copyright (C) 2016 Red Hat, Inc.
> > > >   */
> > > >
> > > > +enum {
> > > > +       OVL_SYNC_DATA,
> > > > +       OVL_SYNC_STRICT,
> > > > +       OVL_SYNC_OFF,
> > > > +};
> > > > +
> > > >  struct ovl_config {
> > > >         char *upperdir;
> > > >         char *workdir;
> > > > @@ -18,7 +24,7 @@ struct ovl_config {
> > > >         int xino;
> > > >         bool metacopy;
> > > >         bool userxattr;
> > > > -       bool ovl_volatile;
> > > > +       int sync_mode;
> > > >  };
> > > >
> > > >  struct ovl_sb {
> > > > @@ -120,7 +126,17 @@ static inline struct ovl_fs *OVL_FS(struct 
> > > > super_block *sb)
> > > >
> > > >  static inline bool ovl_should_sync(struct ovl_fs *ofs)  {
> > > > -       return !ofs->config.ovl_volatile;
> > > > +       return ofs->config.sync_mode == OVL_SYNC_DATA;
> > >
> > >     return ofs->config.sync_mode != OVL_SYNC_OFF; or
> > >     return ofs->config.sync_mode != OVL_FSYNC_VOLATILE;
> >
> > There are risks if ovl_should_sync and ovl_should_sync_strict enabled at the same time.
> > The reasons are above.
> >
> 
> No. see above.
> 
> Let me know if I misunderstood your concern.
> 
> Thanks,
> Amir.

I will post patch V2 later.

Thanks,
Fei