> -----邮件原件----- > 发件人: Amir Goldstein [mailto:amir73il@xxxxxxxxx] > 发送时间: 2024年7月20日 15:30 > 收件人: Lv Fei(吕飞) <feilv@xxxxxxxxxxxx> > 抄送: miklos@xxxxxxxxxx; linux-unionfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Xu Lianghu(徐良虎) <lianghuxu@xxxxxxxxxxxx> > 主题: Re: [PATCH] ovl: fsync after metadata copy-up via mount option "upsync=strict" > > On Fri, Jul 19, 2024 at 12:55 PM Lv Fei(吕飞) <feilv@xxxxxxxxxxxx> wrote: > > > > > > > > > > -----邮件原件----- > > > 发件人: Amir Goldstein [mailto:amir73il@xxxxxxxxx] > > > 发送时间: 2024年7月19日 15:24 > > > 收件人: Lv Fei(吕飞) <feilv@xxxxxxxxxxxx> > > > 抄送: miklos@xxxxxxxxxx; linux-unionfs@xxxxxxxxxxxxxxx; > > > linux-kernel@xxxxxxxxxxxxxxx; Xu Lianghu(徐良虎) > > > > <lianghuxu@xxxxxxxxxxxx> > > > 主题: Re: [PATCH] ovl: fsync after metadata copy-up via mount option "upsync=strict" > > > > > > On Thu, Jul 18, 2024 at 6:43 AM Fei Lv <feilv@xxxxxxxxxxxx> wrote: > > > > > > > > If a directory only exist in low layer, create a new file in it > > > > trigger directory copy-up. Permission lost of the new directory in > > > > upper layer was observed during power-cut stress test. > > > > > > You should specify that this outcome happens on very specific upper fs (i.e. ubifs) which does not enforce ordering on storing of metadata changes. > > > > OK. > > > > > > > > > > > > > Fix by adding new mount opion "upsync=strict", make sure > > > > data/metadata of copied up directory written to disk before > > > > renaming from tmp to final destination. > > > > > > > > Signed-off-by: Fei Lv <feilv@xxxxxxxxxxxx> > > > > --- > > > > OPT_sync changed to OPT_upsync since mount option "sync" already used. > > > > > > I see. I don't like the name "upsync" so much, it has other meanings how about using the option "fsync"? > > > > OK. > > > > > > > > Here is a suggested documentation (which should be accompanied to > > > any patch) > > > > OK. > > > > > > > > diff --git a/Documentation/filesystems/overlayfs.rst > > > b/Documentation/filesystems/overlayfs.rst > > > index 165514401441..f8183ddf8c4d 100644 > > > --- a/Documentation/filesystems/overlayfs.rst > > > +++ b/Documentation/filesystems/overlayfs.rst > > > @@ -742,6 +742,42 @@ controlled by the "uuid" mount option, which supports these values: > > > mounted with "uuid=on". > > > > > > > > > +Durability and copy up > > > +---------------------- > > > + > > > +The fsync(2) and fdatasync(2) system calls ensure that the metadata > > > +and data of a file, respectively, are safely written to the backing > > > +storage, which is expected to guarantee the existence of the information post system crash. > > > + > > > +Without the fdatasync(2) call, there is no guarantee that the > > > +observed data after a system crash will be either the old or the > > > +new data, but in practice, the observed data after crash is often the old or new data or a mix of both. > > > + > > > +When overlayfs file is modified for the first time, copy up will > > > +create a copy of the lower file and its parent directories in the upper layer. > > > +In case of a system crash, if fdatasync(2) was not called after the > > > +modification, the upper file could end up with no data at all (i.e. > > > +zeros), which would be an unusual outcome. To avoid this > > > +experience, overlayfs calls fsync(2) on the upper file before completing the copy up with rename(2) to make the copy > up "atomic". > > > + > > > +Depending on the backing filesystem (e.g. ubifs), fsync(2) before > > > +rename(2) may not be enough to provide the "atomic" copy up > > > +behavior and fsync(2) on the copied up parent directories is required as well. > > > + > > > +Overlayfs can be tuned to prefer performance or durability when > > > +storing to the underlying upper layer. This is controlled by the > > > +"fsync" mount option, which supports these values: > > > + > > > +- "ordered": (default) > > > + Call fsync(2) on upper file before completion of copy up. > > > +- "strict": > > > + Call fsync(2) on upper file and directories before completion of copy up. > > > +- "volatile": [*] > > > + Prefer performance over durability (see `Volatile mount`_) > > > + > > > +[*] The mount option "volatile" is an alias to "fsync=volatile". > > > + > > > + > > > Volatile mount > > > -------------- > > > > > > > > > > > fs/overlayfs/copy_up.c | 21 +++++++++++++++++++++ > > > > fs/overlayfs/ovl_entry.h | 20 ++++++++++++++++++-- > > > > fs/overlayfs/params.c | 33 +++++++++++++++++++++++++++++---- > > > > fs/overlayfs/super.c | 2 +- > > > > 4 files changed, 69 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index > > > > a5ef2005a2cc..b6f021ad7a43 100644 > > > > --- a/fs/overlayfs/copy_up.c > > > > +++ b/fs/overlayfs/copy_up.c > > > > @@ -243,6 +243,21 @@ static int ovl_verify_area(loff_t pos, loff_t pos2, loff_t len, loff_t totlen) > > > > return 0; > > > > } > > > > > > > > +static int ovl_copy_up_sync(struct path *path) { > > > > + struct file *new_file; > > > > + int err; > > > > + > > > > + new_file = ovl_path_open(path, O_LARGEFILE | O_WRONLY); > > > > > > I don't think any of those O_ flags are needed for fsync. > > > Can a directory be opened O_WRONLY??? > > > > This function may be called with file or directory, shall I need to > > use different flags? Such as below: > > > > static int ovl_copy_up_sync(struct path *path, bool is_dir) { > > struct file *new_file; > > int flags; > > int err; > > > > flags = is_dir ? (O_RDONLY | O_DIRECTORY) : (O_LARGEFILE | O_WRONLY); > > new_file = ovl_path_open(path, flags); > > if (IS_ERR(new_file)) > > return PTR_ERR(new_file); > > > > err = vfs_fsync(new_file, 0); > > fput(new_file); > > > > return err; > > } > > > > You do not need O_WRONLY nor O_LARGEFILE for fsync of a regular file just use O_RDONLY unconditionally. OK. > > > > > > > > + if (IS_ERR(new_file)) > > > > + return PTR_ERR(new_file); > > > > + > > > > + err = vfs_fsync(new_file, 0); > > > > + fput(new_file); > > > > + > > > > + return err; > > > > +} > > > > + > > > > static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry, > > > > struct file *new_file, loff_t len) { > > > > @@ > > > > -701,6 +716,9 @@ static int ovl_copy_up_metadata(struct ovl_copy_up_ctx *c, struct dentry *temp) > > > > err = ovl_set_attr(ofs, temp, &c->stat); > > > > inode_unlock(temp->d_inode); > > > > > > > > + if (!err && ovl_should_sync_strict(ofs)) > > > > + err = ovl_copy_up_sync(&upperpath); > > > > + > > > > return err; > > > > } > > > > > > > > @@ -1104,6 +1122,9 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c) > > > > ovl_clear_flag(OVL_HAS_DIGEST, d_inode(c->dentry)); > > > > ovl_clear_flag(OVL_VERIFIED_DIGEST, d_inode(c->dentry)); > > > > ovl_set_upperdata(d_inode(c->dentry)); > > > > + > > > > + if (!err && ovl_should_sync_strict(ofs)) > > > > + err = ovl_copy_up_sync(&upperpath); > > > > > > fsync was probably already called in ovl_copy_up_file() making this call redundant and fsync of the removal > of metacopy xattr does not add any safety. > > > > My original idea was that ovl_should_sync and ovl_should_sync_strict should not be enabled at the same time. > > You have it wrong. > > The ovl_should_sync() helper does not mean sync_mode==ordered, it means sync_mode!=volatile It literally means "should overlayfs respect fsync" > and it is used in several places in the code. > > So ovl_should_sync_strict() always implies ovl_should_sync(). > > > The reasons are as follows: > > If bothe ovl_should_sync and ovl_should_sync_strict return ture for > > "fsync=strict", and power cut between ovl_copy_up_file and > > ovl_copy_up_metadata for file copy-up, seems this file may also lost permission? > > fsync of file in ovl_copy_up_file() the file is either an O_TMPFILE or in the workdir. no risk involved even with ubifs. Understand, changes not actually updated to destination file at this time. > > > > > > > > > > out_free: > > > > kfree(capability); > > > > out: > > > > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h > > > > index > > > > cb449ab310a7..4592e6f7dcf7 100644 > > > > --- a/fs/overlayfs/ovl_entry.h > > > > +++ b/fs/overlayfs/ovl_entry.h > > > > @@ -5,6 +5,12 @@ > > > > * Copyright (C) 2016 Red Hat, Inc. > > > > */ > > > > > > > > +enum { > > > > + OVL_SYNC_DATA, > > > > + OVL_SYNC_STRICT, > > > > + OVL_SYNC_OFF, > > > > +}; > > > > + > > > > struct ovl_config { > > > > char *upperdir; > > > > char *workdir; > > > > @@ -18,7 +24,7 @@ struct ovl_config { > > > > int xino; > > > > bool metacopy; > > > > bool userxattr; > > > > - bool ovl_volatile; > > > > + int sync_mode; > > > > }; > > > > > > > > struct ovl_sb { > > > > @@ -120,7 +126,17 @@ static inline struct ovl_fs *OVL_FS(struct > > > > super_block *sb) > > > > > > > > static inline bool ovl_should_sync(struct ovl_fs *ofs) { > > > > - return !ofs->config.ovl_volatile; > > > > + return ofs->config.sync_mode == OVL_SYNC_DATA; > > > > > > return ofs->config.sync_mode != OVL_SYNC_OFF; or > > > return ofs->config.sync_mode != OVL_FSYNC_VOLATILE; > > > > There are risks if ovl_should_sync and ovl_should_sync_strict enabled at the same time. > > The reasons are above. > > > > No. see above. > > Let me know if I misunderstood your concern. > > Thanks, > Amir. I will post patch V2 later. Thanks, Fei