Re: [PATCH] nilfs2: support ->tmpfile()

On Tue, Aug 27, 2024 at 10:29 PM Hongbo Li wrote:
>
>
>
> On 2024/7/20 1:36, Ryusuke Konishi wrote:
> > On Fri, Jul 19, 2024 at 6:12 PM Hongbo Li wrote:
> >>
> >> Add function nilfs2_tmpfile to support O_TMPFILE file creation.
> >>
> >> tmpfile testcases (generic/{004,389,509,530,531}), except
> >> generic/389 and generic/530 (which need ACL and shutdown support),
> >> now run and pass.
> >>
> >> Signed-off-by: Hongbo Li <lihongbo22@xxxxxxxxxx>
> >> ---
> >>   fs/nilfs2/namei.c | 31 +++++++++++++++++++++++++++++++
> >>   1 file changed, 31 insertions(+)
> >>
> >> diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
> >> index c950139db6ef..a36667d7a5e8 100644
> >> --- a/fs/nilfs2/namei.c
> >> +++ b/fs/nilfs2/namei.c
> >> @@ -125,6 +125,36 @@ nilfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
> >>          return err;
> >>   }
> >>
> >> +static int nilfs_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
> >> +                       struct file *file, umode_t mode)
> >> +{
> >> +       struct inode *inode;
> >> +       struct nilfs_transaction_info ti;
> >> +       int err;
> >> +
> >> +       err = nilfs_transaction_begin(dir->i_sb, &ti, 1);
> >> +       if (err)
> >> +               return err;
> >> +
> >> +       inode = nilfs_new_inode(dir, mode);
> >> +       err = PTR_ERR(inode);
> >> +       if (!IS_ERR(inode)) {
> >> +               inode->i_op = &nilfs_file_inode_operations;
> >> +               inode->i_fop = &nilfs_file_operations;
> >> +               inode->i_mapping->a_ops = &nilfs_aops;
> >> +               nilfs_mark_inode_dirty(inode);
> >> +               d_tmpfile(file, inode);
> >> +               unlock_new_inode(inode);
> >> +               err = 0;
> >> +       }
> >> +       if (!err)
> >> +               err = nilfs_transaction_commit(dir->i_sb);
> >> +       else
> >> +               nilfs_transaction_abort(dir->i_sb);
> >> +
> >> +       return finish_open_simple(file, err);
> >> +}
> >> +
> >>   static int nilfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
> >>                           struct dentry *dentry, const char *symname)
> >>   {
> >> @@ -544,6 +574,7 @@ const struct inode_operations nilfs_dir_inode_operations = {
> >>          .mkdir          = nilfs_mkdir,
> >>          .rmdir          = nilfs_rmdir,
> >>          .mknod          = nilfs_mknod,
> >> +       .tmpfile        = nilfs_tmpfile,
> >>          .rename         = nilfs_rename,
> >>          .setattr        = nilfs_setattr,
> >>          .permission     = nilfs_permission,
> >> --
> >> 2.34.1
> >>
> >
> > Hi Hongbo,
> >
> > Thank you for the patch suggestion.
> >
> > As for the O_TMPFILE support, with this implementation, when the file
> > system crashes in an unclean way, the inodes generated in the ifile
> > metadata file by nilfs_new_inode() are not released and remain
> > orphaned.
>
> Doesn't the nilfs transaction ensure this kind of consistency?
>

The nilfs transaction guarantees the consistency of the metadata
state, but unfortunately it does not guarantee that an inode with link
count 0 will continue to exist.  A different mechanism is needed.

For normal files, when the link count falls to 0 and iput() is
executed, nilfs_evict_inode(), which evicts the inode, simultaneously
releases the inode on the bitmap and the data and b-tree blocks.

A mechanism is needed to allow files with link count == 0 to survive
across checkpoints.

Strictly speaking, there is a problem with the current NILFS2
implementation: if a checkpoint is created between the time the inode
is removed from the namespace and the time the final iput() is called,
and the machine goes down uncleanly at that time, the inode becomes an
orphan inode and its blocks are not released.

Therefore, I think that an additional mechanism to maintain orphan
inodes is needed in any case.  If that can be done, the rest of your
tmpfile implementation seems to be usable almost as is, so how about
holding off until then?
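
To make the idea of "maintaining orphan inodes" concrete, here is a
rough userspace model in C.  It is only a sketch and not NILFS2 code:
the toy_* names, the fixed-size table, and the use of inode number 0
as the chain terminator are all assumptions made for illustration.
Inodes with link count 0 are put on a chain when they leave the
namespace, taken off it at the final eviction, and whatever is still
on the chain after an unclean shutdown is released by recovery.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NINODES 16	/* hypothetical table size */
#define TOY_NONE    0	/* inode 0 is reserved and terminates the chain */

struct toy_inode {
	bool in_use;			/* bit in the inode bitmap */
	unsigned int nlink;		/* link count */
	unsigned int next_orphan;	/* next inode on the chain, or TOY_NONE */
	bool on_orphan_list;
};

struct toy_fs {
	struct toy_inode itable[TOY_NINODES];
	unsigned int orphan_head;	/* head of the orphan chain */
};

/* Allocate inode 'ino' with the given link count. */
void toy_alloc(struct toy_fs *fs, unsigned int ino, unsigned int nlink)
{
	assert(ino != TOY_NONE && ino < TOY_NINODES);
	assert(!fs->itable[ino].in_use);
	fs->itable[ino] = (struct toy_inode){ .in_use = true, .nlink = nlink };
}

/* Register an in-use inode whose link count is 0 (push at the head). */
void toy_orphan_add(struct toy_fs *fs, unsigned int ino)
{
	assert(ino != TOY_NONE && ino < TOY_NINODES);
	assert(fs->itable[ino].in_use && fs->itable[ino].nlink == 0);
	assert(!fs->itable[ino].on_orphan_list);
	fs->itable[ino].next_orphan = fs->orphan_head;
	fs->itable[ino].on_orphan_list = true;
	fs->orphan_head = ino;
}

/* Remove 'ino' from the chain; a no-op if it is not on the chain. */
void toy_orphan_del(struct toy_fs *fs, unsigned int ino)
{
	unsigned int *link = &fs->orphan_head;

	while (*link != TOY_NONE && *link != ino)
		link = &fs->itable[*link].next_orphan;
	if (*link == TOY_NONE)
		return;
	*link = fs->itable[ino].next_orphan;
	fs->itable[ino].next_orphan = TOY_NONE;
	fs->itable[ino].on_orphan_list = false;
}

/* Final eviction: drop the inode from the chain and free it. */
void toy_evict(struct toy_fs *fs, unsigned int ino)
{
	toy_orphan_del(fs, ino);
	fs->itable[ino].in_use = false;
}

/* Recovery after an unclean shutdown: release everything on the chain. */
unsigned int toy_recover(struct toy_fs *fs)
{
	unsigned int released = 0;

	while (fs->orphan_head != TOY_NONE) {
		toy_evict(fs, fs->orphan_head);
		released++;
	}
	return released;
}
```

In this model the cost of recovery depends on the number of orphans,
not on the total number of inodes.  In the real file system the chain
would have to be updated within the same transaction as the link
count change, which the sketch does not attempt to show.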

I thought that it would be better for me to implement this mechanism
myself, because it would be difficult to implement correctly without a
thorough understanding of the lifecycle management and coherency of
NILFS2 metadata.

More importantly, since NILFS2 has not kept up with the implementation
of the wide range of functions available in today's file systems, I
would appreciate your help in implementing those additional functions
(for example, functions related to attributes such as encryption and
compression, or other minor features).

> >
> > I think that this problem needs to be solved first.
> >
> > If you could propose a mechanism to repair orphaned inodes at mount
> > time, I would like to apply it.
> > Is that possible?
> >
> > For example,
> >
> > A method of constructing an on-disk chain list of inodes that starts
> > from the latest checkpoint of cpfile, or a reserved inode (inode
> > number 0, etc.) of ifile, registering them there, and releasing them
> > during recovery at mount time.
> >
> > Alternatively, a less efficient method would be to perform a full scan
> > of ifile metadata when recovery occurs at mount time,
> > and release those whose link count does not match the inode bitmap.
>

> Thanks for your detailed explanation. If we scan the orphaned inodes at
> mount time, this may increase the time for mounting (unless scanning in
> background).

A scan only needs to be performed if the file system was not unmounted
cleanly, so it is not necessary to do it every time, but considering
scalability, I now think it would be better to be able to properly
manage orphan inodes, as mentioned above.
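
For comparison, the full-scan alternative can be modeled roughly as
follows.  Again this is a userspace sketch under assumptions, not
NILFS2 code: struct scan_inode, scan_release_orphans(), and the
clean_unmount flag are hypothetical stand-ins for the ifile entries,
the recovery pass, and the file system state recorded on disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct scan_inode {
	bool allocated;		/* bit set in the inode bitmap */
	unsigned int nlink;	/* on-disk link count */
};

/*
 * Hypothetical recovery pass.  Skipped entirely after a clean unmount;
 * otherwise every slot is visited and each allocated inode whose link
 * count is 0 is released.  Returns the number of inodes released.
 */
size_t scan_release_orphans(struct scan_inode *itable, size_t n,
			    bool clean_unmount)
{
	size_t released = 0;

	assert(itable != NULL || n == 0);
	if (clean_unmount)
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (itable[i].allocated && itable[i].nlink == 0) {
			itable[i].allocated = false;
			released++;
		}
	}
	return released;
}
```

The loop visits every inode slot, so the recovery time grows with the
size of the ifile even when there are few orphans, which is the
scalability concern.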

Thanks,
Ryusuke Konishi

>
> Thanks,
> Hongbo
>
> >
> > If we actually implement it, I think we need to discuss the method to
> > be determined.
> >
> > This issue takes priority, but I would like to make another comment
> > about the implementation of your proposal:
>
> Thanks for your
> >
> > The call to nilfs_mark_inode_dirty() involves copying the on-memory
> > inode data to the ifile, so it must be done after the on-memory inode
> > update is complete.  Therefore, move it after the call to d_tmpfile().
> > (we need to check if this swap actually works without side effects).
> >
> > Also, the function name in the changelog is a typo for "nilfs_tmpfile".
> >
> > That's all for now.
> >
> > Thanks,
> > Ryusuke Konishi






