On Tue, Aug 27, 2024 at 10:29 PM Hongbo Li wrote: > > > > On 2024/7/20 1:36, Ryusuke Konishi wrote: > > On Fri, Jul 19, 2024 at 6:12 PM Hongbo Li wrote: > >> > >> Add function nilfs2_tmpfile to support O_TMPFILE file creation. > >> > >> tmpfile testcases(generic/(004,389,509,530,531) except > >> generic/389,530 (need acl and shutdown supported) now run/pass. > >> > >> Signed-off-by: Hongbo Li <lihongbo22@xxxxxxxxxx> > >> --- > >> fs/nilfs2/namei.c | 31 +++++++++++++++++++++++++++++++ > >> 1 file changed, 31 insertions(+) > >> > >> diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c > >> index c950139db6ef..a36667d7a5e8 100644 > >> --- a/fs/nilfs2/namei.c > >> +++ b/fs/nilfs2/namei.c > >> @@ -125,6 +125,36 @@ nilfs_mknod(struct mnt_idmap *idmap, struct inode *dir, > >> return err; > >> } > >> > >> +static int nilfs_tmpfile(struct mnt_idmap *idmap, struct inode *dir, > >> + struct file *file, umode_t mode) > >> +{ > >> + struct inode *inode; > >> + struct nilfs_transaction_info ti; > >> + int err; > >> + > >> + err = nilfs_transaction_begin(dir->i_sb, &ti, 1); > >> + if (err) > >> + return err; > >> + > >> + inode = nilfs_new_inode(dir, mode); > >> + err = PTR_ERR(inode); > >> + if (!IS_ERR(inode)) { > >> + inode->i_op = &nilfs_file_inode_operations; > >> + inode->i_fop = &nilfs_file_operations; > >> + inode->i_mapping->a_ops = &nilfs_aops; > >> + nilfs_mark_inode_dirty(inode); > >> + d_tmpfile(file, inode); > >> + unlock_new_inode(inode); > >> + err = 0; > >> + } > >> + if (!err) > >> + err = nilfs_transaction_commit(dir->i_sb); > >> + else > >> + nilfs_transaction_abort(dir->i_sb); > >> + > >> + return finish_open_simple(file, err); > >> +} > >> + > >> static int nilfs_symlink(struct mnt_idmap *idmap, struct inode *dir, > >> struct dentry *dentry, const char *symname) > >> { > >> @@ -544,6 +574,7 @@ const struct inode_operations nilfs_dir_inode_operations = { > >> .mkdir = nilfs_mkdir, > >> .rmdir = nilfs_rmdir, > >> .mknod = nilfs_mknod, > >> + .tmpfile = nilfs_tmpfile, > >> .rename = nilfs_rename, > >> .setattr = nilfs_setattr, > >> .permission = nilfs_permission, > >> -- > >> 2.34.1 > >> > > > > Hi Hongbo, > > > > Thank you for the patch suggestion. > > > > As for the O_TMPFILE support, with this implementation, when the file > > system crashes in an unclean way, the inodes generated in the ifile > > metadata file by nilfs_new_inode() are not released and remain > > orphaned. > > Doesn't the nilfs transaction ensure this kind of consistency? > The nilfs transaction is to gurantee the consistency of metadata state, but unfortunately it does not guarantee that an inode with link count 0 will continue to exist. A different mechanism is needed. For normal files, when the link count falls to 0 and iput() is executed, nilfs_evict_inode(), which evicts the inode, simultaneously releases the inode on the bitmap and the data and b-tree blocks. A mechanism is needed to allow files with link count == 0 to survive across checkpoints. Strictly speaking, there is a problem with the current NILFS2 implementation; if a checkpoint is created between the time the inode is removed from the namespace and the time the final iput() is called, and the machine uncleanly goes down at that time,the inode becomes an orphan inode and its blocks are not released. Therefore, I think that an additional mechanism to maintain orphan inodes is needed in any way. If that can be done, the rest of your tmpfile implementation seems to be usable almost as is, so how about holding off until then? I thought that it would be better for me to implement this mechanism myself, because it would be difficult to implement correctly without a thorough understanding of the lifecycle management and coherency of NILFS2 metadata. More importantly, since NILFS2 has not kept up with the implementation of the wide range of functions available in today's file systems, I would appreciate your help in implementing those additional functions (for example, functions related to attributes such as encryption and compression, or other minor features). > > > > I think that this problem needs to be solved first. > > > > If you could propose a mechanism to repair orphaned inodes at mount > > time, I would like to apply it. > > Is that possible? > > > > For example, > > > > A method of constructing an on-disk chain list of inodes that starts > > from the latest checkpoint of cpfile, or a reserved inode (inode > > number 0, etc.) of ifile, registering them there, and releasing them > > during recovery at mount time. > > > > Alternatively, a less efficient method would be to perform a full scan > > of ifile metadata when recovery occurs at mount time, > > and release those whose link count does not match the inode bitmap. > > Thanks for your detailed explanation. If we scan the orphaned inodes at > mount time, this may increase the time for mounting (unless scanning in > background). A scan only needs to be performed if the file system was not unmounted cleanly, so it is not necessary to do it every time, but considering scalability, I now think it would be better to be able to properly manage orphan inodes, as mentioned above. Thanks, Ryusuke Konishi > > Thanks, > Hongbo > > > > > If we actually implement it, I think we need to discuss the method to > > be determined. > > > > This issue takes priority, but I would like to make another comment > > about the implementation of your proposal: > > Thanks for your > > > > The call to nilfs_mark_inode_dirty() involves copying the on-memory > > inode data to the ifile, so it must be done after the on-memory inode > > update is complete. Therefore, move it after the call to d_tmpfile(). > > (we need to check if this swap actually works without side effects). > > > > Also, the function name in the changelog is a type for "nilfs_tmpfile". > > > > That's all for now. > > > > Thanks, > > Ryusuke Konishi Ryusuke Konishi