On Mon, 2020-09-07 at 20:38 -0700, Eric Biggers wrote: > On Fri, Sep 04, 2020 at 12:05:20PM -0400, Jeff Layton wrote: > > Ceph needs to be able to allocate inodes ahead of a create that might > > involve a fscrypt-encrypted inode. new_inode() almost fits the bill, > > but it puts the inode on the sb->s_inodes list, and we don't want to > > do that until we're ready to insert it into the hash. > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > --- > > fs/inode.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/fs/inode.c b/fs/inode.c > > index 72c4c347afb7..61554c2477ab 100644 > > --- a/fs/inode.c > > +++ b/fs/inode.c > > @@ -935,6 +935,7 @@ struct inode *new_inode_pseudo(struct super_block *sb) > > } > > return inode; > > } > > +EXPORT_SYMBOL(new_inode_pseudo); > > > > What's the problem with putting the new inode on sb->s_inodes already? > That's what all the other filesystems do. > The existing ones are all local filesystems that use insert_inode_locked() and similar paths. Ceph needs to use the '5' variants of those functions (inode_insert5(), iget5_locked(), etc.). When we go to insert it into the hash in inode_insert5(), we'd need to set I_CREATING if allocated from new_inode(). But, if you do _that_, then you'll get back ESTALE from find_inode() if (eg.) someone calls iget5_locked() before you can clear I_CREATING. Hitting that race is easy with an asynchronous create. The simplest scheme to avoid that is to just export new_inode_pseudo and keep it off of s_inodes until we're ready to do the insert. The only real issue here is that this inode won't be findable by evict_inodes during umount, but that shouldn't be happening during an active syscall anyway. -- Jeff Layton <jlayton@xxxxxxxxxx>