On Tue, Sep 08, 2020 at 07:27:58AM -0400, Jeff Layton wrote: > On Mon, 2020-09-07 at 20:38 -0700, Eric Biggers wrote: > > On Fri, Sep 04, 2020 at 12:05:20PM -0400, Jeff Layton wrote: > > > Ceph needs to be able to allocate inodes ahead of a create that might > > > involve a fscrypt-encrypted inode. new_inode() almost fits the bill, > > > but it puts the inode on the sb->s_inodes list, and we don't want to > > > do that until we're ready to insert it into the hash. > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > --- > > > fs/inode.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/fs/inode.c b/fs/inode.c > > > index 72c4c347afb7..61554c2477ab 100644 > > > --- a/fs/inode.c > > > +++ b/fs/inode.c > > > @@ -935,6 +935,7 @@ struct inode *new_inode_pseudo(struct super_block *sb) > > > } > > > return inode; > > > } > > > +EXPORT_SYMBOL(new_inode_pseudo); > > > > > > > What's the problem with putting the new inode on sb->s_inodes already? > > That's what all the other filesystems do. > > > > The existing ones are all local filesystems that use > insert_inode_locked() and similar paths. Ceph needs to use the '5' > variants of those functions (inode_insert5(), iget5_locked(), etc.). > > When we go to insert it into the hash in inode_insert5(), we'd need to > set I_CREATING if allocated from new_inode(). But, if you do _that_, > then you'll get back ESTALE from find_inode() if (eg.) someone calls > iget5_locked() before you can clear I_CREATING. > > Hitting that race is easy with an asynchronous create. The simplest > scheme to avoid that is to just export new_inode_pseudo and keep it off > of s_inodes until we're ready to do the insert. The only real issue here > is that this inode won't be findable by evict_inodes during umount, but > that shouldn't be happening during an active syscall anyway. Is your concern the following scenario? 1. ceph successfully created a new file on the server 2. inode_insert5() is called for the new file's inode 3. error occurs in ceph_fill_inode() 4. discard_new_inode() is called 5. another thread looks up the inode and gets ESTALE 6. iput() is finally called And the claim is that the ESTALE in (5) is unexpected? I'm not sure that it's unexpected, given that the file allegedly failed to be created... Also, it seems that maybe (3) isn't something that should actually happen, since after (1) it's too late to fail the file creation. - Eric