On Wed, 2020-09-09 at 11:49 -0700, Eric Biggers wrote: > [+Cc Al] > > On Wed, Sep 09, 2020 at 12:51:02PM -0400, Jeff Layton wrote: > > > > No, more like: > > > > > > > > Syscall Workqueue > > > > ------------------------------------------------------------------------------ > > > > 1. allocate an inode > > > > 2. determine we can do an async create > > > > and allocate an inode number for it > > > > 3. hash the inode (must set I_CREATING > > > > if we allocated with new_inode()) > > > > 4. issue the call to the MDS > > > > 5. finish filling out the inode() > > > > 6. MDS reply comes in, and workqueue thread > > > > looks up new inode (-ESTALE) > > > > 7. unlock_new_inode() > > > > > > > > > > > > Because 6 happens before 7 in this case, we get an ESTALE on that > > > > lookup. > > > > > > How is ESTALE at (6) possible? (3) will set I_NEW on the inode when inserting > > > it into the inode hash table. Then (6) will wait for I_NEW to be cleared before > > > returning the inode. (7) will clear I_NEW and I_CREATING together. > > > > > > > Long call chain, but: > > > > ceph_fill_trace > > ceph_get_inode > > iget5_locked > > ilookup5(..._nowait, etc) > > find_inode_fast > > > > > > ...and find_inode_fast does: > > > > if (unlikely(inode->i_state & I_CREATING)) { > > spin_unlock(&inode->i_lock); > > return ERR_PTR(-ESTALE); > > } > > Why does ilookup5() not wait for I_NEW to be cleared if the inode has > I_NEW|I_CREATING, but it does wait for I_NEW to be cleared if I_NEW is set its > own? That seems like a bug. > Funny, I asked Al the same thing on IRC the other day: 23:28 < jlayton> viro: wondering if there is a potential race with I_CREATING in find_inode. Seems like you could have 2 tasks racing in calls to iget5_locked for the same inode. One creates an inode and starts instantiating it, and the second one gets NULL back because I_CREATING is set. 23:30 < viro> jlayton: where would I_CREATING come from? 23:30 < viro> it's set on insert_inode_locked() and similar paths 23:31 < viro> where you want iget5_locked() to fuck off and eat ESTALE 23:31 < jlayton> ok, right -- I was trying to pass an inode from new_inode to inode_insert5 23:32 < viro> seeing that it's been asked for an inode number that did _not_ exist until just now (we'd just allocated it) The assumption is that we'll never go looking for an inode until after I_NEW is cleared. In the case of an asynchronous create in ceph though, we may do exactly that if the reply comes back very quickly. -- Jeff Layton <jlayton@xxxxxxxxxx>