On Tue, Jul 02, 2024 at 03:14:54PM -0700, Linus Torvalds wrote: > On Tue, 2 Jul 2024 at 14:15, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > > > I asked you something in the previous e-mail though (with some nastiness > > of the problem pointed out) concerning handling of slow vs fastpath, > > here it is again: > > > > [..]for example did you know xfs does not honor rcu grace periods when > > recycling inodes? > > I don't think it should matter. > > Yes, the vfs layer will access the inode, but then the sequence number > checks should always verify that whatever access we did is then > validated after-the-fact. > [..] > Or do you see something I'm missing? I'm not loving how we deal with > dentry->d_inode, but considering that the whole point of the RCU > lookup is that we don't hold any locks, I think it's almost > unavoidable. > Now I'm confused mate. Based on the convo so far I would expect you would consider the xfs thing a no-go for the machinery. You were rightfully pointing out the relationship dentry<->inode is not stable and care is needed to grab the pointer, and even then the pointer may be wrong by the time one finishes the work. I presume you are also worried about callbacks not taking proper steps when looking at the inode itself -- after all they can be racing with teardown and have to handle it gracefully by returning an error. This much comes with territory. Inode changing identities adds potential trouble which does not need to be there. Certain things remain stable, for example the inode type (regular, fifo etc.). This is where xfs not respecting grace periods comes in and removes it. It may happen to be for stat et al the way they can be implemented at the moment this does not matter, but then it may start being a factor in the future. Say a type-dependent union showed up and needs to be looked at: union { fifo_pointer_t *fifo_data; dev_t i_rdev; ...; }; area pointed to by fifo_data is rcu-protected. Then this: if (IS_FIFO(inode)) { fifo_pointer_t *f = rcu_dereference(inode->fifo_data); if (!f) return -ENODICE; /* use f here */ } while may look safe in terms of taking precatuions and bailing from the fast path happens to be wrong. Suppose the inode got reused and is now representing a device, i_rdev is some funky value. This can race to read that int and trap dereferencing what it found there. This would not happen if the grace period was respected. This is a bug class which does not need to be there and there may be other possible bugs I did not think of. There is also potential trouble with security modules as they unfortunately have a hook for getattr. They presumably can be massaged into handling a possibly dying inode (or failing with -ENODICE), I would not put any stake into handling an inode which becomes something else as they deal with it. As a side note this is where I should also note the *current* LSM hooks are racy as is. Suppose you can stat a particular file now, but a chown to 1234 means you can't. Then your stat call can race against chown + other ops. You can end up in a state where LSMs gave a green light and only then the state changed to one you are not allowed to look at. This would be solvable with per-inode sequence counters, but is arguably beyond the scope of this patch and I'm not interested in working on it. [I read the rest of the mail and have no immediate commentary, we will see after I take a stab at implementing this]