On Sat, Apr 18, 2015 at 12:41:38PM -0700, Eric Dumazet wrote:
> On Sat, 2015-04-18 at 00:02 +0100, Al Viro wrote:
> > On Sat, Apr 18, 2015 at 12:16:48AM +0200, Mateusz Guzik wrote:
> > >
> > > I would say this makes the use of seq counter impossible. Even if we
> > > decided to fall back to a lock on retry, we cannot know what to do if
> > > the slot is reserved - it very well could be that something called
> > > close, and something else reserved the slot, so putting the file inside
> > > could be really bad. In fact we would be putting a file for which we
> > > don't have a reference anymore.
> > >
> > > However, not all hope is lost and I still think we can speed things up.
> > >
> > > A locking primitive which only locks stuff for current cpu and has
> > > another mode where it locks stuff for all cpus would do the trick just
> > > fine. I'm not a linux guy, quick search suggests 'lglock' would do what
> > > I want.
> > >
> > > table reallocation is an extremely rare operation, so this should be
> > > fine. It would take the lock 'globally' for given table.
> >
> > It would also mean percpu_alloc() for each descriptor table...
>
> I would rather use an xchg() instead of rcu_assign_ponter()
>
> old = xchg(&fdt->fd[fd], file);
> if (unlikely(old))
>         filp_close(old, files);
>
> If threads are using close() on random fds, final result is not
> guaranteed anyway.
>

Well, I don't see how this could be used to fix the problem.

If you are retrying and see NULL, you don't know whether your previous
update was not picked up by the memcpy OR the fd got closed, which also
dropped the reference on the file you are installing. You can't tell
which of the two happened.

If you see non-NULL and what you found is not the file you are
installing, you know the file was freed, so you can't close the old
file.
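To make the ambiguity concrete, here is a minimal single-threaded
userspace sketch using C11 atomics. It is not kernel code: struct file,
struct fdtable, table_copy(), install_xchg(), close_fd() and the
retry_after_*() helpers are all hypothetical stand-ins, and the
interleavings are simply written out in program order instead of racing
for real. It only shows that the retry observes the same slot contents
whether the update was lost or the fd was closed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS 4

/* Hypothetical stand-ins for struct file and struct fdtable. */
struct file {
	int refcount;
};

struct fdtable {
	_Atomic(struct file *) fd[NR_SLOTS];
};

void table_init(struct fdtable *t)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_init(&t->fd[i], NULL);
}

/* Stand-in for the table copy done on reallocation. */
void table_copy(struct fdtable *dst, struct fdtable *src)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_store(&dst->fd[i], atomic_load(&src->fd[i]));
}

/* Lockless install in the style of the quoted xchg() snippet. */
struct file *install_xchg(struct fdtable *t, int fd, struct file *f)
{
	return atomic_exchange(&t->fd[fd], f);
}

/* Stand-in for close(): empty the slot and drop the table's reference. */
void close_fd(struct fdtable *t, int fd)
{
	struct file *f = atomic_exchange(&t->fd[fd], NULL);

	if (f)
		f->refcount--;
}

/* Case 1: the copy ran before our store, so the new table missed it. */
struct file *retry_after_lost_update(struct file *f)
{
	struct fdtable old, resized;

	table_init(&old);
	table_init(&resized);
	table_copy(&resized, &old);
	install_xchg(&old, 1, f);            /* lands in the stale table */
	return atomic_load(&resized.fd[1]);  /* what the retry observes */
}

/* Case 2: the install was visible, then another thread closed the fd. */
struct file *retry_after_close(struct file *f)
{
	struct fdtable t;

	table_init(&t);
	install_xchg(&t, 1, f);
	close_fd(&t, 1);                     /* our reference is gone */
	return atomic_load(&t.fd[1]);
}

/* Case 3: as case 2, but the slot was then reused for another file. */
struct file *retry_after_close_and_reuse(struct file *f, struct file *other)
{
	struct fdtable t;

	table_init(&t);
	install_xchg(&t, 1, f);
	close_fd(&t, 1);
	install_xchg(&t, 1, other);
	return atomic_load(&t.fd[1]);
}

/* Cases 1 and 2 look identical to the retry; only the refcount differs. */
void show_ambiguity(void)
{
	struct file a = { 1 }, b = { 1 };

	assert(retry_after_lost_update(&a) == NULL && a.refcount == 1);
	assert(retry_after_close(&b) == NULL && b.refcount == 0);
}
```

In cases 1 and 2 the retry sees NULL, yet only in case 1 does the
installer still hold a reference it may store again. In case 3 the slot
holds somebody else's file, which the installer has no business closing.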
One could try to introduce an invariant that files installed in a
lockless manner have to start with refcount 1, but you still can't infer
anything from the fact that the counter is 1 when you retry (even if you
take the lock). It could have been duped, or even sent over a unix
socket and closed (although that would surely require a solid pause in
execution), and who knows what else.

In general I would say this approach is too hard to get right to be
worthwhile given the expected speedup.

-- 
Mateusz Guzik