On Mon, 2015-04-20 at 15:41 +0200, Mateusz Guzik wrote:
> On Sat, Apr 18, 2015 at 12:41:38PM -0700, Eric Dumazet wrote:
> > On Sat, 2015-04-18 at 00:02 +0100, Al Viro wrote:
> > > On Sat, Apr 18, 2015 at 12:16:48AM +0200, Mateusz Guzik wrote:
> > >
> > > > I would say this makes the use of seq counter impossible. Even if we
> > > > decided to fall back to a lock on retry, we cannot know what to do if
> > > > the slot is reserved - it very well could be that something called
> > > > close, and something else reserved the slot, so putting the file inside
> > > > could be really bad. In fact we would be putting a file for which we
> > > > don't have a reference anymore.
> > > >
> > > > However, not all hope is lost and I still think we can speed things up.
> > > >
> > > > A locking primitive which only locks stuff for current cpu and has
> > > > another mode where it locks stuff for all cpus would do the trick just
> > > > fine. I'm not a linux guy, quick search suggests 'lglock' would do what
> > > > I want.
> > > >
> > > > table reallocation is an extremely rare operation, so this should be
> > > > fine. It would take the lock 'globally' for given table.
> > >
> > > It would also mean percpu_alloc() for each descriptor table...
> >
> > I would rather use an xchg() instead of rcu_assign_pointer()
> >
> > old = xchg(&fdt->fd[fd], file);
> > if (unlikely(old))
> > 	filp_close(old, files);
> >
> > If threads are using close() on random fds, final result is not
> > guaranteed anyway.
> >
>
> Well I don't see how this could be used to fix the problem.
>
> If you are retrying and see NULL, you don't know whether your previous
> update was not picked up by memcpy OR the fd got closed, which also
> unreferenced the file you are installing. But you can't tell what
> happened.
>
> If you see non-NULL and what you found is not the file you are
> installing, you know the file was freed so you can't close the old file.
>
> One could try to introduce an invariant that files installed in a
> lockless manner have to start with refcount 1, but you still can't infer
> anything from the fact that the counter is 1 when you retry (even if you
> take the lock). It could have been duped, or even sent over a unix
> socket and closed (although that would surely require a solid pause in
> execution) and who knows what else.
>
> In general I would say this approach is too hard to get right to be
> worthwhile given the expected speedup.
>

Hey, that's because I really meant (during the weekend)
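
For reference, the xchg() snippet quoted earlier and the NULL-on-retry
ambiguity Mateusz describes can be mimicked in userspace. This is only an
illustrative sketch and not code from this thread: it assumes C11 atomics
in place of the kernel's xchg() and a fixed-size table with no
reallocation. struct file_obj, fd_table, put_file(), install_xchg(),
close_slot() and peek_slot() are all hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_SIZE 8	/* assumption: fixed-size table, never reallocated */

/* Hypothetical stand-in for struct file: only a reference count. */
struct file_obj {
	int refcount;
};

/* Hypothetical stand-in for fdt->fd[]: one atomic pointer per descriptor. */
static _Atomic(struct file_obj *) fd_table[TABLE_SIZE];

/* Drop one reference; stands in for filp_close() in the quoted snippet. */
static void put_file(struct file_obj *f)
{
	f->refcount--;
}

/*
 * Install `file` in slot `fd` with an atomic exchange. If the slot was
 * already occupied, the displaced file loses its reference.
 * Returns the displaced pointer, or NULL if the slot was empty.
 */
static struct file_obj *install_xchg(size_t fd, struct file_obj *file)
{
	assert(fd < TABLE_SIZE);
	struct file_obj *old = atomic_exchange(&fd_table[fd], file);

	if (old != NULL)
		put_file(old);
	return old;
}

/* Empty the slot and drop its reference; stands in for close(). */
static struct file_obj *close_slot(size_t fd)
{
	assert(fd < TABLE_SIZE);
	struct file_obj *old =
		atomic_exchange(&fd_table[fd], (struct file_obj *)NULL);

	if (old != NULL)
		put_file(old);
	return old;
}

/* Read the slot the way a retrying installer would. */
static struct file_obj *peek_slot(size_t fd)
{
	assert(fd < TABLE_SIZE);
	return atomic_load(&fd_table[fd]);
}
```

After install_xchg(3, &f) followed by close_slot(3), peek_slot(3) returns
NULL, exactly as it does for a slot that never received the update. A
retrying installer that sees NULL therefore cannot tell a lost update from
a concurrent close that already dropped its reference.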