On Thu, Apr 16, 2015 at 07:09:32PM +0100, Al Viro wrote:
> On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote:
> > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
> >      cur_fdt = files_fdtable(files);
> >      if (nr >= cur_fdt->max_fds) {
> >          /* Continue as planned */
> > +        write_seqcount_begin(&files->fdt_seqcount);
> >          copy_fdtable(new_fdt, cur_fdt);
> >          rcu_assign_pointer(files->fdt, new_fdt);
> > +        write_seqcount_end(&files->fdt_seqcount);
> >          if (cur_fdt != &files->fdtab)
> >              call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
>
> Interesting.  AFAICS, your test doesn't step anywhere near that path,
> does it?  So basically you never hit the retries during that...

Well, yeah.

In fact, for non-shared tables one could go a step further and just plop
the pointer in, but I don't know if that makes much sense. Other
processes inspecting the table could get away with a data dependency
barrier. Closing would still take the lock, so you can only suddenly see
a filp installed, but never one going away.

Now, as far as correctness goes, I think there is a bug in the patch
(which does not invalidate the idea though). Chances are I have a fix as
well.

Benchmark prog is here:
http://people.redhat.com/~mguzik/pipebench.c

A modified version:
http://people.redhat.com/~mguzik/fdi-fail.c

The benchmark is just doing pipe + close in a loop in multiple threads.

The modified version spawns threads, sleeps 100 ms and does dup2(0, 300)
to reallocate the table while the other threads continue the work.

This successfully tested retries (along with cases where an installed
file got copied and was encountered during retry).

However, I see sporadic close failures. I presume this is because of a
missing read barrier after write_seqcount_begin. Adding an smp_mb() seems
to solve the problem, but I could only test on 2 * 16 Intel(R) Xeon(R)
CPU E5-2660 0 @ 2.20GHz.

My memory barrier-fu is rather weak and I'm not that confident in my
crap suspicion here.

Thoughts?
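For readers who do not want to fetch the files, here is a rough sketch of
what the modified test does as described above. It is not the actual
fdi-fail.c: the function and variable names are made up for illustration,
the thread cap of 64 is arbitrary, and the two-argument dup is assumed to
be dup2(). Workers do pipe + close in a loop and count close() errors,
while the caller sleeps and then forces the fd table to grow.

```c
/* Sketch only, NOT the real fdi-fail.c. All names here are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

#define MAX_THREADS 64                /* arbitrary cap for this sketch */

static atomic_int stop_flag;          /* set once the test should end */
static atomic_long close_failures;    /* close() calls that returned -1 */

/* Worker: pipe() + close() in a loop, counting failed close() calls. */
static void *pipe_close_worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag)) {
        int fds[2];

        if (pipe(fds) != 0)
            continue;                 /* e.g. EMFILE, just try again */
        if (close(fds[0]) != 0)
            atomic_fetch_add(&close_failures, 1);
        if (close(fds[1]) != 0)
            atomic_fetch_add(&close_failures, 1);
    }
    return NULL;
}

/*
 * Start nthreads workers, sleep sleep_ms, then dup2(0, target_fd) so the
 * kernel has to reallocate the fd table while the workers keep running.
 * Returns the number of failed close() calls, or -1 if dup2() failed.
 */
long run_fd_stress(int nthreads, int sleep_ms, int target_fd)
{
    pthread_t tids[MAX_THREADS];
    struct timespec ts = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
    int started = 0;
    int dup_ok;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    atomic_store(&stop_flag, 0);
    atomic_store(&close_failures, 0);

    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, pipe_close_worker, NULL) != 0)
            break;

    nanosleep(&ts, NULL);
    dup_ok = (dup2(0, target_fd) == target_fd);   /* forces table growth */
    nanosleep(&ts, NULL);             /* let workers run on the new table */

    atomic_store(&stop_flag, 1);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    if (dup_ok)
        close(target_fd);
    return dup_ok ? atomic_load(&close_failures) : -1;
}
```

Calling run_fd_stress(4, 100, 300) from a small main() (built with
cc -pthread) and printing the result should be enough to try it. A
non-zero count would correspond to the sporadic close failures mentioned
above; on a kernel without the problem it is expected to be 0.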
--
Mateusz Guzik