Sorry about the delay. I haven't been spending time on this issue because I can just use SMB 3. But for anyone else who is stuck, here's my diagnosis:

I also found that the 2nd & 3rd instances of a.out don't strictly wait for the 1st one to finish before starting: they consistently start 35.5s after the 1st one. Here are my observations after the 1st program launches, while the 2nd & 3rd are prevented from starting:

1. The 1st a.out is in the running state (R+).
2. The 2nd a.out still hasn't started. Bash has forked itself to call exec("a.out"), but ps still shows the forked process as bash, not a.out. The process is in the D+ state (uninterruptible sleep, i.e. blocked inside the kernel).

I tried getting the kernel stack trace as Jeff suggested, but "cat /proc/3718/stack" hangs! Eventually, when the 2nd a.out starts 35.5s after the 1st one, I see this:

[<ffffffff81113a35>] __alloc_pages_nodemask+0x1a5/0x990
[<ffffffff811b9aaa>] load_elf_binary+0xda/0xeb0
[<ffffffff811375a2>] __vma_link_rb+0x62/0xb0
[<ffffffff81150cb8>] alloc_pages_vma+0x158/0x210
[<ffffffff813623e5>] cpumask_any_but+0x25/0x40
[<ffffffff8104c8a2>] flush_tlb_page+0x32/0x90
[<ffffffff8113c932>] page_add_new_anon_rmap+0x72/0xe0
[<ffffffff8112ffbf>] wp_page_copy+0x31f/0x450
[<ffffffff81159885>] cache_alloc_refill+0x85/0x340
[<ffffffff8115a06f>] kmem_cache_alloc+0x14f/0x1b0
[<ffffffff81070dc0>] prepare_creds+0x20/0xd0
[<ffffffff81167f45>] SyS_faccessat+0x65/0x270
[<ffffffff81045f23>] __do_page_fault+0x253/0x4c0
[<ffffffff817558d7>] system_call_fastpath+0x12/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

But that trace is probably irrelevant, because by then the process is already in the running state. Absolutely bizarre.
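For anyone retracing the diagnosis above, here is a minimal sketch of the two checks (process state, then a bounded kernel-stack read). The PID assignment is a placeholder: in the real scenario you would use the PID of the blocked bash fork; `$$` is used here only so the snippet is self-contained. Note that reading /proc/<pid>/stack may require root, and, as observed above, the read itself can hang, so it is wrapped in timeout(1).

```shell
#!/bin/sh
# Placeholder: substitute the PID of the blocked process (the bash fork
# that is stuck in exec). $$ is used only to make the sketch runnable.
pid=$$

# 1. Scheduler state: R = running, D = uninterruptible sleep
#    (blocked inside the kernel, not killable by signals).
state=$(ps -o stat= -p "$pid")
echo "state: $state"

# 2. Kernel stack of the task, bounded with timeout(1) because the read
#    can hang when the task is stuck in the kernel.
timeout 5 cat "/proc/$pid/stack" 2>/dev/null \
    || echo "stack read failed or timed out (may need root)"
```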
And appalling, since it's like you're hurt and can't even yell for help (view the kernel stack).

On Thu, Aug 20, 2015 at 5:57 AM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> On Wed, 19 Aug 2015 04:11:31 -0700
> Yale Zhang <yzhang1985@xxxxxxxxx> wrote:
>
>> SMB developers/users,
>>
>> I'm experiencing a strange bottleneck when my files are mounted as SMB
>> 2.0. When I launch multiple processes in parallel for benchmarking,
>> only the 1st one starts, and the rest won't start until the 1st one
>> finishes:
>>
>> -------------------------------- test programs --------------------------------
>> #!/bin/sh
>> ./a.out &
>> ./a.out &
>> ./a.out &
>> wait
>>
>> a.out is just a C program like this:
>>
>> #include <stdio.h>
>>
>> int main()
>> {
>>     printf("greetings\n");
>>     while (1);
>>     return 0;
>> }
>>
>> Apparently, this only affects SMB 2.0. I tried it with SMB 2.1, SMB
>> 3.0, & SMB 3.02, and everything starts in parallel as expected.
>>
>> I'm assuming SMB 3 and especially SMB 2.1 would share a common
>> implementation. How could 2.0 have the problem but not 3? It almost
>> seems the bottleneck is a feature instead of a bug? 8(
>>
>> Can it still be fixed?
>>
>> -Yale
>
> Probably. It'd be interesting to see what the other tasks are blocking
> on. After firing up the second one, can you run:
>
> # cat /proc/<pid of second a.out>/stack
>
> ...and paste the stack trace here? That should tell us what those other
> processes are doing.
>
> --
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx>

--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
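For completeness, the dialect comparison described in the quoted report can be sketched with the cifs `vers=` mount option. The server path, mount point, and credentials below are placeholders, not values from the report; only the `vers=` values (2.0, 2.1, 3.0, 3.02) come from the original email.

```shell
#!/bin/sh
# Placeholder server/share, mount point, and credentials file.
# vers= selects the SMB dialect negotiated by the cifs.ko client.
SHARE=//server/share
MNT=/mnt/smb

# SMB 2.0: the 2nd & 3rd a.out are reported to be delayed ~35.5s.
sudo mount -t cifs "$SHARE" "$MNT" -o vers=2.0,credentials=/etc/cifs-creds
(cd "$MNT" && ./test.sh)
sudo umount "$MNT"

# SMB 3.0 (likewise 2.1 and 3.02): all three start in parallel.
sudo mount -t cifs "$SHARE" "$MNT" -o vers=3.0,credentials=/etc/cifs-creds
(cd "$MNT" && ./test.sh)
sudo umount "$MNT"
```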