On Dec 29, 2011, at 11:12 AM, Stanislav Kinsbursky wrote:

> 29.12.2011 20:03, Chuck Lever wrote:
>>> Because unix named (!) sockets are not network namespace aware.
>>
>> OK, this is the part I was unaware of. Bruce and I had talked about this a few months ago, and my take-away was that these sockets were network namespace-aware, such that this was supposed to work as we want, already.
>>
>>> They are "current->fs->root" aware.
>>
>> In other words, these are relative to the local file namespace, not the local network namespace. I was afraid of that.
>>
>>> I.e. if this connect operation were performed in "sync mode" (i.e. from the context of the mount or NFS server start operation), then everything would work fine (provided each container works in its own root, of course).
>>> But currently all connect operations are done by rpciod_workqueue. I.e. regardless of the root of the process that started service registration, the unix socket will always be looked up by the path "/var/run/rpcbind.sock" starting from the rpciod_workqueue root. I can set the desired root before the kernel connect while rpciod_workqueue handles the RPC task, and revert back to the old one after the connection, and this will solve the problem too.
>>> But this approach looks like an ugly hack to me. It also requires an additional pointer in the sock_xprt structure to pass the desired context to the rpciod_workqueue handler.
>>
>> Can several network namespaces share the same file namespace? That might cause them to share the same rpcbind, which is undesirable. Might this also be a problem for user space?
>>
>
> Yes, they can. But only in general. And it will be a problem for user space programs using unix named sockets for network related stuff (like rpcbind, for instance).
> But, actually, I don't see any sense in having several network namespaces with the same root. Perhaps someone can suggest a specific "real life" solution which would use such a scheme.

I can't think of one either.
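
For illustration, here is a minimal user-space sketch of the lookup behaviour described above. It is only an analogy, not kernel or libtirpc code: the current working directory stands in for fs->root (a real chroot would need privileges), and the relative name "rpcbind.sock" as well as the helpers start_listener, connect_from and demo are hypothetical names invented for this example. It shows that the same path string reaches a different listener depending on the file-system context of the caller, and that switching the context just before connect and reverting afterwards (the approach proposed for rpciod_workqueue) selects the intended listener.

```python
import os
import socket
import tempfile
import threading

# Hypothetical relative name standing in for /var/run/rpcbind.sock.
SOCK_NAME = "rpcbind.sock"


def start_listener(directory, reply):
    """Bind a named AF_UNIX socket inside `directory`; answer one client with `reply`."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(os.path.join(directory, SOCK_NAME))
    srv.listen(1)  # listening before the thread starts, so connect() cannot race

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.sendall(reply)
        srv.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread


def connect_from(directory):
    """Connect to SOCK_NAME as seen from `directory` (cwd is the stand-in for fs->root)."""
    saved = os.getcwd()
    os.chdir(directory)  # assumption: analogue of setting the desired root
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
            cli.connect(SOCK_NAME)  # identical path string on every call
            return cli.recv(64)
    finally:
        os.chdir(saved)  # analogue of reverting to the old root


def demo():
    """Run two listeners under different directories and connect to each by the same name."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        ta = start_listener(a, b"container A")
        tb = start_listener(b, b"container B")
        results = (connect_from(a), connect_from(b))
        ta.join()
        tb.join()
        return results
```

Calling demo() should return one reply from each listener, even though connect() was given the same name both times. Without the context switch in connect_from, every connect would resolve against whatever directory the caller happened to be in, which mirrors the rpciod_workqueue situation.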

> But it's not a container, and thus no guarantees should be provided in this case, from my POV.

That's probably reasonable, and should be documented publicly if we take the approach of keeping a unique /var/run/rpcbind.sock for each network namespace.

> Or we need to throw away this unix socket approach and use only network namespace aware routines. But again, is this really required?

/var/run/rpcbind.sock is a formal libtirpc/rpcbind API that is common to libtirpc on other OSes. Now, it's not likely that any application except the kernel uses it directly. Still, I'm leery of removing it entirely.

My sense is that handing an fs->root value to the rpciod workqueue gives us behavior that is closest to what we have now in a single network namespace configuration. In other words, it's a change that introduces the least amount of "surprise" to the current RPC architecture.

Do you have a patch so we can see just how ugly this might get?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com