On Thu, 3 Sep 2015 16:20:16 -0400
"J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:

> On Thu, Sep 03, 2015 at 03:54:17PM -0400, Jeff Layton wrote:
> > On Thu, 3 Sep 2015 15:19:14 -0400
> > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > 
> > > On Thu, Sep 03, 2015 at 02:52:25PM -0400, Jeff Layton wrote:
> > > > On Thu, 3 Sep 2015 14:43:27 -0400
> > > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > > > 
> > > > > On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> > > > > > There are cases with NFSv3 where the client doesn't actually care about
> > > > > > WCC attributes in replies. If the server is mainly acting as a DS for
> > > > > > flexfiles, then the client just throws out those attributes anyway.
> > > > > > Also, in the case where the client is primarily doing direct I/O,
> > > > > > post-op attributes aren't terribly useful.
> > > > > > 
> > > > > > Another reason to allow turning these off is that NFS will flush all
> > > > > > buffered writes prior to issuing a GETATTR, and it also takes the
> > > > > > i_mutex in its ->getattr operation.
> > > > > > 
> > > > > > If we're doing a vfs_getattr after most RPCs, then we can end up
> > > > > > deadlocking or (at best) prematurely flushing buffered writes, which
> > > > > > kills performance.
> > > > > 
> > > > > So you're talking about the NFS re-export case? Do we know of any other
> > > > > case when a ->getattr is so expensive?
> > > > 
> > > > That's the main one that I have experience with, but getattr can be
> > > > pretty expensive in clustered filesystems. For instance, on ceph:
> > > > 
> > > >         err = ceph_do_getattr(inode, CEPH_STAT_CAP_INODE_ALL, false);
> > > >         if (!err) {
> > > >                 generic_fillattr(inode, stat);
> > > >                 stat->ino = ceph_translate_ino(inode->i_sb, inode->i_ino);
> > > > 
> > > > ...and it looks like ceph_do_getattr can issue a request on the network
> > > > (though I'm not familiar with that code and I imagine that it's
> > > > sometimes optimized out).
> > > 
> > > OK. Could we get something like that in the changelog? The change
> > > really needs to stand on the non-NFS case alone as long as NFS
> > > reexport's not upstream. For that reason (and because without the
> > > context that second paragraph's kind of confusing), it'd be helpful to
> > > preface the NFS discussion by something like "In the (out-of-tree) NFS
> > > re-export case".
> > 
> > Yeah, no problem. I'll respin the changelog on both patches and resend
> > within the next day or two.
> > 
> > > What's keeping that out of upstream, anyway? Apparently there's some
> > > use case, and if it's inspiring a lot of changes in generic code, then
> > > it'd simplify life to have it upstream.
> > 
> > There are several problems. Here are a few, but there are others:
> > 
> > 1) it is at least somewhat of a potential security concern. By mounting
> > on a box that has access and then reexporting it, you can circumvent
> > the export restrictions on the original server. Granted you can do that
> > today with samba or something, but still -- it's a little sketchy.
> 
> Or with ganesha, or you could run a web server, or just mail the file
> contents to someone.... This just isn't the way to enforce anything.
> 
> I've heard Trond argue something like this before, but I think his point
> was a little different: not that we have to deny reexports for security
> reasons, but just that such a policy-circumventing use case isn't worth
> supporting. So he wasn't interested in reexports as long as that was
> the only use case he'd heard about.

Agreed. We can't really enforce anything with this, but it's not a use
case we really want to be an enabler of.

> > 2) getattrs: We're working around the problem with this new export
> > option, but if you don't use that then you can potentially deadlock
> > with NFS. It wants to take the i_mutex in its ->getattr operation but
> > knfsd calls vfs_getattr with that held to do post-op attrs. My initial
> > workaround was to drop the i_mutex before calling fh_getattr instead of
> > after, but then I hit the performance problem I described.
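
FWIW, to make that concrete: the export option just tells nfsd to skip
the post-op attribute fill. Roughly something like the below -- a
sketch only, where the flag name (NFSEXP_NOWCC) and the exact placement
are illustrative rather than necessarily what the final patch will look
like:

        /*
         * Sketch: let an export opt out of post-op attributes, so nfsd
         * doesn't call vfs_getattr() after every RPC. That getattr is
         * what ends up flushing buffered writes (and can deadlock on
         * i_mutex) when the underlying filesystem is an NFS mount.
         */
        void fill_post_wcc(struct svc_fh *fhp)
        {
                __be32 err;

                /* NFSEXP_NOWCC: assumed new export flag bit, not an existing one */
                if (fhp->fh_export->ex_flags & NFSEXP_NOWCC)
                        return;

                err = fh_getattr(fhp, &fhp->fh_post_attr);
                fhp->fh_post_saved = !err;
        }

Exports that still need WCC info just leave the option off; a
reexporter (or a flexfiles DS) can turn it on and avoid the expensive
->getattr entirely.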

> > 3) locking: proxying v3 locking is a painful mess. If the reexporter
> > reboots, it'll lose its lease on the main server, which will kick out
> > all of its state. At that point you can end up with another client
> > racing and getting your lock before the reexporter can come back up and
> > reclaim it.
> > 
> > Our main use-case for this is pretty limited and doesn't involve file
> > locking (so far!).
> 
> So this is the interesting part, I guess.
> 
> --b.

Yeah. The locking one is a real bugger. We have a potential design for
a solution, but I'm not sure it'll be something we can open source.

-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>