On Thu, May 18, 2017 at 03:04:50PM +0000, Trond Myklebust wrote:
> On Thu, 2017-05-18 at 10:28 -0400, Chuck Lever wrote:
> > > On May 18, 2017, at 9:34 AM, Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
> > >
> > > On Tue, May 16, 2017 at 09:11:42AM -0400, J. Bruce Fields wrote:
> > > > I think you explained this before, perhaps you could just offer a
> > > > pointer: remind us what your requirements or use cases are,
> > > > especially for VM migration?
> > >
> > > The NFS over AF_VSOCK configuration is:
> > >
> > > A guest running on the host mounts an NFS export from the host.
> > > The NFS server may be kernel nfsd or an NFS frontend to a
> > > distributed storage system like Ceph.  A little more about these
> > > cases below.
> > >
> > > Kernel nfsd is useful for sharing files.  For example, the guest
> > > may read some files from the host when it launches and/or it may
> > > write out result files to the host when it shuts down.  The user
> > > may also wish to share their home directory between the guest and
> > > the host.
> > >
> > > NFS frontends are a different use case.  They hide distributed
> > > storage systems from guests in cloud environments.  This way
> > > guests don't see the details of the Ceph, Gluster, etc. nodes.
> > > Besides benefiting security, it also allows NFS-capable guests to
> > > run without installing specific drivers for the distributed
> > > storage system.  This use case is "filesystem as a service".
> > >
> > > The reason for using AF_VSOCK instead of TCP/IP is that
> > > traditional networking configuration is fragile.  Automatically
> > > adding a dedicated NIC to the guest and choosing an IP subnet has
> > > a high chance of conflicts (subnet collisions, network interface
> > > naming, firewall rules, network management tools).  AF_VSOCK is a
> > > zero-configuration communications channel, so it avoids these
> > > problems.
> > >
> > > On to migration.  For the most part, guests can be live migrated
> > > between hosts without significant downtime or manual steps.  PCI
> > > passthrough is an example of a feature that makes it very hard to
> > > live migrate.  I hope we can allow migration with NFS, although
> > > some limitations may be necessary to make it feasible.
> > >
> > > There are two NFS over AF_VSOCK migration scenarios:
> > >
> > > 1. The files live on host H1 and host H2 cannot access the files
> > >    directly.  There is no way for an NFS server on H2 to access
> > >    those same files unless the directory is copied along with the
> > >    guest or H2 proxies to the NFS server on H1.
> >
> > Having managed (and shared) storage on the physical host is
> > awkward.  I know some cloud providers might do this today by
> > copying guest disk images down to the host's local disk, but
> > generally it's not a flexible primary deployment choice.
> >
> > There's no good way to expand or replicate this pool of storage.
> > A backup scheme would need to access all physical hosts.  And the
> > files are visible only on specific hosts.
> >
> > IMO you want to treat local storage on each physical host as a
> > cache tier rather than as a back-end tier.
> >
> > > 2. The files are accessible from both host H1 and host H2 because
> > >    they are on shared storage or a distributed storage system.
> > >    Here the problem is "just" migrating the state from H1's NFS
> > >    server to H2 so that file handles remain valid.
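To make the "zero-configuration" argument above concrete: a vsock
client addresses the host by a well-known context ID plus a port, so
there is no NIC to add, no IP subnet to choose, and no firewall rule to
manage inside the guest.  Here is a minimal guest-side sketch, assuming
only the standard <linux/vm_sockets.h> socket API; the port number 2049
is just the conventional NFS port, used for illustration rather than
taken from the proposed patches:

/* Hypothetical illustration: a guest-side AF_VSOCK connection.
 * VMADDR_CID_HOST ("the host") plus a port is the complete address;
 * no interface, IP, subnet, or firewall setup is involved.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    struct sockaddr_vm sa = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_HOST,  /* well-known CID 2 = the host */
        .svm_port   = 2049,             /* illustrative NFS port */
    };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }
    printf("connected to host CID %u, port %u\n", sa.svm_cid, sa.svm_port);
    close(fd);
    return 0;
}

The host end is symmetric: a listener binds to VMADDR_CID_ANY and the
same port, again with no address configuration (see the host-side
sketch at the end of this message).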
> >
> > Essentially this is the re-export case, and this makes a lot more
> > sense to me from a storage administration point of view.
> >
> > The pool of administered storage is not local to the physical
> > hosts running the guests, which is how I think cloud providers
> > would prefer to operate.
> >
> > User storage would be accessible via an NFS share, but managed in
> > a Ceph object (with redundancy, a common high-throughput backup
> > facility, and secure central management of user identities).
> >
> > Each host's NFS server could be configured to expose only the
> > cloud storage resources for the tenants on that host.  The
> > back-end storage (i.e., Ceph) could operate on a private storage
> > area network for better security.
> >
> > The only missing piece here is support in Linux-based NFS servers
> > for transparent state migration.
>
> Not really.  In a containerised world, we're going to see more and
> more cases where just a single process/application gets migrated
> from one NFS client to another (and yes, a re-exporter/proxy of NFS
> is just another client as far as the original server is concerned).
>
> IOW: I think we want to allow a client to migrate some parts of its
> lock state to another client, without necessarily requiring every
> process being migrated to have its own clientid.

It wouldn't have to be every process, it'd be every container, right?
What's the disadvantage of per-container clientids?  I guess you lose
the chance to share delegations and caches.

--b.
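Relatedly, for the proxy/re-export arrangement discussed above (H2, or
a userspace NFS frontend, accepting guest traffic and forwarding it to
the real server): the host side of the vsock channel is just as
configuration-free.  A minimal, hypothetical sketch of the listening
end follows, again assuming only the standard <linux/vm_sockets.h> API;
the actual relay of each accepted connection to the NFS server over TCP
is ordinary socket plumbing and is elided:

/* Hypothetical sketch: the listening end a userspace NFS frontend or
 * re-export proxy on the host could use.  It accepts AF_VSOCK
 * connections from any guest; relaying the byte stream to the real
 * NFS server (e.g. on H1, over TCP) is left out for brevity.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    struct sockaddr_vm sa = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_ANY,   /* accept from any guest CID */
        .svm_port   = 2049,             /* illustrative NFS port */
    };
    int lfd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (lfd < 0 || bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(lfd, 16) < 0) {
        perror("vsock listen");
        return 1;
    }
    for (;;) {
        struct sockaddr_vm peer;
        socklen_t len = sizeof(peer);
        int cfd = accept(lfd, (struct sockaddr *)&peer, &len);

        if (cfd < 0)
            continue;
        printf("guest CID %u connected\n", peer.svm_cid);
        /* ... hand cfd off to a worker that proxies to the NFS server
         * on H1 over an ordinary AF_INET socket ... */
        close(cfd);
    }
}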