> On Jan 23, 2017, at 12:25 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Mon, Jan 23, 2017 at 07:10:00AM -0500, Jeff Layton wrote:
>>>> Well, except for QEMU/KVM, Kevin has already confirmed that using
>>>> Direct I/O is a completely viable solution. (And I'll add it solves a
>>>> bunch of other problems, including page cache efficiency....)
>>
>> Sure, O_DIRECT does make this simpler (though it's not always the most
>> efficient way to do I/O). I'm more interested in whether we can improve
>> the error handling with buffered I/O.
>
> I just want to make sure we're designing a solution that will actually
> be _used_, because it is a good fit for at least one real-world use
> case.
>
> Is QEMU/KVM using volumes that are stored over NFS really used in the
> real world?

Yes. NFS has worked well for many years in pre-cloud virtualization
environments; in other words, environments that have supported guest
migration for much longer than OpenStack has been around.

> Especially one where you want a huge amount of
> reliability and recovery after some kind network failure?

These are largely data center-grade machine room area networks, not
WANs. Network failures are not as frequent as they used to be.

Most server systems ship with more than one Ethernet device anyway.
Adding a second LAN path between each client and storage targets is
pretty straightforward.

> If we are
> talking about customers who are going to suspend the VM and restart it
> on another server, that presumes a fairly large installation size and
> enough servers that would they *really* want to use a single point of
> failure such as an NFS filer?

You certainly can make NFS more reliable by using a filer that
supports IP-based cluster failover, and has a reasonable amount of
redundant durable storage.

I don't see why we should presume anything about installation size.

> Even if it was a proprietary
> purpose-built NFS filer? Why wouldn't they be using RADOS and Ceph
> instead, for example?

NFS is a fine, inexpensive solution for small deployments and
experimental setups.

It's much simpler for a single user with no administrative rights to
manage NFS-based files than to deal with creating LUNs or backing
objects, for instance.

Considering the various weirdnesses and inefficiencies involved in
turning an object store into something that has proper POSIX file
semantics, IMO NFS is a known quantity that is straightforward and
a natural fit for some cloud deployments. If it weren't, then there
would be no reason to provide object-to-NFS gateway services.

Wrt O_DIRECT, an NFS client can open the NFS file that backs a virtual
block device with O_DIRECT, and you get the same semantics as reading
or writing to a physical block device. There is no need for the server
to use O_DIRECT as well: the client uses the NFS protocol to control
when the server commits data to durable storage (like, immediately).

--
Chuck Lever
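
For illustration only, here is a minimal client-side sketch of the
O_DIRECT point above, assuming Linux with glibc. The names
write_direct(), round_up_block() and BLOCK_SIZE are hypothetical, and
the 4096-byte alignment is an assumption; real code should use the
alignment that the underlying file system or device actually requires.
This is not how QEMU does it, just the basic shape of an aligned
O_DIRECT write.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT with glibc; must precede all includes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment for buffer address and transfer length. */
#define BLOCK_SIZE 4096

/* Round len up to the next multiple of BLOCK_SIZE. */
static size_t round_up_block(size_t len)
{
    return (len + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
}

/*
 * Write len bytes of data at offset 0 of path, bypassing the client's
 * page cache. The payload is zero-padded to a whole number of blocks.
 * Returns 0 on success or a negative errno value.
 */
static int write_direct(const char *path, const void *data, size_t len)
{
    void *buf = NULL;
    size_t padded;
    ssize_t n;
    int fd, rc;

    if (path == NULL || data == NULL || len == 0)
        return -EINVAL;

    padded = round_up_block(len);
    assert(padded % BLOCK_SIZE == 0);

    /* O_DIRECT generally wants an aligned user buffer. */
    rc = posix_memalign(&buf, BLOCK_SIZE, padded);
    if (rc != 0)
        return -rc;
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        rc = -errno;
        free(buf);
        return rc;
    }

    n = pwrite(fd, buf, padded, 0);
    if (n == (ssize_t)padded)
        rc = 0;
    else if (n < 0)
        rc = -errno;
    else
        rc = -EIO;    /* short write */

    close(fd);
    free(buf);
    return rc;
}
```

A caller would do something like write_direct("/mnt/nfs/guest.img",
"hello", 5), where the path is made up. Note that open() can fail with
EINVAL on file systems that do not support O_DIRECT, so the return
value needs checking.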