Re: [Lsf-pc] [LSF/MM TOPIC] I/O error handling and fsync()

Chuck Lever <chuck.lever@xxxxxxxxxx> · Mon, 23 Jan 2017 12:53:23 -0500

> On Jan 23, 2017, at 12:25 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> 
> On Mon, Jan 23, 2017 at 07:10:00AM -0500, Jeff Layton wrote:
>>>> Well, except for QEMU/KVM, Kevin has already confirmed that using
>>>> Direct I/O is a completely viable solution.  (And I'll add it solves a
>>>> bunch of other problems, including page cache efficiency....)
>> 
>> Sure, O_DIRECT does make this simpler (though it's not always the most
>> efficient way to do I/O). I'm more interested in whether we can improve
>> the error handling with buffered I/O.
> 
> I just want to make sure we're designing a solution that will actually
> be _used_, because it is a good fit for at least one real-world use
> case.
> 
> Is QEMU/KVM using volumes that are stored over NFS really used in the
> real world?

Yes. NFS has worked well for many years in pre-cloud virtualization
environments; in other words, environments that have supported guest
migration for much longer than OpenStack has been around.

> Especially one where you want a huge amount of
> reliability and recovery after some kind of network failure?

These are largely data center-grade machine room area networks, not
WANs. Network failures are not as frequent as they used to be.

Most server systems ship with more than one Ethernet device anyway.
Adding a second LAN path between each client and storage targets is
pretty straightforward.

> If we are
> talking about customers who are going to suspend the VM and restart it
> on another server, that presumes a fairly large installation size and
> enough servers. Would they *really* want to use a single point of
> failure such as an NFS filer?

You certainly can make NFS more reliable by using a filer that supports
IP-based cluster failover, and has a reasonable amount of redundant
durable storage.

I don't see why we should presume anything about installation size.

> Even if it was a proprietary
> purpose-built NFS filer?  Why wouldn't they be using RADOS and Ceph
> instead, for example?

NFS is a fine, inexpensive solution for small deployments and experimental
setups.

It's much simpler for a single user with no administrative rights to
manage NFS-based files than to deal with creating LUNs or backing
objects, for instance.

Considering the various weirdnesses and inefficiencies involved in
turning an object store into something that has proper POSIX file
semantics, IMO NFS is a known quantity that is straightforward
and a natural fit for some cloud deployments. If it weren't, then
there would be no reason to provide object-to-NFS gateway services.

Wrt O_DIRECT, an NFS client can open the NFS file that backs a virtual
block device with O_DIRECT, and you get the same semantics as reading
or writing to a physical block device. There is no need for the server
to use O_DIRECT as well: the client uses the NFS protocol to control
when the server commits data to durable storage (like, immediately).
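
To make the client side concrete, here is a minimal sketch of writing
one block to an image file opened with O_DIRECT. It is only an
illustration, not how QEMU does it: the helper name, the 4096-byte
block size, and the image path in the usage note are all assumptions,
and os.O_DIRECT is Linux-specific. The point for this thread is that
the write bypasses the client's page cache, so an I/O failure should
come back from the write call itself rather than from a later fsync().

```python
import errno
import mmap
import os

# Assumed block size and alignment; real devices and filesystems may differ.
BLOCK_SIZE = 4096


def write_block_direct(path, block_index, data):
    """Write one zero-padded block with O_DIRECT (hypothetical helper).

    Any OSError from open() or pwrite() propagates to the caller.
    """
    if len(data) > BLOCK_SIZE:
        raise ValueError("data larger than one block")

    # Anonymous mmap memory is page-aligned, which O_DIRECT generally
    # requires for the user buffer.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    try:
        buf.write(data)  # the remainder of the block stays zero-filled
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
        try:
            # Offset and length are both multiples of BLOCK_SIZE.
            written = os.pwrite(fd, buf, block_index * BLOCK_SIZE)
            if written != BLOCK_SIZE:
                raise OSError(errno.EIO, "short direct write")
        finally:
            os.close(fd)
    finally:
        buf.close()
```

A caller would use it as write_block_direct("/mnt/nfs/guest.img", 0,
b"...") with a path of its own choosing. Some filesystems reject
O_DIRECT with EINVAL, so a real caller needs a fallback for that case.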

--
Chuck Lever
