On Mon, 2017-01-23 at 12:25 -0500, Theodore Ts'o wrote:
> On Mon, Jan 23, 2017 at 07:10:00AM -0500, Jeff Layton wrote:
> > > > Well, except for QEMU/KVM, Kevin has already confirmed that using
> > > > Direct I/O is a completely viable solution. (And I'll add it solves a
> > > > bunch of other problems, including page cache efficiency....)
> >
> > Sure, O_DIRECT does make this simpler (though it's not always the most
> > efficient way to do I/O). I'm more interested in whether we can improve
> > the error handling with buffered I/O.
>
> I just want to make sure we're designing a solution that will actually
> be _used_, because it is a good fit for at least one real-world use
> case.
>

Exactly. Asking how the QEMU folks would like to be able to interact
with the kernel is not the same as promising to implement said
solution.

Still, I think it's a valid question, and I'll pose it in terms of NFS,
though I think the semantics apply to other situations as well. I'm
mostly just asking to get a better idea of what the KVM folks would
really like to have happen in this situation. I don't think they want
to error out on every network blip, but in the face of a hung mount
that isn't making progress in writeback, what would they like to be
able to do to resolve it?

For instance, with NFS you can generally send a SIGKILL to the process
to make it abandon O_DIRECT writes. But tasks accessing NFS mounts
still seem to get stuck in buffered writeback if the server goes away,
generally waiting on the page bits to clear in uninterruptible sleeps.
Would better handling of SIGKILL when waiting on buffered writeback be
what the QEMU devs would like? That seems like a reasonable thing to
consider.

> Is QEMU/KVM using volumes that are stored over NFS really used in the
> real world? Especially one where you want a huge amount of reliability
> and recovery after some kind of network failure? If we are talking
> about customers who are going to suspend the VM and restart it on
> another server, that presumes a fairly large installation size and
> enough servers; would they *really* want to use a single point of
> failure such as an NFS filer? Even if it was a proprietary,
> purpose-built NFS filer? Why wouldn't they be using RADOS and Ceph
> instead, for example?

There's nothing specific to NFS in what I was asking. I think cephfs
has similar behavior when the client can't reach any of its MDSs, for
instance.

-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
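
To make the O_DIRECT point concrete, here's a rough userspace sketch
(my own illustration, not code from QEMU or the kernel). With O_DIRECT
the write() is submitted to the server or device directly, so an I/O
failure comes back as -1 with errno set on that very call, instead of
surfacing later during background writeback. The 4096-byte alignment
is just an assumption for the example; the real requirement depends on
the filesystem and the underlying device.

/* Sketch: with O_DIRECT, I/O errors are reported synchronously at write() time. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const size_t align = 4096, len = 4096;	/* assumed alignment */
	void *buf;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT needs a suitably aligned buffer, length and offset. */
	if (posix_memalign(&buf, align, len)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	memset(buf, 0, len);

	/* Any I/O error is reported here, directly to the caller. */
	if (write(fd, buf, len) < 0)
		perror("write");

	free(buf);
	return close(fd) ? 1 : 0;
}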
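
And for contrast, a companion sketch of the buffered case (again just
my illustration). Here write() normally only dirties the page cache
and returns success; if the server then goes away, the failure or the
hang is only seen when the kernel tries to write the pages back, and
the only point where this program can observe that is fsync(). On a
hung NFS mount it is that fsync(), or a later wait on writeback, that
ends up stuck in an uninterruptible sleep, which is the case the
SIGKILL question above is about.

/* Sketch: with buffered I/O, a writeback error is (at best) visible at fsync(). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[4096];
	int fd, ret = 0;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_WRONLY | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(buf, 0, sizeof(buf));

	/* Almost always "succeeds": the data only went to the page cache. */
	if (write(fd, buf, sizeof(buf)) < 0) {
		perror("write");
		ret = 1;
	}

	/*
	 * First (and often only) chance to hear about a writeback error,
	 * and the call that can block indefinitely if the mount is hung.
	 */
	if (fsync(fd) < 0) {
		perror("fsync");
		ret = 1;
	}

	if (close(fd) < 0) {
		perror("close");
		ret = 1;
	}
	return ret;
}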