On Mon, 2017-01-23 at 11:09 +0100, Kevin Wolf wrote:
> On 23.01.2017 at 01:21, Theodore Ts'o wrote:
> > On Sun, Jan 22, 2017 at 06:31:57PM -0500, Jeff Layton wrote:
> > >
> > > Ahh, sorry if I wasn't clear.
> > >
> > > I know Kevin posed this topic in the context of QEMU/KVM, and I figure
> > > that running virt guests (themselves doing all sorts of workloads) is a
> > > pretty common setup these days. That was what I meant by "use case"
> > > here. Obviously there are many other workloads that could benefit from
> > > (or be harmed by) changes in this area.
> > >
> > > Still, I think that looking at QEMU/KVM as an "application" and
> > > considering what we can do to help optimize that case could be helpful
> > > here (and might also be helpful for other workloads).
> >
> > Well, except for QEMU/KVM, Kevin has already confirmed that using
> > Direct I/O is a completely viable solution. (And I'll add it solves a
> > bunch of other problems, including page cache efficiency....)
>
> Yes, "don't ever use non-O_DIRECT in production" is probably workable as
> a solution to the "state after failed fsync()" problem, as long as it is
> consistently implemented throughout the stack. That is, if we use a
> network protocol in QEMU (NFS, gluster, etc.), the server needs to use
> O_DIRECT, too, if we don't want to get the same problem one level down
> the stack. I'm not sure if that's possible with all of them, but if it
> is, it's mostly just a matter of configuring them correctly.
>

It's actually not necessary with NFS. O_DIRECT I/O is entirely a
client-side thing. There's no support for it in the protocol (and there
doesn't really need to be).

If something happens and the server crashes before the writes are
stable, then I believe the client will reissue them. If both the client
and server crash at the same time, then all bets are off of course. :)

> However, if we look at the greater problem of hanging requests that came
> up in the more recent emails of this thread, it is only moved rather
> than solved. Chances are that write() would already hang now instead of
> only fsync(), but we still have a hard time dealing with this.
>

Well, it _is_ better with O_DIRECT as you can usually at least break out
of the I/O with SIGKILL.

When I last looked at this, the problem with buffered I/O was that you
often end up waiting on page bits to clear (usually PG_writeback or
PG_dirty), in non-killable sleeps for the most part.

Maybe the fix here is as simple as changing that?

--
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
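
For readers less familiar with what "use O_DIRECT" means for an
application, here is a minimal userspace sketch of the idea discussed in
this message. It is not taken from QEMU or from the kernel. The names
direct_write(), round_up() and BLOCK_SIZE are hypothetical, and the
4096-byte alignment is an assumption (the real requirement depends on the
filesystem and device). The point is only that with O_DIRECT the data
does not sit in dirty page cache pages, so an I/O error is normally
reported by write() itself rather than only by a later fsync().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc exposes O_DIRECT only with this */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* not exposed by the headers: degrades to buffered I/O */
#endif

#define BLOCK_SIZE 4096        /* assumed alignment; real devices may differ */

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Write len bytes from data to path with O_DIRECT.
 * The payload is zero-padded to a multiple of BLOCK_SIZE, so the file
 * ends up that long. Returns 0 on success or a negative errno value.
 */
static int direct_write(const char *path, const void *data, size_t len)
{
    size_t padded, off = 0;
    void *buf = NULL;
    int fd, rc;

    if (len == 0)
        return -EINVAL;
    padded = round_up(len, BLOCK_SIZE);

    /* O_DIRECT needs an aligned buffer; posix_memalign returns an errno value. */
    rc = posix_memalign(&buf, BLOCK_SIZE, padded);
    if (rc != 0)
        return -rc;
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0) {
        rc = -errno;           /* e.g. EINVAL if the filesystem lacks O_DIRECT */
        free(buf);
        return rc;
    }

    while (off < padded) {
        ssize_t n = write(fd, (char *)buf + off, padded - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any data moved: retry */
            rc = -errno;       /* the I/O error surfaces here, at write() time */
            break;
        }
        if (n == 0) {
            rc = -EIO;         /* no progress: give up instead of spinning */
            break;
        }
        off += (size_t)n;
    }

    /* O_DIRECT alone does not flush device caches or metadata. */
    if (rc == 0 && fsync(fd) < 0)
        rc = -errno;

    if (close(fd) < 0 && rc == 0)
        rc = -errno;
    free(buf);
    return rc;
}
```

A caller would use it as direct_write("/some/path", data, len) and treat
any negative return as a failed write. Over NFS the same open() flag only
changes client behaviour, which matches the point above that nothing in
the protocol needs to know about it.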