Re: [RFC PATCH 1/1] Remove buffered failover for ext4 and block fops direct writes.

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 2 May 2024 10:01:39 -0400

On Wed, May 01, 2024 at 10:45:06PM -0700, Christoph Hellwig wrote:
> 
> Please don't combine ext4 and block changes in a single patch.  Please
> also explain why you want to change things.
> 
> AFAIK this is simply the historic behavior of the old direct I/O code
> that's been around forever.  I think the XFS semantics make a lot more
> sense, but people might rely on this one way or another.

I agree that the ext4 and block I/O change should be split into two
separate patches.

As for the rest, we discussed this at the weekly ext4 conference call
last week and at the, I had indicated that this was indeed the
historical Direct I/O behavior.  Darrick mentioned that XFS is only
falling back to buffered I/O in one circumstances, which is when there
is direct I/O to a file which is reflinked, which since the
application wouldn't know that this might be the case, falling back to
buffered I/O was the best of not-so-great alternatives.

It might be a good idea if we could agree on a unfied set of standard
semantics for Direct I/O, including what should happen if there is an
I/O error in the middle of a DIO request; should the kernel return a
short write?  Should it silently fallback to buffered I/O?  Given that
XFS has had a fairly strict "never fall back to buffered" practice,
and there haven't been users screaming bloody murder, perhaps it is
time that we can leave the old historical Direct I/O semantics behind,
and we should just be more strict.

Ext4 can make a decision about what to do on its own, but if we want
to unify behavior across all file systems and all of the direct I/O
implications in the kernels, then this is a discussion that would need
to take place on linux-fsdevel, linux-block, and/or LSF/MM.

With that context, what are folks' thiking about the proposal that we
unify Linux's Direct I/O semantics?  I think it would be good if it
was (a) clearly documented, and (b) not be surprising for userspace
application which they switch beteween file systems, or between a file
system and a raw block device.  (Which for certain enterprise
database, is mostly only use for benchmarketing, on the back cover of
Business Week, but sometimes there might be users who decide to
squeeze that last 1% of performance by going to a raw block device,
and it might be nice if they see the same behaviour when they make
that change.)

Cheers,

					- Ted