On Thu, Dec 05, 2024 at 07:35:45AM +1100, Dave Chinner wrote: > On Wed, Dec 04, 2024 at 03:43:41PM +0000, John Garry wrote: > > From: "Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx> > > > > Filesystems like ext4 can submit writes in multiples of blocksizes. > > But we still can't allow the writes to be split into multiple BIOs. Hence > > let's check if the iomap_length() is same as iter->len or not. > > > > It is the responsibility of userspace to ensure that a write does not span > > mixed unwritten and mapped extents (which would lead to multiple BIOs). > > How is "userspace" supposed to do this? > > No existing utility in userspace is aware of atomic write limits or > rtextsize configs, so how does "userspace" ensure everything is > laid out in a manner compatible with atomic writes? > > e.g. restoring a backup (or other disaster recovery procedures) is > going to have to lay the files out correctly for atomic writes. > backup tools often sparsify the data set and so what gets restored > will not have the same layout as the original data set... > > Where's the documentation that outlines all the restrictions on > userspace behaviour to prevent this sort of problem being triggered? > Common operations such as truncate, hole punch, buffered writes, > reflinks, etc will trip over this, so application developers, users > and admins really need to know what they should be doing to avoid > stepping on this landmine... I'm kinda assuming that this requires forcealign to get the extent alignments correct, and writing zeroes non-atomically if the extent state gets mixed up before retrying the untorn write. John? > Further to that, what is the correction process for users to get rid > of this error? They'll need some help from an atomic write > constraint aware utility that can resilver the file such that these > failures do not occur again. Can xfs_fsr do this? Or maybe the new > exchangr-range code? Or does none of this infrastructure yet exist? TBH aside from databases doing pure overwrites to storage hardware, I think everyone else would be better served by start_commit / commit_range since it's /much/ more flexible. > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> > > jpg: tweak commit message > > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx> > > --- > > fs/iomap/direct-io.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > index b521eb15759e..3dd883dd77d2 100644 > > --- a/fs/iomap/direct-io.c > > +++ b/fs/iomap/direct-io.c > > @@ -306,7 +306,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > size_t copied = 0; > > size_t orig_count; > > > > - if (atomic && length != fs_block_size) > > + if (atomic && length != iter->len) > > return -EINVAL; > > Given this is now rejecting IOs that are otherwise well formed from > userspace, this situation needs to have a different errno now. The > userspace application has not provided an invalid set of > IO parameters for this IO - the IO has been rejected because > the previously set up persistent file layout was screwed up by > something in the past. > > i.e. This is not an application IO submission error anymore, hence > EINVAL is the wrong error to be returning to userspace here. Admittedly it would be useful to add at least a couple of new errnos for alignment issues and incorrect file storage mapping states. How difficult is it to get a new errno code added to uapi and then plumbed into glibc? Are new errno additions to libc still gate-kept by Ulrich or is my knowlege 15 years out of date? --D > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >