Re: [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 05, 2024 at 07:35:45AM +1100, Dave Chinner wrote:
> On Wed, Dec 04, 2024 at 03:43:41PM +0000, John Garry wrote:
> > From: "Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx>
> > 
> > Filesystems like ext4 can submit writes in multiples of blocksizes.
> > But we still can't allow the writes to be split into multiple BIOs. Hence
> > let's check if the iomap_length() is same as iter->len or not.
> > 
> > It is the responsibility of userspace to ensure that a write does not span
> > mixed unwritten and mapped extents (which would lead to multiple BIOs).
> 
> How is "userspace" supposed to do this?
> 
> No existing utility in userspace is aware of atomic write limits or
> rtextsize configs, so how does "userspace" ensure everything is
> laid out in a manner compatible with atomic writes?
> 
> e.g. restoring a backup (or other disaster recovery procedures) is
> going to have to lay the files out correctly for atomic writes.
> backup tools often sparsify the data set and so what gets restored
> will not have the same layout as the original data set...
> 
> Where's the documentation that outlines all the restrictions on
> userspace behaviour to prevent this sort of problem being triggered?
> Common operations such as truncate, hole punch, buffered writes,
> reflinks, etc will trip over this, so application developers, users
> and admins really need to know what they should be doing to avoid
> stepping on this landmine...

I'm kinda assuming that this requires forcealign to get the extent
alignments correct, and writing zeroes non-atomically if the extent
state gets mixed up before retrying the untorn write.  John?

> Further to that, what is the correction process for users to get rid
> of this error? They'll need some help from an atomic write
> constraint aware utility that can resilver the file such that these
> failures do not occur again. Can xfs_fsr do this? Or maybe the new
> exchangr-range code? Or does none of this infrastructure yet exist?

TBH aside from databases doing pure overwrites to storage hardware, I
think everyone else would be better served by start_commit /
commit_range since it's /much/ more flexible.

> > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> > jpg: tweak commit message
> > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> > ---
> >  fs/iomap/direct-io.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index b521eb15759e..3dd883dd77d2 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -306,7 +306,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	size_t copied = 0;
> >  	size_t orig_count;
> >  
> > -	if (atomic && length != fs_block_size)
> > +	if (atomic && length != iter->len)
> >  		return -EINVAL;
> 
> Given this is now rejecting IOs that are otherwise well formed from
> userspace, this situation needs to have a different errno now. The
> userspace application has not provided an invalid set of
> IO parameters for this IO - the IO has been rejected because
> the previously set up persistent file layout was screwed up by
> something in the past.
> 
> i.e. This is not an application IO submission error anymore, hence
> EINVAL is the wrong error to be returning to userspace here.

Admittedly it would be useful to add at least a couple of new errnos for
alignment issues and incorrect file storage mapping states.  How
difficult is it to get a new errno code added to uapi and then plumbed
into glibc?  Are new errno additions to libc still gate-kept by Ulrich
or is my knowlege 15 years out of date?

--D

> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux