Re: [PATCH v5 3/7] fs: iomap: Atomic write support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 31/08/2024 00:56, Darrick J. Wong wrote:
IOWs, although the*disk write* was completed successfully, the mapping
updates were torn, and the user program sees a torn write.
The most performant/painful way to fix this would be to make the whole
ioend completion a logged operation so that we could commit to updating
all the unwritten mappings and restart it after a crash.
could we make it logged for those special cases which we are interested in
only?
Yes, though this is the long route -- you get to define a new ondisk log
item, build all the incore structures to process them, and then define a
new high level operation that uses the state encoded in that new log
item to run all the ioend completion transactions within that framework.
Also you get to add a new log incompat feature bit for this.

Perhaps we should analyze the cost of writing and QA'ing all that vs.
the amount of time saved in the handling of this corner case using one
of the less exciting options.

From the sound of all the changes required, I am not too keen on that option...


The least performant of course is to write zeroes at allocation time,
like we do for fsdax.
That idea was already proposed:
https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/ ZcGIPlNCkL6EDx3Z@xxxxxxxxxxxxxxxxxxx/__;!!ACWV5N9M2RV99hQ! Kmx2Rrrot3GTqBS3kwhTi1nxIrpiPDyiy3TEfowsRKonvY90W7o4xUv9r9seOfDAMa2gT- TCNVlpH-CGXA$
Yes, I'm aware.

A possible middle ground would be to detect IOMAP_ATOMIC in the
->iomap_begin method, notice that there are mixed mappings under the
proposed untorn IO, and pre-convert the unwritten blocks by writing
zeroes to disk and updating the mappings
Won't that have the same issue as using XFS_BMAPI_ZERO, above i.e. zeroing
during allocation?
Only if you set the forcealign size to > 1fsb and fail to write new
file data in forcealign units, even for non-untorn writes.  If all
writes to the file are aligned to the forcealign size then there's only
one extent conversion to be done, and that cannot be torn.
> >>> before handing the one single
mapping back to iomap_dio_rw to stage the untorn writes bio.  At least
you'd only be suffering that penalty for the (probable) corner case of
someone creating mixed mappings.
BTW, one issue I have with the sub-extent(or -alloc unit) zeroing from v4
series is how the unwritten conversion has changed, like:

xfs_iomap_write_unwritten()
{
	unsigned int rounding;

	/* when converting anything unwritten, we must be spanning an alloc unit,
so round up/down */
	if (rounding > 1) {
		offset_fsb = rounddown(rounding);
		count_fsb = roundup(rounding);
	}

	...
	do {
		xfs_bmapi_write();
		...
		xfs_trans_commit();
	} while ();
}

I'm not too happy with it and it seems a bit of a bodge, as I would rather
we report the complete size written (user data and zeroes); then
xfs_iomap_write_unwritten() would do proper individual block conversion.
However, we do something similar for zeroing for sub-FSB writes. I am not
sure if that is the same thing really, as we only round up to FSB size.
Opinion?
xfs_iomap_write_unwritten is in the ioend path; that's not what I was
talking about.

Sure, it's not the same as what you are talking about, but I am just mentioning it as it was included in my sub-FS extent zeroing solution and I am not too happy about it. It's just a concern there.


I'm talking about a separate change to the xfs_direct_write_iomap_begin
function that would detect the case where the bmapi_read returns an
@imap that doesn't span the whole forcealign region, then repeatedly
calls bmapi_write(BMAPI_ZERO | BMAPI_CONVERT) on any unwritten mappings
within that file range until the original bmapi_read would return a
single written mapping.

Right, I get the idea. I'll check it further.

Cheers,
John






[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux