On Fri, Nov 15, 2024 at 07:01:58AM +1100, Dave Chinner wrote:
> On Thu, Nov 14, 2024 at 01:04:31PM -0500, Brian Foster wrote:
> > On Thu, Nov 14, 2024 at 10:34:26AM +0800, Long Li wrote:
> > > On Wed, Nov 13, 2024 at 11:13:49AM -0500, Brian Foster wrote:
> > ISTM that for the above merge scenario to happen we'd either need
> > writeback of the thread 1 write to race just right with the thread 2
> > write, or have two writeback cycles where the completion of the first is
> > still pending by the time the second completes. Either of those seem far
> > less likely than either writeback seeing i_size == 8k from the start, or
> > the thread 2 write completing sometime after the thread 1 ioend has
> > already been completed. Hm?
> 
> I think that this should be fairly easy to trigger with concurrent
> sub-block buffered writes to O_DSYNC|O_APPEND opened files. The fact
> we drop the IOLOCK before calling generic_write_sync() to flush the
> data pretty much guarantees that there will be IO in flight whilst
> other write() calls have extended the file in memory and are then
> waiting for the current writeback on the folio to complete before
> submitting their own writeback IO.
> 

Hmmm.. that kind of sounds like the opposite to me. Note that the
example given was distinctly not append-only, since that is what allows
multiple technically "merge-worthy" ioends to be in completion at the
same time. If you're doing O_DSYNC|O_APPEND sub-block buffered writes,
then by definition there is going to be folio overlap across writes,
and we don't submit a dirty && writeback folio for writeback until the
preexisting writeback state clears. So ISTM that creates a situation
where even if the append I/O is multithreaded, you'll just be more
likely to lockstep across writebacks and never submit the second ioend
before the first completes. Hm?

That said, we're kind of getting in the weeds here.. I don't think it
matters that much whether these ioends merge. I'm basically throwing
out the proposition that maybe it's not worth the additional code to
ensure that they always do. I.e., suppose we opted to trim the io_size
when appropriate based on i_size so the completion-side size update is
accurate, but otherwise just left off the rounding helper thing and let
the existing code work as it is.

Would that lack of guaranteed block alignment have any practical impact
on functionality or performance? ISTM the add_to_ioend() part is
superfluous, so it probably wouldn't have much effect on I/O sizes, if
any. The skipped merge example Long Li gives seems technically
possible, but I'm not aware of a workload where that would occur
frequently enough that failing to merge such an ioend would have a
noticeable impact.. hm?

Again.. not something I care tremendously about, just trying to keep
things simple if possible. If it were me and there's not something we
can point to that obviously breaks, I'd start with that, and then if it
does prove wrong, it's simple enough to fix by tacking on the rounding
thing later. Just my .02.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
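
For illustration, here is a minimal sketch of the "trim the io_size
based on i_size" idea referred to above. It is a sketch only, not the
actual patch: trim_ioend_to_isize() is a made-up helper name, and the
real change may clamp the size at a different point in the writeback
path, though io_offset, io_size and i_size_read() follow the existing
iomap/VFS naming.

#include <linux/fs.h>
#include <linux/iomap.h>

/*
 * Minimal, illustrative sketch (not the actual patch): clamp the ioend
 * size to the in-core i_size visible at submission time so the
 * completion-side size update cannot publish a size beyond the range
 * this ioend actually wrote.  Assumes io_offset < i_size, which holds
 * for an ioend submitted against data inside EOF.
 */
static void trim_ioend_to_isize(struct inode *inode,
				struct iomap_ioend *ioend)
{
	loff_t isize = i_size_read(inode);

	if (ioend->io_offset + ioend->io_size > isize)
		ioend->io_size = isize - ioend->io_offset;
}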
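
And for context on why the lack of block alignment affects merging at
all: completion-time merging requires adjacent ioends to be exactly
contiguous. The check below is a simplified sketch, roughly modeled on
the iomap_ioend_can_merge() logic in fs/iomap/buffered-io.c and
paraphrased from memory rather than quoted verbatim; the final
contiguity test is the one an i_size-trimmed, unaligned io_size would
fail against a block-aligned successor.

#include <linux/iomap.h>

/*
 * Simplified sketch of the completion-time merge check (paraphrased,
 * not verbatim).  Two ioends only merge if they have compatible flags
 * and type and if the first ends exactly where the second begins, so
 * an ioend whose io_size was trimmed to an unaligned i_size won't merge
 * with the next one.
 */
static bool ioend_can_merge_sketch(struct iomap_ioend *ioend,
				   struct iomap_ioend *next)
{
	if ((ioend->io_flags & IOMAP_F_SHARED) ^
	    (next->io_flags & IOMAP_F_SHARED))
		return false;
	if ((ioend->io_type == IOMAP_UNWRITTEN) ^
	    (next->io_type == IOMAP_UNWRITTEN))
		return false;
	if (ioend->io_offset + ioend->io_size != next->io_offset)
		return false;
	return true;
}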