On Wed, Jun 08, 2022 at 10:17:33AM -0700, Stefan Roesch wrote:
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index b06a5c24a4db..f701dcb7c26a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -829,7 +829,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		length -= status;
>  	} while (iov_iter_count(i) && length);
>
> -	return written ? written : status;
> +	if (status == -EAGAIN) {
> +		iov_iter_revert(i, written);
> +		return -EAGAIN;
> +	}
> +	if (written)
> +		return written;
> +	return status;
>  }

I still don't understand how this can possibly work. Walk me through it.

Let's imagine we have a file laid out such that extent 1 is bytes 0-4095
of the file and extent 2 is bytes 4096-16385 of the file. We do a write
of 5000 bytes starting at offset 4000 of the file.

iomap_iter() tells us about the first extent and we write the first
96 bytes of our data to the first extent, returning 96. iomap_iter()
tells us about the second extent, and we write the next 4000 bytes to
the second extent. Then we get a page fault and get to the -EAGAIN case.
We rewind the iter 4000 bytes.

How do we not end up writing garbage when the kworker does the retry?
I'd understand if we rewound the iter all the way to the start. Or if
we didn't rewind the iter at all and were able to pick up partway through
the write. But rewinding to the start of the extent feels like it can't
possibly work.
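
To make the arithmetic in that scenario concrete, here is a toy user-space
replay of the offsets. It is only a sketch: toy_iter, toy_replay() and
toy_src_for_pos() are made-up names for illustration, not kernel APIs, and
it models nothing but byte counts. It makes no claim about where the real
retry resumes. It only shows which source byte would land at which file
position for each possible restart point.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter: tracks only byte counts (hypothetical). */
struct toy_iter {
	size_t consumed;	/* offset into the caller's source buffer */
	size_t count;		/* bytes still to be copied */
};

static void toy_advance(struct toy_iter *it, size_t n)
{
	assert(n <= it->count);
	it->consumed += n;
	it->count -= n;
}

static void toy_revert(struct toy_iter *it, size_t n)
{
	assert(n <= it->consumed);
	it->consumed -= n;
	it->count += n;
}

struct toy_result {
	size_t src_off;		/* where the iterator points after the revert */
	size_t committed;	/* bytes written to fully processed extents */
};

/* Replay the example: 5000 bytes at file offset 4000, fault 4000 bytes into extent 2. */
static struct toy_result toy_replay(void)
{
	struct toy_iter it = { .consumed = 0, .count = 5000 };
	struct toy_result r;

	toy_advance(&it, 96);	/* extent 1: file bytes 4000-4095 */
	r.committed = 96;

	toy_advance(&it, 4000);	/* extent 2: copied before the fault */
	toy_revert(&it, 4000);	/* the -EAGAIN path: revert by 'written' */

	r.src_off = it.consumed;
	return r;
}

/*
 * Source offset a retry would use for file position 'pos', assuming the
 * retry restarts at 'restart_pos' with the iterator at 'src_off'.
 */
static size_t toy_src_for_pos(size_t src_off, size_t restart_pos, size_t pos)
{
	return src_off + (pos - restart_pos);
}
```

After the revert the iterator sits 96 bytes into the source buffer. If the
retry resumes at file position 4096, position 4096 gets source byte 96,
which is what the original write intended. If the retry resumes at file
position 4000 instead, position 4000 gets source byte 96 rather than byte 0,
which is the garbage case the question is about.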