Re: [PATCH v8 06/14] iomap: Return -EAGAIN from iomap_write_iter()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/8/22 12:02 PM, Matthew Wilcox wrote:
> On Wed, Jun 08, 2022 at 10:17:33AM -0700, Stefan Roesch wrote:
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index b06a5c24a4db..f701dcb7c26a 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -829,7 +829,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>>  		length -= status;
>>  	} while (iov_iter_count(i) && length);
>>  
>> -	return written ? written : status;
>> +	if (status == -EAGAIN) {
>> +		iov_iter_revert(i, written);
>> +		return -EAGAIN;
>> +	}
>> +	if (written)
>> +		return written;
>> +	return status;
>>  }
> 
> I still don't understand how this can possibly work.  Walk me through it.
> 
> Let's imagine we have a file laid out such that extent 1 is bytes
> 0-4095 of the file and extent 2 is extent 4096-16385 of the file.
> We do a write of 5000 bytes starting at offset 4000 of the file.
> 
> iomap_iter() tells us about the first extent and we write the first
> 96 bytes of our data to the first extent, returning 96.  iomap_iter()
> tells us about the second extent, and we write the next 4000 bytes to
> the second extent.  Then we get a page fault and get to the -EAGAIN case.
> We rewind the iter 4000 bytes.
> 

We have two data structures, the iomap_iter and iov_iter. After the first
96 bytes, the iov_iter offset get updated in iomap_write_iter() and then the
iomap_iter pos gets updated in iomap_iter()->iomap_iter_advance().

We then get the second extend from iomap_iter(). In iomap_write_iter() the
first page is obtained and written successfully, then the second page is
faulted. At this point the iov offset of the iov_iter has advanced. To reset
it to the state when the function iomap_write_iter() was entered, the iov_iter
is reset to iov_offset - written bytes.

iomap_write_iter() is exited and returns -EAGAIN. As iomap_write_iter() returns
an error, the iomap_iter pos is not updated in iomap_iter(). Only the number
of bytes written in the write of the first extent from iomap_file_buffered_write()
is returned from iomap_file_buffered_write().

In xfs_file_buffered_write we updated the iocb->ki_pos with the number of
bytes written. In io-uring, the io_write() call receives the short write result.
It copies the iov_iter struct into the work context for the io worker.
The io_worker uses that information to complete the rest of the write.

The above reset is required to keep the pos in iomap_iter and the offset in
iov_iter in sync.



Side Note:

I had an earlier version of the patch that was changing the signature of the
function iomap_write_iter(). It was returning a return code and changing the
processed value of the iomap_iter (which then also changes the pos value of 
the iomap_iter). This version (version 7 of the patch) does not require to
reset the offset of the iov_iter. It can update the pos in iomap_iter even
when -EAGAIN is returned.



> How do we not end up writing garbage when the kworker does the retry?
> I'd understand if we rewound the iter all the way to the start.  Or if
> we didn't rewind the iter at all and were able to pick up partway through
> the write.  But rewinding to the start of the extent feels like it can't
> possibly work.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux