Re: [RFC 2/3] ext2: Convert ext2 regular file buffered I/O to use iomap

Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> · Tue, 21 Nov 2023 11:26:15 +0530

Christoph Hellwig <hch@xxxxxxxxxxxxx> writes:

> On Tue, Nov 21, 2023 at 12:35:20AM +0530, Ritesh Harjani (IBM) wrote:
>> - mmap path of ext2 continues to use generic_file_vm_ops
>
> So this means there are not space reservations taken at write fault
> time.

Yes.

> While iomap was written with the assumption those exist, I can't
> instantly spot anything that relies on them - you are just a lot more
> likely to hit an -ENOSPC from ->map_blocks now.

Which is also true with existing code no? If the block reservation is
not done at the write fault, writeback is likely to fail due to ENOSPC?

> Maybe we should add
> an xfstests that covers the case where we use up more then the existing
> free space through writable mmaps to ensure it fully works (where works
> means at least as good as the old behavior)?
>

Sure, I can try write an fstests for the same.

Also I did find an old thread which tried implementing ->page_mkwrite in
ext2 [1]. However, it was rejected with following reason - 

"""
Allocating
blocks on write fault has the unwelcome impact that the file layout is
likely going to be much worse (much more fragmented) - I remember getting
some reports about performance regressions from users back when I did a
similar change for ext3. And so I'm reluctant to change behavior of such
an old legacy filesystem as ext2...
"""

https://lore.kernel.org/linux-ext4/20210105175348.GE15080@xxxxxxxxxxxxxx/

>> +static ssize_t ext2_buffered_write_iter(struct kiocb *iocb,
>> +					struct iov_iter *from)
>> +{
>> +	ssize_t ret = 0;
>> +	struct inode *inode = file_inode(iocb->ki_filp);
>> +
>> +	inode_lock(inode);
>> +	ret = generic_write_checks(iocb, from);
>> +	if (ret > 0)
>> +		ret = iomap_file_buffered_write(iocb, from, &ext2_iomap_ops);
>> +	inode_unlock(inode);
>> +	if (ret > 0)
>> +		ret = generic_write_sync(iocb, ret);
>> +	return ret;
>> +}
>
> Not for now, but if we end up doing a bunch of conversation of trivial
> file systems, it might be worth to eventually add a wrapper for the
> above code with just the iomap_ops passed in.  Only once we see a few
> duplicates, though.
>

Sure, make sense. Thanks!
I can try and check if the the wrapper helps.

>> +static int ext2_write_map_blocks(struct iomap_writepage_ctx *wpc,
>> +				 struct inode *inode, loff_t offset)
>> +{
>> +	if (offset >= wpc->iomap.offset &&
>> +	    offset < wpc->iomap.offset + wpc->iomap.length)
>> +		return 0;
>> +
>> +	return ext2_iomap_begin(inode, offset, inode->i_sb->s_blocksize,
>> +				IOMAP_WRITE, &wpc->iomap, NULL);
>> +}
>
> Looking at gfs2 this also might become a pattern.  But I'm fine with
> waiting for now.
>

ok.

>> @@ -1372,7 +1428,7 @@ void ext2_set_file_ops(struct inode *inode)
>>  	if (IS_DAX(inode))
>>  		inode->i_mapping->a_ops = &ext2_dax_aops;
>>  	else
>> -		inode->i_mapping->a_ops = &ext2_aops;
>> +		inode->i_mapping->a_ops = &ext2_file_aops;
>>  }
>
> Did yo run into issues in using the iomap based aops for the other uses
> of ext2_aops, or are just trying to address the users one at a time?

There are problems for e.g. for dir type in ext2. It uses the pagecache
for dir. It uses buffer_heads and attaches them to folio->private.
    ...it uses block_write_begin/block_write_end() calls.
    Look for ext4_make_empty() -> ext4_prepare_chunk ->
    block_write_begin(). 
Now during sync/writeback of the dirty pages (ext4_handle_dirsync()), we
might take a iomap writeback path (if using ext2_file_aops for dir)
which sees folio->private assuming it is "struct iomap_folio_state".
And bad things will happen... 

Now we don't have an equivalent APIs in iomap for
block_write_begin()/end() which the users can call for. Hence, Jan
suggested to lets first convert ext2 regular file path to iomap as an RFC.
I shall now check for the dir type to see what changes we might require
in ext2/iomap code.

Thanks again for your review!

-ritesh