Re: [PATCH 01/27] ext4: remove writable userspace mappings before truncating page cache

Jan Kara <jack@xxxxxxx> · Fri, 6 Dec 2024 16:49:38 +0100

On Fri 06-12-24 15:59:44, Zhang Yi wrote:
> On 2024/12/4 19:13, Jan Kara wrote:
> > On Tue 22-10-24 19:10:32, Zhang Yi wrote:
> >> +static inline void ext4_truncate_folio(struct inode *inode,
> >> +				       loff_t start, loff_t end)
> >> +{
> >> +	unsigned long blocksize = i_blocksize(inode);
> >> +	struct folio *folio;
> >> +
> >> +	if (round_up(start, blocksize) >= round_down(end, blocksize))
> >> +		return;
> >> +
> >> +	folio = filemap_lock_folio(inode->i_mapping, start >> PAGE_SHIFT);
> >> +	if (IS_ERR(folio))
> >> +		return;
> >> +
> >> +	if (folio_mkclean(folio))
> >> +		folio_mark_dirty(folio);
> >> +	folio_unlock(folio);
> >> +	folio_put(folio);
> > 
> > I don't think this is enough. In your example from the changelog, this would
> > leave the page at index 0 dirty and still with 0x5a values in 2048-4096 range.
> > Then truncate_pagecache_range() does nothing, ext4_alloc_file_blocks()
> > converts blocks under 2048-4096 to unwritten state. But what handles
> > zeroing of page cache in 2048-4096 range? ext4_zero_partial_blocks() zeroes
> > only partial blocks, not full blocks. Am I missing something?
> > 
> 
> Sorry, I don't understand why truncate_pagecache_range() does nothing? In my
> example, the variable 'start' is 2048, the variable 'end' is 4096, and the
> call process truncate_pagecache_range(inode, 2048, 4096-1)->..->
> truncate_inode_partial_folio()->folio_zero_range() does zeroing the 2048-4096
> range. I also tested it below, it was zeroed.
> 
>   xfs_io -t -f -c "pwrite -S 0x58 0 4096" -c "mmap -rw 0 4096" \
>                -c "mwrite -S 0x5a 2048 2048" \
>                -c "fzero 2048 2048" -c "close" /mnt/foo
> 
>   od -Ax -t x1z /mnt/foo
>   000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  >XXXXXXXXXXXXXXXX<
>   *
>   000800 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
>   *
>   001000

Yeah, sorry. I've got totally confused here. truncate_pagecache_range()
indeed does all the zeroing we need. Your version of ext4_truncate_folio()
should do the right thing.

> > If I'm right, I'd keep it simple and just writeout these partial folios with
> > filemap_write_and_wait_range() and expand the range
> > truncate_pagecache_range() removes to include these partial folios. The
> 
> What I mean is the truncate_pagecache_range() has already covered the partial
> folios. right?

Right, it should cover the partial folios.

> > overhead won't be big and it isn't like this is some very performance
> > sensitive path.
> > 
> >> +}
> >> +
> >> +/*
> >> + * When truncating a range of folios, if the block size is less than the
> >> + * page size, the file's mapped partial blocks within one page could be
> >> + * freed or converted to unwritten. We should call this function to remove
> >> + * writable userspace mappings so that ext4_page_mkwrite() can be called
> >> + * during subsequent write access to these folios.
> >> + */
> >> +void ext4_truncate_folios_range(struct inode *inode, loff_t start, loff_t end)
> > 
> > Maybe call this ext4_truncate_page_cache_block_range()? And assert that
> > start & end are block aligned. Because this essentially prepares page cache
> > for manipulation with a block range.
> 
> Ha, it's a good idea, I agree with you that move truncate_pagecache_range()
> and the hunk of flushing in journal data mode into this function. But I don't
> understand why assert that 'start & end' are block aligned?

Yes, that shouldn't be needed since truncate_pagecache_range() will do the
right thing.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR