Re: [PATCH 2/2] iomap: Support inline data with block size < page size

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Sat, 24 Jul 2021 23:14:22 +0800

On Sat, Jul 24, 2021 at 03:03:56PM +0100, Matthew Wilcox wrote:
> On Sat, Jul 24, 2021 at 12:46:45PM +0800, Gao Xiang wrote:
> > Hi Matthew,
> > 
> > On Sat, Jul 24, 2021 at 04:44:35AM +0100, Matthew Wilcox (Oracle) wrote:
> > > Remove the restriction that inline data must start on a page boundary
> > > in a file.  This allows, for example, the first 2KiB to be stored out
> > > of line and the trailing 30 bytes to be stored inline.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > > ---
> > >  fs/iomap/buffered-io.c | 18 ++++++++----------
> > >  1 file changed, 8 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > index 7bd8e5de996d..d7d6af29af7f 100644
> > > --- a/fs/iomap/buffered-io.c
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -209,25 +209,23 @@ static int iomap_read_inline_data(struct inode *inode, struct page *page,
> > >  		struct iomap *iomap, loff_t pos)
> > >  {
> > >  	size_t size = iomap->length + iomap->offset - pos;
> > > +	size_t poff = offset_in_page(pos);
> > >  	void *addr;
> > >  
> > >  	if (PageUptodate(page))
> > > -		return PAGE_SIZE;
> > > +		return PAGE_SIZE - poff;
> > >  
> > > -	/* inline data must start page aligned in the file */
> > > -	if (WARN_ON_ONCE(offset_in_page(pos)))
> > > -		return -EIO;
> > >  	if (WARN_ON_ONCE(!iomap_inline_data_size_valid(iomap)))
> > >  		return -EIO;
> > > -	if (WARN_ON_ONCE(page_has_private(page)))
> > > -		return -EIO;
> > > +	if (poff > 0)
> > > +		iomap_page_create(inode, page);
> > 
> > Thanks for the patch!
> > 
> > Previously I'd like to skip the leading uptodate blocks and update the
> > extent it covers that is due to already exist iop. If we get rid of the
> > offset_in_page(pos) restriction like this, I'm not sure if we (or some
> > other fs users) could face something like below (due to somewhat buggy
> > fs users likewise):
> > 
> >  [0 - 4096)    plain block
> > 
> >  [4096 - 4608)  tail INLINE 1 (e.g. by mistake or just splitted
> >                                     .iomap_begin() report.)
> >  [4608 - 5120]  tail INLINE 2
> 
> My assumption is that an INLINE extent is <= block_size.  Otherwise
> the first part of the extent would be not-inline.  So this would be
> a bug in the filesystem; [4096-4608) should not be an inline extent.

Yes, never mind. Sorry about again.

> 
> > with this code iomap_set_range_uptodate() wouldn't behave correctly.
> > 
> > In addition, currently EROFS cannot test such path (since EROFS is
> > page-sized block only for now) as Darrick said in the previous reply,
> > so I think it would be better together with the folio patchset and
> > targeted for the corresponding merge window, so I can test iomap
> > supported EROFS with the new folio support together (That also give
> > me some time to support sub-pagesized uncompressed blocks...)
> 
> Do you want to test erofs with multi-page folios?  That might be
> even more interesting than block size < page size.

Hmm.. I'm busy in developing for some new scenario. Will look into
that after the current busy period.

> 
> > > -	addr = kmap_atomic(page);
> > > +	addr = kmap_atomic(page) + poff;
> > >  	memcpy(addr, iomap_inline_buf(iomap, pos), size);
> > > -	memset(addr + size, 0, PAGE_SIZE - size);
> > > +	memset(addr + size, 0, PAGE_SIZE - poff - size);
> > >  	kunmap_atomic(addr);
> > 
> > As my limited understanding, this may need to be fixed, since it
> > doesn't match kmap_atomic(page)...
> 
> void kunmap_local_indexed(void *vaddr)
> {
>         unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
> 
> so it's fine to unmap any address in the page.

I already checked this (in practice it has no problem due to the
current implementation), yet I'm not quite sure if it matches the
API usage, and not quite sure how many in-kernel users use like this.

Thanks,
Gao Xiang