Re: [PATCH 23/24] iomap: add support for sub-pagesize buffered I/O without buffer heads

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 20 Jun 2018 10:32:53 -0400

Sending again without the attachment... Christoph, let me know if it
didn't hit your mbox at least.

On Wed, Jun 20, 2018 at 09:56:55AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 19, 2018 at 12:52:11PM -0400, Brian Foster wrote:
> > > +	/*
> > > +	 * Move the caller beyond our range so that it keeps making progress.
> > > +	 * For that we have to include any leading non-uptodate ranges, but
> > 
> > Do you mean "leading uptodate ranges" here? E.g., pos is pushed forward
> > past those ranges we don't have to read, so (pos - orig_pos) reflects
> > the initial uptodate range while plen reflects the length we have to
> > read..?
> 
> Yes.
> 
> > > +
> > > +	do {
> > 
> > Kind of a nit, but this catches my eye and manages to confuse me every
> > time I look at it. A comment along the lines of:
> > 
> >                 /*
> > 		 * Pass in the block aligned start/end so we get back block
> > 		 * aligned/adjusted poff/plen and can compare with unaligned
> > 		 * from/to below.
> >                  */
> > 
> > ... would be nice here, IMO.
> 
> Fine with me.
> 
> > > +		iomap_adjust_read_range(inode, iop, &block_start,
> > > +				block_end - block_start, &poff, &plen);
> > > +		if (plen == 0)
> > > +			break;
> > > +
> > > +		if ((from > poff && from < poff + plen) ||
> > > +		    (to > poff && to < poff + plen)) {
> > > +			status = iomap_read_page_sync(inode, block_start, page,
> > > +					poff, plen, from, to, iomap);
> > 
> > After taking another look at the buffer head path, it does look like we
> > have slightly different behavior here. IIUC, the former reads only the
> > !uptodate blocks that fall along the from/to boundaries. Here, if say
> > from = 1, to = PAGE_SIZE and the page is fully !uptodate, it looks like
> > we'd read the entire page worth of blocks (assuming contiguous 512b
> > blocks, for example). Intentional? Doesn't seem like a big deal, but
> > could be worth a followup fix.
> 
> It wasn't actuall intentional, but I actually think it is the right thing
> in then end, as it means we'll often do a single read instead of two
> separate ones.

Ok, but if that's the argument, then shouldn't we not be doing two
separate I/Os if the middle range of a write happens to be already
uptodate? Or more for that matter, if the page happens to be sparsely
uptodate for whatever reason..?

OTOH, I also do wonder a bit whether that may always be the right thing
if we consider cases like 64k page size arches and whatnot. It seems
like we could end up consuming more bandwidth for reads than we
typically have in the past. That said, unless there's a functional
reason to change this I think it's fine to optimize this path for these
kinds of corner cases in follow on patches.

Finally, this survived xfstests on a sub-page block size fs but I
managed to hit an fsx error:

Mapped Read: non-zero data past EOF (0x21a1f) page offset 0xc00 is
0xc769

It repeats 100% of the time for me using the attached fsxops file (with
--replay-ops) on XFS w/ -bsize=1k. It doesn't occur without the final
patch to enable sub-page block iomap on XFS.

Brian

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html