Re: [RFCv2 1/3] iomap: Move creation of iomap_page early in __iomap_write_begin

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Tue, 31 Jan 2023 18:48:38 +0000

On Wed, Feb 01, 2023 at 12:07:25AM +0530, Ritesh Harjani (IBM) wrote:
> On 23/01/30 09:00PM, Matthew Wilcox wrote:
> > On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote:
> > > > > Thus the iop structure will only gets allocated at the time of writeback
> > > > > in iomap_writepage_map(). This I think, was a not problem till now since
> > > > > we anyway only track uptodate status in iop (no support of tracking
> > > > > dirty bitmap status which later patches will add), and we also end up
> > > > > setting all the bits in iomap_page_create(), if the page is uptodate.
> > > >
> > > > delayed iop allocation is a feature and not a bug.  We might have to
> > > > refine the criteria for sub-page dirty tracking, but in general having
> > > > the iop allocates is a memory and performance overhead and should be
> > > > avoided as much as possible.  In fact I still have some unfinished
> > > > work to allocate it even more lazily.
> > >
> > > So, what I meant here was that the commit[1] chaged the behavior/functionality
> > > without indenting to. I agree it's not a bug.
> >
> > It didn't change the behaviour or functionality.  It broke your patches,
> > but it certainly doesn't deserve its own commit reverting it -- because
> > it's not wrong.
> >
> > > But when I added dirty bitmap tracking support, I couldn't understand for
> > > sometime on why were we allocating iop only at the time of writeback.
> > > And it was due to a small line change which somehow slipped into this commit [1].
> > > Hence I made this as a seperate patch so that it doesn't slip through again w/o
> > > getting noticed/review.
> >
> > It didn't "slip through".  It was intended.
> >
> > > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but
> > > with subpage dirty tracking in iop->state[], if we end up allocating iop only
> > > at the time of writeback, than that might cause some performance degradation
> > > compared to, if we allocat iop at ->write_begin() and mark the required dirty
> > > bit ranges in ->write_end(). Like how we do in this patch series.
> > > (Ofcourse it is true only for bs < ps use case).
> > >
> > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@xxxxxx/
> >
> > You absolutely can allocate it in iomap_write_begin, but you can avoid
> > allocating it until writeback time if (pos, len) entirely overlap the
> > folio.  ie:
> >
> > 	if (pos > folio_pos(folio) ||
> > 	    pos + len < folio_pos(folio) + folio_size(folio))
> > 		iop = iomap_page_create(iter->inode, folio, iter->flags, false);
> 
> Thanks for the suggestion. However do you think it will be better if this is
> introduced along with lazy allocation changes which Christoph was mentioning
> about?
> Why I am thinking that is because, with above approach we delay the allocation
> of iop until writeback, for entire folio overlap case. But then later
> in __iomap_write_begin(), we require iop if folio is not uptodate.
> Hence we again will have to do some checks to see if the iop is not allocated
> then allocate it (which is for entire folio overlap case).
> That somehow looked like an overkill for a very little gain in the context of
> this patch series. Kindly let me know your thoughts on this.

Look at *why* __iomap_write_begin() allocates an iop.  It's to read in the
blocks which are going to be partially-overwritten by the write.  If the
write overlaps the entire folio, there are no parts which need to be read
in, and we can simply return.  Maybe we should make that more obvious:

	if (folio_test_uptodate(folio))
		return 0;
	if (pos <= folio_pos(folio) &&
	    pos + len >= folio_pos(folio) + folio_size(folio))
		return 0;
	folio_clear_error(folio);

(I think pos must always be >= folio_pos(), so that <= could be ==, but
it doesn't hurt anything to use <=)