Re: [RFCv2 1/3] iomap: Move creation of iomap_page early in __iomap_write_begin

"Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx> · Wed, 1 Feb 2023 01:30:20 +0530

On 23/01/31 06:48PM, Matthew Wilcox wrote:
> On Wed, Feb 01, 2023 at 12:07:25AM +0530, Ritesh Harjani (IBM) wrote:
> > On 23/01/30 09:00PM, Matthew Wilcox wrote:
> > > On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote:
> > > > > > Thus the iop structure will only gets allocated at the time of writeback
> > > > > > in iomap_writepage_map(). This I think, was a not problem till now since
> > > > > > we anyway only track uptodate status in iop (no support of tracking
> > > > > > dirty bitmap status which later patches will add), and we also end up
> > > > > > setting all the bits in iomap_page_create(), if the page is uptodate.
> > > > >
> > > > > delayed iop allocation is a feature and not a bug.  We might have to
> > > > > refine the criteria for sub-page dirty tracking, but in general having
> > > > > the iop allocates is a memory and performance overhead and should be
> > > > > avoided as much as possible.  In fact I still have some unfinished
> > > > > work to allocate it even more lazily.
> > > >
> > > > So, what I meant here was that the commit[1] chaged the behavior/functionality
> > > > without indenting to. I agree it's not a bug.
> > >
> > > It didn't change the behaviour or functionality.  It broke your patches,
> > > but it certainly doesn't deserve its own commit reverting it -- because
> > > it's not wrong.
> > >
> > > > But when I added dirty bitmap tracking support, I couldn't understand for
> > > > sometime on why were we allocating iop only at the time of writeback.
> > > > And it was due to a small line change which somehow slipped into this commit [1].
> > > > Hence I made this as a seperate patch so that it doesn't slip through again w/o
> > > > getting noticed/review.
> > >
> > > It didn't "slip through".  It was intended.
> > >
> > > > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but
> > > > with subpage dirty tracking in iop->state[], if we end up allocating iop only
> > > > at the time of writeback, than that might cause some performance degradation
> > > > compared to, if we allocat iop at ->write_begin() and mark the required dirty
> > > > bit ranges in ->write_end(). Like how we do in this patch series.
> > > > (Ofcourse it is true only for bs < ps use case).
> > > >
> > > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@xxxxxx/
> > >
> > > You absolutely can allocate it in iomap_write_begin, but you can avoid
> > > allocating it until writeback time if (pos, len) entirely overlap the
> > > folio.  ie:
> > >
> > > 	if (pos > folio_pos(folio) ||
> > > 	    pos + len < folio_pos(folio) + folio_size(folio))
> > > 		iop = iomap_page_create(iter->inode, folio, iter->flags, false);
> >
> > Thanks for the suggestion. However do you think it will be better if this is
> > introduced along with lazy allocation changes which Christoph was mentioning
> > about?
> > Why I am thinking that is because, with above approach we delay the allocation
> > of iop until writeback, for entire folio overlap case. But then later
> > in __iomap_write_begin(), we require iop if folio is not uptodate.
> > Hence we again will have to do some checks to see if the iop is not allocated
> > then allocate it (which is for entire folio overlap case).
> > That somehow looked like an overkill for a very little gain in the context of
> > this patch series. Kindly let me know your thoughts on this.
>
> Look at *why* __iomap_write_begin() allocates an iop.  It's to read in the
> blocks which are going to be partially-overwritten by the write.  If the
> write overlaps the entire folio, there are no parts which need to be read
> in, and we can simply return.

Yes that make sense.

> Maybe we should make that more obvious:

Yes, I think this maybe required. Because otherwise we might end up using
uninitialized iop. generic/156 (funshare), can easily trigger that.
Will spend sometime on the unshare path of iomap.

>
> 	if (folio_test_uptodate(folio))
> 		return 0;
> 	if (pos <= folio_pos(folio) &&
> 	    pos + len >= folio_pos(folio) + folio_size(folio))
> 		return 0;
> 	folio_clear_error(folio);
>
> (I think pos must always be >= folio_pos(), so that <= could be ==, but
> it doesn't hurt anything to use <=)

Thanks for sharing this.

-ritesh