Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED

Andreas Gruenbacher <agruenba@xxxxxxxxxx> · Mon, 8 Apr 2019 10:53:34 +0200

On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig <hch@xxxxxx> wrote:
>
> [adding Jan and linux-mm]
>
> On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > But what is the requirement to do this in writeback context?  Can't
> > > we move it out into another context instead?
> >
> > Indeed, this isn't for data integrity in this case but because the
> > dirty limit is exceeded. What other context would you suggest to move
> > this to?
> >
> > (The iomap flag I've proposed would save us from getting into this
> > situation in the first place.)
>
> Your patch does two things:
>
>  - it only calls balance_dirty_pages_ratelimited once per write
>    operation instead of once per page.  In the past btrfs did
>    hacks like that, but IIRC they caused VM balancing issues.
>    That is why everyone now calls balance_dirty_pages_ratelimited
>    once per page.  If calling it at a coarse granularity would
>    be fine we should do it everywhere instead of just in gfs2
>    in journaled mode
>  - it artificially reduces the size of writes to a low value,
>    which I suspect is going to break real life applications

Not quite: balance_dirty_pages_ratelimited is called from iomap_end,
so once per iomap mapping returned, not once per write. (The first
version of this patch got that wrong by accident, but the second does
not.)

We can limit the size of the mappings returned in just that case
(gfs2 in journaled mode). I'm aware that there is a risk of balancing
problems; I just don't have any better ideas.
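
To put rough numbers on the granularity question, here is a small
userspace sketch. It is not kernel code and not what the patch does
internally; it only counts how often the throttling call site would be
reached under each scheme. The function names, the 4 KiB page size, and
the assumption that the write starts at offset 0 and that every mapping
is returned at the full capped length are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch; real systems may differ. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Round-up integer division. */
static size_t sketch_div_round_up(size_t n, size_t d)
{
	assert(d > 0);
	return (n + d - 1) / d;
}

/*
 * Hypothetical: number of times the throttling call site is reached
 * when it is invoked once per page of a write of write_len bytes.
 */
size_t sketch_calls_per_page(size_t write_len)
{
	return sketch_div_round_up(write_len, SKETCH_PAGE_SIZE);
}

/*
 * Hypothetical: number of times the call site is reached when it is
 * invoked once per mapping, with each mapping capped at
 * max_mapping_len bytes.  Passing max_mapping_len == write_len models
 * "once per write".
 */
size_t sketch_calls_per_mapping(size_t write_len, size_t max_mapping_len)
{
	return sketch_div_round_up(write_len, max_mapping_len);
}

/*
 * Hypothetical: worst-case number of pages dirtied between two
 * consecutive visits to the call site in the per-mapping scheme.
 */
size_t sketch_max_pages_between_calls(size_t max_mapping_len)
{
	return sketch_div_round_up(max_mapping_len, SKETCH_PAGE_SIZE);
}
```

Under those assumptions, a 1 MiB write reaches the call site 256 times
per page, 16 times with mappings capped at 64 KiB, and once with a
single uncapped mapping. With the 64 KiB cap, up to 16 pages can be
dirtied between two visits. Since balance_dirty_pages_ratelimited is
itself rate limited, these are counts of call-site visits, not of
actual throttling events.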

This is a problem that all filesystems with data journaling will have
with iomap; it's not that gfs2 is doing anything particularly stupid.

> So I really think we need to fix this properly.  And if that means
> that you can't make use of the iomap batching for gfs2 in journaled
> mode that is still a better option.

That would mean using the old-style, page-size allocations, and a
completely separate write path in that case. That would be quite a
nightmare.

> But I really think you need
> to look into the scope of your flush_log and figure out a good way
> to reduce that to solve the root cause.

We can't do a log flush while another transaction is active, but a log
flush is what's needed to clean dirty pages. iomap doesn't allow us to
put the block allocation into a separate transaction from the page
writes; for that, the opposite of the page_done hook would probably be
needed.

Thanks,
Andreas