On Mon 08-04-19 10:53:34, Andreas Gruenbacher wrote:
> On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > [adding Jan and linux-mm]
> >
> > On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > > But what is the requirement to do this in writeback context? Can't
> > > > we move it out into another context instead?
> > >
> > > Indeed, this isn't for data integrity in this case but because the
> > > dirty limit is exceeded. What other context would you suggest to move
> > > this to?
> > >
> > > (The iomap flag I've proposed would save us from getting into this
> > > situation in the first place.)
> >
> > Your patch does two things:
> >
> >  - it only calls balance_dirty_pages_ratelimited once per write
> >    operation instead of once per page. In the past btrfs did
> >    hacks like that, but IIRC they caused VM balancing issues.
> >    That is why everyone now calls balance_dirty_pages_ratelimited
> >    once per page. If calling it at a coarse granularity would
> >    be fine we should do it everywhere instead of just in gfs2
> >    in journaled mode
> >  - it artificially reduces the size of writes to a low value,
> >    which I suspect is going to break real life application
>
> Not quite, balance_dirty_pages_ratelimited is called from iomap_end,
> so once per iomap mapping returned, not per write. (The first version
> of this patch got that wrong by accident, but not the second.)
>
> We can limit the size of the mappings returned just in that case. I'm
> aware that there is a risk of balancing problems, I just don't have
> any better ideas.
>
> This is a problem all filesystems with data-journaling will have with
> iomap, it's not that gfs2 is doing anything particularly stupid.

I agree that if ext4 were using iomap, it would have similar issues.

> > So I really think we need to fix this properly. And if that means
> > that you can't make use of the iomap batching for gfs2 in journaled
> > mode that is still a better option.
>
> That would mean using the old-style, page-size allocations, and a
> completely separate write path in that case. That would be quite a
> nightmare.
>
> > But I really think you need
> > to look into the scope of your flush_log and figure out a good way
> > to reduce that and solve the root cause.
>
> We won't be able to do a log flush while another transaction is
> active, but that's what's needed to clean dirty pages. iomap doesn't
> allow us to put the block allocation into a separate transaction from
> the page writes; for that, the opposite to the page_done hook would
> probably be needed.

I agree that a ->page_prepare() hook would probably be the cleanest
solution for this.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR