Hello, On Tue, Oct 08, 2019 at 04:23:22PM +0200, David Sterba wrote: > > 1. There is a single file which has accumulated enough dirty pages to > > trigger balance_dirty_pages() and the writer appending to the file > > with a series of short writes. > > > > 2. bdp kicks in, wakes up background writeback and sleeps. > > What does 'bdp' refer to? Oh, Sorry about that. balance_dirty_pages(). > > IOW, as long as it's not EOF and there's budget, the code never > > retries writing back the same page. Only when a page happens to be > > the last page of a particular run, we end up retrying the page, which > > can't possibly guarantee anything data integrity related. Besides, > > cyclic writes are only used for non-syncing writebacks meaning that > > there's no data integrity implication to begin with. > > The code was added in a91326679f2a0a4c239 ("Btrfs: make > mapping->writeback_index point to the last written page") after a user > report in https://www.spinics.net/lists/linux-btrfs/msg52628.html , slow > appends that caused fragmentation Ah, okay, that makes sense. That prolly warrants some comments. > What you describe as the cause is similar, but you're partially > reverting the fix that was supposed to fix it. As there's more code > added by the original patch, the revert won't probably bring back the > bug. > > The whole function and states are hard to follow, I agree with your > reasoning about the check being bogus and overall I'd rather see fewer > special cases in the function. > > Also the removed comment mentions media errors but this was not the > problem for the original report and is not a common scenario either. So > as long as the fallback in such case is sane (ie. set done = 1 and > exit), I don't see futher problems. > > > Fix it by always setting done_index past the current page being > > processed. > > > > Note that this problem exists in other writepages too. > > I can see that write_cache_pages does the same done_index updates. So > it makes sense that the page walking and writeback index tracking > behaviour is consistent, unless extent_write_cache_pages has diverged > too much. > > I'll add the patch to misc-next. Thanks. So, in case trying to write the last page is still needed, the thing necessary to fix this deadlock is avoiding repeating on the last page too many times / indefinitely as that's what ends up blocking forward progress - almost all of dirty data is behind the cyclic cursor and the cursor can't wrap back to get to them. Thanks. -- tejun