Re: [PATCH 0/12 v3] Writeback improvements

Jens Axboe <axboe@xxxxxxxxx> · Thu, 28 Sep 2017 15:39:49 +0200

On 09/28/2017 03:19 PM, John Stoffel wrote:
> On Wed, Sep 27, 2017 at 02:13:47PM -0600, Jens Axboe wrote:
>> We've had some issues with writeback in presence of memory reclaim
>> at Facebook, and this patch set attempts to fix it up. The real
>> functional change for that issue is patch 10. The rest are cleanups,
>> as well as the removal of doing non-range cyclic writeback. The users
>> of that was sync_inodes_sb() and wakeup_flusher_threads(), both of
>> which writeback all of the dirty pages.
> 
> So does this patch set make things faster?  Less bursty?  Does it make
> writeout take longer, but with less spikes?  What is the performance
> impact of this change?   I hate to be a pain, but this just smacks of
> arm waving and I'm sure FB doesn't make changes without data... :-)

See patch 10, this isn't arm waving at all. The whole point is that
you can have millions of writeback work items, which don't do anything.
See not only are we wasting a full core of doing nothing, it's bad
enough that we can trigger softlockups since it's just sitting there
in a loop doing that.

It's all explained in that patch...

>> The basic idea is that we have callers that call
>> wakeup_flusher_threads() with nr_pages == 0. This means 'writeback
>> everything'. For memory reclaim situations, we can end up queuing
>> a TON of these kinds of writeback units. This can cause softlockups
>> and further memory issues, since we allocate huge amounts of
>> struct wb_writeback_work to handle this writeback. Handle this
>> situation more gracefully.
> 
> Do you push back on the callers or slow them down?  Why do we even
> allow callers to flush everything?

Ehm, because we have to? There are cases where flushing everything
makes sense. Laptop mode is one of them, the problematic case
here is memory reclaim. To clean dirty pages, you have to kick
the flusher threads.

-- 
Jens Axboe