Re: [PATCH] mm,writeback: Don't use memory reserves for wb_start_writeback

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 29 Mar 2016 10:54:35 +0200



[CCed Jack. Tetsuo, it is preferable to CC people involved in the
previous discussion, and of course those who acked the patch as well]

On Thu 24-03-16 14:17:14, Andrew Morton wrote:
> On Thu, 24 Mar 2016 23:03:16 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> > Andrew, can you take this patch?
> 
> Tejun.
> 
> > ----------------------------------------
> > From 5d43acbc5849a63494a732e39374692822145923 Mon Sep 17 00:00:00 2001
> > From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> > Date: Sun, 13 Mar 2016 23:03:05 +0900
> > Subject: [PATCH] mm,writeback: Don't use memory reserves for
> >  wb_start_writeback
> > 
> > When a writeback operation cannot make forward progress because memory
> > allocation requests needed for doing I/O cannot be satisfied (e.g.
> > under an OOM-livelock situation), we can observe a flood of order-0 page
> > allocation failure messages caused by complete depletion of memory
> > reserves.
> > 
> > This is caused by unconditionally allocating "struct wb_writeback_work"
> > objects using GFP_ATOMIC from PF_MEMALLOC context.
> > 
> > __alloc_pages_nodemask() {
> >   __alloc_pages_slowpath() {
> >     __alloc_pages_direct_reclaim() {
> >       __perform_reclaim() {
> >         current->flags |= PF_MEMALLOC;
> >         try_to_free_pages() {
> >           do_try_to_free_pages() {
> >             wakeup_flusher_threads() {
> >               wb_start_writeback() {
> >                 kzalloc(sizeof(*work), GFP_ATOMIC) {
> >                   /* ALLOC_NO_WATERMARKS via PF_MEMALLOC */
> >                 }
> >               }
> >             }
> >           }
> >         }
> >         current->flags &= ~PF_MEMALLOC;
> >       }
> >     }
> >   }
> > }
> > 
> > Since I/O is stalling, allocating writeback requests indefinitely will
> > deplete memory reserves. Fortunately, since wb_start_writeback() can fall
> > back to wb_wakeup() when allocating "struct wb_writeback_work" fails, we
> > don't need to allow wb_start_writeback() to use memory reserves.
> > 
> > ...
> >
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -929,7 +929,8 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
> >  	 * This is WB_SYNC_NONE writeback, so if allocation fails just
> >  	 * wakeup the thread for old dirty data writeback
> >  	 */
> > -	work = kzalloc(sizeof(*work), GFP_ATOMIC);
> > +	work = kzalloc(sizeof(*work),
> > +		       GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> >  	if (!work) {
> >  		trace_writeback_nowork(wb);
> >  		wb_wakeup(wb);
> 
> Oh geeze.  fs/fs-writeback.c has grown waaay too many GFP_ATOMICs :(
> 
> How does this actually all work?

Jack has explained it a bit here:
http://lkml.kernel.org/r/20160318131136.GE7152@xxxxxxxxxxxxx
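
FWIW, the two pieces the changelog relies on can be sketched as a tiny
userspace model. This is not kernel code: every model_* name, the bit
values and the reserve counter are made up for illustration, and the
gate is a deliberate simplification of what the page allocator checks.
It only shows that a PF_MEMALLOC caller without __GFP_NOMEMALLOC may
consume reserves, while the patched flags make the allocation fail and
take the wb_wakeup() fallback instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define MODEL_GFP_NOMEMALLOC 0x1u	/* stands in for __GFP_NOMEMALLOC */
#define MODEL_PF_MEMALLOC    0x1u	/* stands in for PF_MEMALLOC */

enum model_result { MODEL_WORK_QUEUED, MODEL_WAKEUP_ONLY };

/* Simplified gate: may this request ignore watermarks (use reserves)? */
static bool model_may_use_reserves(unsigned int gfp, unsigned int task_flags)
{
	if (gfp & MODEL_GFP_NOMEMALLOC)
		return false;
	return (task_flags & MODEL_PF_MEMALLOC) != 0;
}

/*
 * Simplified allocation: succeeds if ordinary free memory is available,
 * or if the request may dip into reserves and some are left.
 */
static bool model_alloc(unsigned int gfp, unsigned int task_flags,
			bool free_pages_left, unsigned int *reserves)
{
	if (free_pages_left)
		return true;
	if (model_may_use_reserves(gfp, task_flags) && *reserves > 0) {
		(*reserves)--;
		return true;
	}
	return false;
}

/* Shape of wb_start_writeback(): queue work, or fall back to a wakeup. */
static enum model_result model_start_writeback(unsigned int gfp,
					       unsigned int task_flags,
					       bool free_pages_left,
					       unsigned int *reserves)
{
	if (!model_alloc(gfp, task_flags, free_pages_left, reserves))
		return MODEL_WAKEUP_ONLY;	/* the wb_wakeup() path */
	return MODEL_WORK_QUEUED;
}

/* Small self-check contrasting the old and the patched flags. */
static void model_self_check(void)
{
	unsigned int reserves = 2;

	/* Old flags from reclaim context: one reserve page is consumed. */
	assert(model_start_writeback(0, MODEL_PF_MEMALLOC, false,
				     &reserves) == MODEL_WORK_QUEUED);
	assert(reserves == 1);

	/* Patched flags: reserves untouched, caller only wakes the flusher. */
	assert(model_start_writeback(MODEL_GFP_NOMEMALLOC, MODEL_PF_MEMALLOC,
				     false, &reserves) == MODEL_WAKEUP_ONLY);
	assert(reserves == 1);
}
```

Calling model_self_check() from a main() is enough to exercise both
cases. With the old flags every call made while I/O is stalled takes
another page from the reserves; with the patched flags the count stays
put and the caller just wakes the flusher.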

> afaict if we fail this
> wb_writeback_work allocation, wb_workfn->wb_do_writeback will later say
> "hey, there are no work items!" and will do nothing at all.  Or does
> wb_workfn() fall into write-1024-pages-anyway mode and if so, how did
> it know how to do that?
> 
> If we had (say) a mempool of wb_writeback_work's (at least for
> wb_start_writeback), would that help anything?  Or would writeback
> simply fail shortly afterwards for other reasons?
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

-- 
Michal Hocko
SUSE Labs