On Thu 17-03-16 20:35:23, Tetsuo Handa wrote: [...] > But what I felt strange is what should_reclaim_retry() is doing. > > Michal Hocko wrote: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index f77e283fb8c6..b2de8c8761ad 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -3044,8 +3045,37 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order, > > */ > > if (__zone_watermark_ok(zone, order, min_wmark_pages(zone), > > ac->high_zoneidx, alloc_flags, available)) { > > - /* Wait for some write requests to complete then retry */ > > - wait_iff_congested(zone, BLK_RW_ASYNC, HZ/50); > > + unsigned long writeback; > > + unsigned long dirty; > > + > > + writeback = zone_page_state_snapshot(zone, NR_WRITEBACK); > > + dirty = zone_page_state_snapshot(zone, NR_FILE_DIRTY); > > + > > + /* > > + * If we didn't make any progress and have a lot of > > + * dirty + writeback pages then we should wait for > > + * an IO to complete to slow down the reclaim and > > + * prevent from pre mature OOM > > + */ > > + if (!did_some_progress && 2*(writeback + dirty) > reclaimable) { > > + congestion_wait(BLK_RW_ASYNC, HZ/10); > > + return true; > > + } > > writeback and dirty are used only when did_some_progress == 0. Thus, we don't > need to calculate writeback and dirty using zone_page_state_snapshot() unless > did_some_progress == 0. OK, I will move this into if !did_some_progress. > But, does it make sense to take writeback and dirty into account when > disk_events_workfn (trace shown above) is doing GFP_NOIO allocation and > wb_workfn (trace shown above) is doing (presumably) GFP_NOFS allocation? > Shouldn't we use different threshold for GFP_NOIO / GFP_NOFS / GFP_KERNEL? I have considered skiping the throttling part for GFP_NOFS/GFP_NOIO previously but I couldn't have convinced myself it would make any difference. We know there was no progress in the reclaim and even if the current context is doing FS/IO allocation potentially then it obviously cannot get its memory so it cannot proceed. So now we are in the state where we either busy loop or sleep for a while. So I ended up not complicating the code even more. If you have a use case where busy waiting makes a difference then I would vote for a separate patch with a clear description. > > + > > + /* > > + * Memory allocation/reclaim might be called from a WQ > > + * context and the current implementation of the WQ > > + * concurrency control doesn't recognize that > > + * a particular WQ is congested if the worker thread is > > + * looping without ever sleeping. Therefore we have to > > + * do a short sleep here rather than calling > > + * cond_resched(). > > + */ > > + if (current->flags & PF_WQ_WORKER) > > + schedule_timeout(1); > > This schedule_timeout(1) does not sleep. You lost the fix as of next-20160317. > Please update. Yeah, I have that updated in my local patch already. Thanks! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>