On Wed, Nov 17, 2021 at 03:28:10PM +1100, NeilBrown wrote:
>
> Various places in the kernel - largely in filesystems - respond to a
> memory allocation failure by looping around and re-trying.
.....
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index aca874d33fe6..f2f2a5b28808 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -214,6 +214,27 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
>  static inline void fs_reclaim_release(gfp_t gfp_mask) { }
>  #endif
>  
> +/* Any memory-allocation retry loop should use
> + * memalloc_retry_wait(), and pass the flags for the most
> + * constrained allocation attempt that might have failed.
> + * This provides useful documentation of where loops are,
> + * and a central place to fine tune the waiting as the MM
> + * implementation changes.
> + */
> +static inline void memalloc_retry_wait(gfp_t gfp_flags)
> +{
> +	gfp_flags = current_gfp_context(gfp_flags);
> +	if ((gfp_flags & __GFP_DIRECT_RECLAIM) &&
> +	    !(gfp_flags & __GFP_NORETRY))
> +		/* Probably waited already, no need for much more */
> +		schedule_timeout_uninterruptible(1);
> +	else
> +		/* Probably didn't wait, and has now released a lock,
> +		 * so now is a good time to wait
> +		 */
> +		schedule_timeout_uninterruptible(HZ/50);
> +}

The existing congestion_wait() calls io_schedule_timeout() under
TASK_UNINTERRUPTIBLE conditions. Does changing all these calls to
just a plain schedule_timeout_uninterruptible() make any difference
to behaviour? At the very least, process accounting will appear
different (uninterruptible sleep instead of IO wait), and I suspect
that the block plug flushing in io_schedule() might be a good idea
to retain for all the filesystems that call this function from
IO-related routines.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
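For reference, here is a minimal userspace model of the timeout choice made by the proposed memalloc_retry_wait(), just to make the two branches concrete. It is a sketch under stated assumptions: the MODEL_GFP_* bit values and HZ=250 are hypothetical stand-ins, not the kernel's real definitions, and the current_gfp_context() step is left out on the assumption that it only clears scope-related bits (such as __GFP_IO/__GFP_FS) that the branch never tests. It models only *how long* the task sleeps, not *how* it sleeps, so the io_schedule_timeout() versus schedule_timeout_uninterruptible() difference (IO-wait accounting, plug flushing) is not captured here.

```c
#include <assert.h>

/* Hypothetical stand-ins: NOT the kernel's real GFP bit values or HZ. */
typedef unsigned int gfp_t;
#define HZ 250
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_NORETRY        0x2u
#define MODEL_GFP_KSWAPD_RECLAIM 0x4u

/* HZ/50 (~20ms) must round to at least one jiffy for the model to make sense. */
static_assert(HZ >= 50, "HZ/50 must be at least one jiffy");

/*
 * Returns the number of jiffies the patch's memalloc_retry_wait() would
 * sleep for, given the flags of the failed allocation. Mirrors the branch
 * in the quoted diff; current_gfp_context() is deliberately omitted.
 */
static unsigned long model_retry_timeout(gfp_t gfp_flags)
{
	if ((gfp_flags & MODEL_GFP_DIRECT_RECLAIM) &&
	    !(gfp_flags & MODEL_GFP_NORETRY))
		return 1;	/* allocator probably already blocked in reclaim */
	return HZ / 50;		/* allocator probably did not block: wait longer */
}
```

With the assumed HZ of 250, flags carrying MODEL_GFP_DIRECT_RECLAIM alone give a 1-jiffy sleep, while flags lacking it, or with MODEL_GFP_NORETRY added, give HZ/50 = 5 jiffies.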