On Fri, 22 Jun 2018, Michal Hocko wrote: > On Thu 21-06-18 21:17:24, Mikulas Patocka wrote: > [...] > > > But seriously, isn't the best way around the throttling issue to use > > > PF_LESS_THROTTLE? > > > > Yes - it could be done by setting PF_LESS_THROTTLE. But I think it would > > be better to change it just in one place than to add PF_LESS_THROTTLE to > > every block device driver (because adding it to every block driver results > > in more code). > > Why would every block device need this? I thought we were talking about > mempool allocator and the md variant of it. They are explicitly doing > their own back off so PF_LESS_THROTTLE sounds appropriate to me. Because every block driver is suspicible to this problem. Two years ago, there was a bug that when the user was swapping to dm-crypt device, memory management would stall the allocations done by dm-crypt by 100ms - that slowed down swapping rate and made the machine unuseable. Then, people are complaining about the same problem in dm-bufio. Next time, it will be something else. (you will answer : "we will not fix bugs unless people are reporting them" :-) > > What about this patch? If __GFP_NORETRY and __GFP_FS is not set (i.e. the > > request comes from a block device driver or a filesystem), we should not > > sleep. > > Why? How are you going to audit all the callers that the behavior makes > sense and moreover how are you going to ensure that future usage will > still make sense. The more subtle side effects gfp flags have the harder > they are to maintain. I did audit them - see the previous email in this thread: https://www.redhat.com/archives/dm-devel/2018-June/thread.html Mikulas > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > > Cc: stable@xxxxxxxxxxxxxxx > > > > Index: linux-2.6/mm/vmscan.c > > =================================================================== > > --- linux-2.6.orig/mm/vmscan.c > > +++ linux-2.6/mm/vmscan.c > > @@ -2674,6 +2674,7 @@ static bool shrink_node(pg_data_t *pgdat > > * the LRU too quickly. > > */ > > if (!sc->hibernation_mode && !current_is_kswapd() && > > + (sc->gfp_mask & (__GFP_NORETRY | __GFP_FS)) != __GFP_NORETRY && > > current_may_throttle() && pgdat_memcg_congested(pgdat, root)) > > wait_iff_congested(BLK_RW_ASYNC, HZ/10); > > > > -- > Michal Hocko > SUSE Labs >