On Fri 15-06-18 08:47:52, Mikulas Patocka wrote:
> On Fri, 15 Jun 2018, Michal Hocko wrote:
> > On Fri 15-06-18 07:35:07, Mikulas Patocka wrote:
> > > Because mempool uses it. Mempool uses allocations with "GFP_NOIO |
> > > __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN". And so dm-bufio uses
> > > these flags too. dm-bufio is just a big mempool.
> >
> > This doesn't answer my question though. Somebody else doing it is not
> > an explanation. Prior to your 41c73a49df31 there was no GFP_NOIO
> > allocation AFAICS. So why do you really need it now? Why cannot you
>
> dm-bufio has always used "GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC |
> __GFP_NOWARN" since kernel 3.2, when it was introduced.
>
> In kernel 4.10, dm-bufio was changed so that it does a GFP_NOWAIT
> allocation, then drops the lock and does GFP_NOIO with the lock dropped
> (because someone was likely experiencing the same issue that is reported
> in this thread) - there are two commits that change it - 9ea61cac0 and
> 41c73a49df31.

OK, I see. Is there any fundamental reason why this path has to do one
round of GFP_NOIO, or could it keep NOWAIT, drop the lock, sleep and
retry again?

[...]

> > is the same class of problem, honestly, I dunno. And I've already said
> > that stalling __GFP_NORETRY might be a good way around that but that
> > needs much more consideration and examination of existing users. I am
> > not aware anybody has done that. Doing changes like that based on a
> > single user is certainly risky.
>
> Why don't you set any rules how these flags should be used?

It is really hard to change the rules during the game. You basically have
to examine all existing users, and that is well beyond my time budget.
I've tried that where it was possible: e.g. __GFP_REPEAT, which I turned
into a well-defined semantic. __GFP_NORETRY is a bigger beast. Anyway, I
believe that it would be much safer to look at the problem from a
high-level perspective.
You seem to be focused on __GFP_NORETRY a little bit too much, IMHO. We
are not throttling callers which explicitly do not want to be throttled,
or cannot be - see current_may_throttle(). Is it possible that both your
md and mempool allocators could either (ab)use PF_LESS_THROTTLE or use
other means? E.g. do you have a backing_dev_info at that time?
--
Michal Hocko
SUSE Labs