Michal Hocko wrote: > On Wed 11-01-17 19:55:20, Tetsuo Handa wrote: > > Currently, warn_alloc() prints warning messages only if __GFP_NOWARN > > is not specified. When warn_alloc() was proposed, I asserted that > > warn_alloc() should print stall warning messages even if __GFP_NOWARN > > is specified, but that assertion was not accepted [1]. > > > > Compared to asynchronous watchdog [2], warn_alloc() for reporting stalls > > is broken in many aspects. First of all, we can't guarantee forward > > progress of memory allocation request. It is important to understand that > > the reason is not limited to the "too small to fail" memory-allocation > > rule [3]. We need to learn that the caller may fail to call warn_alloc() > > from page allocator whereas warn_alloc() assumes that stalling threads > > can call warn_alloc() from page allocator. > > > > An easily reproducible situation is that kswapd is blocked on other > > threads doing memory allocations while other threads doing memory > > allocations are blocked on kswapd [4]. > > This all is unrelated to whether we should or shouldn't warn when > __GFP_NOWARN is specified. > > > But what is silly is that, even > > if some allocation request was lucky enough to escape from > > too_many_isolated() loop because it was GFP_NOIO or GFP_NOFS, it fails > > to print warning messages because it was __GFP_NOWARN when all other > > allocations were looping inside too_many_isolated() loop (an example [5] > > is shown below). We are needlessly discarding a chance to know that > > the system got livelocked. > > But the caller had some reason to not warn. So why should we ignore > that? The reason this flag is usually added is that the allocation > failure is tolerable and it shouldn't alarm the admin to do any action. Majority of __GFP_DIRECT_RECLAIM allocation requests are tolerable with allocation failure (and they will be willing to give up upon SIGKILL if they are from syscall) and do not need to alarm the admin to do any action. If they are not tolerable with allocation failure, they will add __GFP_NOFAIL. Apart from the reality that they are not tested well because they are currently protected by the "too small to fail" memory-allocation rule, they are ready to add __GFP_NOWARN. And current behavior (i.e. !costly __GFP_DIRECT_RECLAIM allocation requests won't fail unless __GFP_NORETRY is set or TIF_MEMDIE is set after SIGKILL was delivered) keeps them away from adding __GFP_NOFAIL. > > So rather than repeating why you think that warn_alloc is worse than a > different solution which you are trying to push through you should in > fact explain why we should handle stall and allocation failure warnings > differently and how are we going to handle potential future users who > would like to disable warning for both. Because once you change the > semantic we will have problems to change it like for other gfp flags. Oh, thank you very much for positive (or at least neutral) response to asynchronous watchdog. I don't mean to change the semantic of GFP flags if we can go with asynchronous watchdog. I'm posting this patch because there is no progress with asynchronous watchdog. I'm not sure what "why we should handle stall and allocation failure warnings differently" means. Which one did you mean? (a) "why we should handle stall warning by synchronous watchdog (e.g. warn_alloc()) and allocation failure warnings differently" (b) "why we should handle stall warning by asynchronous watchdog (e.g. kmallocwd) and allocation failure warnings differently" If you meant (a), it is because allocation livelock is a problem which current GFP flags semantics cannot handle. We had been considering only allocation failures. We have never considered allocation livelock which is observed as allocation stalls. (The allocation livelock after the OOM killer is invoked was solved by the OOM reaper. But I'm talking about allocation livelock before the OOM killer is invoked, and I don't think this problem can be solved within a few years because this problem is caused by optimistic direct reclaim. Thus, I'm proposing to start from checking whether this problem is occurring and providing diagnostic information.) __GFP_NOWARN considers only about allocation failures. It is legal to set both __GFP_NOFAIL and __GFP_NOWARN despite __GFP_NOWARN is pointless from the point of view of allocation failures because __GFP_NOFAIL never fails. But __GFP_NOWARN is an intruder from the point of view of reporting allocation livelock (regardless of whether __GFP_NOFAIL is also set). If you meant (b), it is because synchronous watchdog is not reliable and cannot provide enough diagnostic information. Since allocation livelock involves several threads due to dependency, it is important to take a snapshot of possibly relevant threads. By using asynchronous watchdog, we can not only take a snapshot but also take more actions for obtaining diagnostic information (e.g. enabling tracepoints when allocation stalls are detected). I'm not sure what "how are we going to handle potential future users who would like to disable warning for both" means. Which one did you mean? (c) "how are we going to handle potential future system administrators who would like to disable warning for both allocation failures and allocation stalls" (d) "how are we going to handle potential future kernel developers who would like to disable warning for both allocation failures and allocation stalls" If you meant (c), kmallocwd v6 allows system administrators to disable stall warnings by setting /proc/sys/kernel/memalloc_task_timeout_secs to 0. If you meant (d), I don't think we need to worry about it. Current warn_alloc() implementation for reporting allocation stalls (i.e. report unless __GFP_NOWARN is set) is already preventing system administrators and kernel developers from disabling allocation stall warnings. We will be able to get rid of warn_alloc() for reporting stalls and asynchronous watchdog when we managed to guarantee forward progress of memory allocation and prove that allocation livelock is impossible. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>