Michal Hocko wrote:
> On Fri 13-01-17 20:00:11, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > So rather than repeating why you think that warn_alloc is worse than a
> > > different solution which you are trying to push through you should in
> > > fact explain why we should handle stall and allocation failure warnings
> > > differently and how are we going to handle potential future users who
> > > would like to disable warning for both. Because once you change the
> > > semantic we will have problems to change it like for other gfp flags.
> >
> > Oh, thank you very much for positive (or at least neutral) response to
> > asynchronous watchdog. I don't mean to change the semantic of GFP flags
> > if we can go with asynchronous watchdog. I'm posting this patch because
> > there is no progress with asynchronous watchdog.
> >
> > I'm not sure what "why we should handle stall and allocation failure
> > warnings differently" means. Which one did you mean?
> >
> > (a) "why we should handle stall warning by synchronous watchdog
> >     (e.g. warn_alloc()) and allocation failure warnings differently"
> >
> > (b) "why we should handle stall warning by asynchronous watchdog
> >     (e.g. kmallocwd) and allocation failure warnings differently"
> >
> > If you meant (a), it is because allocation livelock is a problem which
> > current GFP flags semantics cannot handle. We had been considering only
> > allocation failures. We have never considered allocation livelock which
> > is observed as allocation stalls. (The allocation livelock after the OOM
> > killer is invoked was solved by the OOM reaper. But I'm talking about
> > allocation livelock before the OOM killer is invoked,
>
> I am not going to allow defining a weird __GFP_NOWARN semantic which
> allows warnings but only sometimes. At least not without having a proper
> way to silence both failures _and_ stalls or just stalls. I do not
> really think this is worth the additional gfp flag.
>
> > and I don't think
> > this problem can be solved within a few years because this problem is
> > caused by optimistic direct reclaim.
>
> And again you are trying to define a weird semantic just because the
> original problem seems too hard. This is a really wrong way to do
> the development. And again the OOM reaper should serve you as an example
> that things can be done _properly_ rather than tweaked around with
> "sometimes works but not always" solutions.
>
> I plan to address the too_many_isolated problem. In fact I already have
> some preliminary work done which I plan to post next week. An unbounded
> loop inside the reclaim is certainly something to get rid of and AFAIK
> this is the only problem which can prevent reasonable return to the page
> allocator.

Sigh. You are again looking only at bugs which are reported. If I cared only
about the too_many_isolated() case, I would not need to propose an asynchronous
watchdog. Since I believe that there are bugs which an average administrator
cannot afford to report, I'm proposing an asynchronous watchdog for automatic
reporting.

Tetsuo Handa wrote at http://lkml.kernel.org/r/201612282042.GDB17129.tOHFOFSQOFLVJM@xxxxxxxxxxxxxxxxxxx :
> > There has never been a disagreement here. The point we seem to be
> > disagreeing is how much those issues you are seeing matter. I do not
> > consider them top priority because they are not happening in real life
> > enough.
>
> There is no evidence to prove "they are not happening in real life enough", for
> there is no catch-all reporting mechanism. I consider that offering a means to
> find and report problems is top priority as a troubleshooting staff.

I repeat: "AFAIK this is the only problem" is not acceptable.

Andrew, what do you think?