On 29/07/2024 16:15, Michal Hocko wrote:
> On Mon 29-07-24 08:04:19, Zhijian Li (Fujitsu) wrote:
>> On 29/07/2024 15:40, Michal Hocko wrote:
>>> That means that rather than killing the
>>> test program which continues consuming memory - and not much of it - it
>>> keeps killing other tasks with a higher memory consumption.
>>
>> This behavior is not what I (as the administrator) expect.
>
> Well, this lack of proper NUMA-aware OOM killer behavior has been there
> for decades without many people complaining about it enough to push for a
> better implementation. So while this is not great, it seems not that many
> people are suffering from it.
>
> In general, dealing with a complete memory node hotremove while there are
> applications with strong NUMA policies is quite hard to do right and
> there will always be a certain level of suffering.

Thank you very much for your explanation. Let me rethink it...

>
>>> This is really unfortunate but not something that should be handled by
>>> special casing memory offlining but rather handling the mempolicy OOMs
>>> better. There were some attempts in the past but they never made it to a
>>> mergeable state. Maybe you want to pick up on that.
>>
>>
>> Well, please point me to the previous proposals (mail/url) if you have them at hand.
>> I want to take a look.
>
> https://lore.kernel.org/all/20220708082129.80115-1-ligang.bdlg@xxxxxxxxxxxxx/
>
> btw. lore.kernel.org has a great search engine.

I will take a look later.

>
>>>> [13853.758192] pagefault_out_of_memory: 4055 callbacks suppressed
>>>> [13853.758243] Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
>>>
>>> This shouldn't really happen and it indicates that some memory
>>> allocation in the pagefault path has failed.
>>
>> May I know if this will cause side effects for other processes?
>
> This will mean that the #PF handler has failed to allocate memory and
> the VM_FAULT_OOM error has unwound all the way up to the exception
> handler and that will restart the instruction that has caused the #PF.
>
> In essence, as long as the process triggering this is not killed or the
> allocation doesn't succeed it will be looping in the #PF path. This
> normally doesn't happen because our allocators do not fail for small
> allocation requests.

Thanks again for your detailed explanation. I think this is acceptable for
the process bound to the node being removed, isn't it?
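
To make sure I understand the retry behavior described above, here is a minimal userspace sketch of it. It is only a model, not kernel code: `sim_task`, `handle_fault()`, `run_faulting_instruction()` and the `kill_after` limit are hypothetical names I made up for illustration, and the "OOM kill" is reduced to a simple retry limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum fault_result { FAULT_OK = 0, FAULT_OOM = 1 };

struct sim_task {
	int failing_allocs_left;	/* allocations that will still fail */
	bool killed;			/* set when the task gets killed */
};

/* Models the fault path: resolving the fault needs one allocation. */
static enum fault_result handle_fault(struct sim_task *t)
{
	if (t->failing_allocs_left > 0) {
		t->failing_allocs_left--;
		return FAULT_OOM;	/* error unwinds to the exception handler */
	}
	return FAULT_OK;
}

/*
 * Models the exception handler: on FAULT_OOM it returns to the faulting
 * instruction, which immediately faults again. The loop ends only when
 * the allocation succeeds or the task is killed.
 *
 * Returns the number of retries, or -1 if the task was killed first.
 */
int run_faulting_instruction(struct sim_task *t, int kill_after)
{
	int retries = 0;

	assert(kill_after >= 0);

	for (;;) {
		if (handle_fault(t) == FAULT_OK)
			return retries;

		if (retries == kill_after) {
			t->killed = true;
			return -1;
		}
		retries++;
	}
}
```

In this model only the faulting task itself spins; whether other processes are affected depends on what the OOM killer selects, which the sketch does not attempt to model.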