On Mon 29-07-24 08:04:19, Zhijian Li (Fujitsu) wrote:
> On 29/07/2024 15:40, Michal Hocko wrote:
> > That means that rather than killing the
> > test program which continues consuming memory - and not much of it - it
> > keeps killing other tasks with a higher memory consumption.
>
> This behavior is not my(administrator) expectation.

Well, this lack of proper NUMA-aware OOM killer behavior has been there for decades without many people complaining about it enough to push for a better implementation. So while this is not great, it seems that not many people are suffering from it. In general, dealing with a complete memory node hot-remove while there are applications with strong NUMA policies is quite hard to do right, and there will always be a certain level of suffering.

> > This is really unfortunate but not something that should be handled by
> > special casing memory offlining but rather handling the mempolicy OOMs
> > better. There were some attempts in the past but never made it to a
> > mergable state. Maybe you want to pick up on that.
> >
>
> Well, tell me the previous proposals(mail/url) please if you have the them in hand.
> I want to take a look.

https://lore.kernel.org/all/20220708082129.80115-1-ligang.bdlg@xxxxxxxxxxxxx/

btw. lore.kernel.org has a great search engine.

> >> [13853.758192] pagefault_out_of_memory: 4055 callbacks suppressed
> >> [13853.758243] Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
> >
> > This shouldn't really happen and it indicates that some memory
> > allocation in the pagefault path has failed.
>
> May I know if this will cause side effect to other processes.

This means that the #PF handler has failed to allocate memory and the VM_FAULT_OOM error has unwound all the way up to the exception handler, which will then restart the instruction that caused the #PF. In essence, until the process triggering this is killed or the allocation succeeds, it will keep looping in the #PF path.
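To illustrate that retry loop, here is a toy user-space model in C. It is only a sketch under stated assumptions: `struct toy_task`, `toy_handle_fault()` and `toy_run_faulting_insn()` are hypothetical names, not kernel APIs, and the real path (the arch #PF handler ending up in `pagefault_out_of_memory()`) is far more involved. The point it shows is that a failed allocation is not fixed up anywhere; the instruction is simply restarted and faults again, so the task spins until either the allocation succeeds or the task is killed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; none of these names exist in the kernel. */
struct toy_task {
    int fails_left;      /* allocations that will still fail before one succeeds */
    int retries_to_kill; /* restarts after which a fatal signal arrives (<0: never) */
    bool killed;
};

/* Toy stand-in for the fault handler: returns true once the allocation succeeds. */
static bool toy_handle_fault(struct toy_task *t)
{
    assert(t->fails_left >= 0);
    if (t->fails_left > 0) {
        t->fails_left--;
        return false; /* models VM_FAULT_OOM unwinding up to the #PF handler */
    }
    return true;
}

/*
 * Toy model of the loop: on failure nothing is repaired, the faulting
 * instruction is restarted and faults again. Returns the number of
 * restarts, or -1 if the task was killed before the allocation succeeded.
 */
static int toy_run_faulting_insn(struct toy_task *t)
{
    int restarts = 0;

    while (!toy_handle_fault(t)) {
        if (t->retries_to_kill >= 0 && restarts >= t->retries_to_kill) {
            t->killed = true; /* e.g. a fatal signal ends the loop */
            return -1;
        }
        restarts++; /* "Retrying PF" */
    }
    return restarts;
}
```

For example, a task whose next three allocations fail and which is never killed restarts the instruction three times before making progress; if a fatal signal arrives first, the loop ends without the allocation ever succeeding.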
A VM_FAULT_OOM leaking out like this normally doesn't happen, because our allocators do not fail small allocation requests.

-- 
Michal Hocko
SUSE Labs