On Mon 14-09-20 09:57:02, Vijay Balakrishna wrote:
>
>
> On 9/14/2020 7:33 AM, Michal Hocko wrote:
> > On Thu 10-09-20 13:47:39, Vijay Balakrishna wrote:
> > > When memory is hotplug added or removed the min_free_kbytes must be
> > > recalculated based on what is expected by khugepaged. Currently
> > > after hotplug, min_free_kbytes will be set to a lower default and higher
> > > default set when THP enabled is lost. This leaves the system with small
> > > min_free_kbytes which isn't suitable for systems especially with network
> > > intensive loads. Typical failure symptoms include HW WATCHDOG reset,
> > > soft lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM process
> > > kills.
> >
> > Care to explain some more please? The whole point of increasing
> > min_free_kbytes for THP is to get a larger free memory with a hope that
> > huge pages will be more likely to appear. While this might help for
> > other users that need a high order pages it is definitely not the
> > primary reason behind it. Could you provide an example with some more
> > data?
>
> Thanks Michal. I haven't looked into THP as part of my investigation, so I
> cannot comment.
>
> In our use case we are hotplug removing ~2GB of 8GB total (on our SoC)
> during normal reboot/shutdown. This memory is hotplug hot-added as movable
> type via systemd late service during start-of-day.
>
> In our stress test first we ran into HW WATCHDOG recovery, on enabling
> kernel watchdog we started seeing soft lockup hung task notices, failure
> symptoms varied, where stack trace of hung tasks sometimes trying to
> allocate GFP_ATOMIC memory, looping in do_notify_resume, NETDEVICE WATCHDOG
> timeouts, OOM process kills etc. During investigation we reran stress test
> without hotplug use case. Surprisingly this run didn't encounter the said
> problems. This led to comparing what is different between the two runs,
> while looking at various globals, studying hotplug code I uncovered the
> issue of failing to restore min_free_kbytes. In particular on our 8GB SoC
> min_free_kbytes went down to 8703 from 22528 after hotplug add.

Did you try to increase min_free_kbytes manually after hot remove?

Btw. I would consider oom killer invocation due to min_free_kbytes
really weird behavior. If anything the higher value would cause more
memory reclaim and potentially oom rather than smaller one.
--
Michal Hocko
SUSE Labs
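
For reference, the two min_free_kbytes values discussed in this thread come from two different calculations. The following is a rough sketch of both, written from memory of mm/khugepaged.c and mm/page_alloc.c rather than copied from them. The function names, the 4 KiB page size, the 512-page pageblock, the single counted zone and the lowmem page count are all assumptions made for illustration, not values reported in the thread.

```python
import math

# Assumption: three per-cpu migrate types (unmovable, movable, reclaimable).
MIGRATE_PCPTYPES = 3


def khugepaged_recommended_min_free_kbytes(nr_zones, pageblock_nr_pages,
                                           lowmem_pages, page_size_kb=4):
    """Approximate the value khugepaged asks for when THP is enabled."""
    # Two free pageblocks per counted zone to help avoid fragmentation.
    recommended = pageblock_nr_pages * nr_zones * 2
    # Extra pageblocks per zone so each migrate type has room to fall back.
    recommended += (pageblock_nr_pages * nr_zones
                    * MIGRATE_PCPTYPES * MIGRATE_PCPTYPES)
    # Never reserve more than 5% of lowmem.
    recommended = min(recommended, lowmem_pages // 20)
    # Convert pages to kilobytes.
    return recommended * page_size_kb


def default_min_free_kbytes(lowmem_kbytes):
    """Approximate the generic default: sqrt(lowmem_kbytes * 16).

    The kernel additionally clamps the result to a lower and upper bound;
    the clamp is left out of this sketch.
    """
    return math.isqrt(lowmem_kbytes * 16)


if __name__ == "__main__":
    # Hypothetical inputs: one counted zone, 2 MiB pageblocks, 6 GiB lowmem.
    print(khugepaged_recommended_min_free_kbytes(1, 512, 1572864))
    # Hypothetical input: 1 GiB of lowmem.
    print(default_min_free_kbytes(1024 * 1024))
```

Under these assumed inputs the first call gives 22528, which matches the higher figure quoted above. The second function only illustrates why the generic default can come out lower: it scales with the square root of lowmem, not with the pageblock size.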