Re: [RFC PATCH 1/1] mm/hugetlb mm/oom_kill: Add support for reclaiming hugepages on OOM events.

On Mon 31-07-17 21:11:25, Liam R. Howlett wrote:
> * Michal Hocko <mhocko@xxxxxxxxxx> [170731 10:08]:
> > On Mon 31-07-17 09:56:48, Liam R. Howlett wrote:
[...]
> > > No,  I'm talking about failed memory for whatever reason.  The system
> > > reboots by a hardware means (I believe the memory controller) and
> > > removes the memory on that failed module from the pool.  Now you
> > > effectively have a system with less memory than before which invalidates
> > > your configuration.  Is it worth while to have Linux successfully boot
> > > when the system attempts to recover from a failure?
> > 
> > Certainly yes, but if you boot with much less memory and you want to use
> > hugetlb pages then you have to reconsider and maybe even reconfigure
> > your workload to reflect new conditions. So I am not really sure this
> > can be fully automated.
> > 
> 
> I agree.  A reconfiguration or repair is required to have optimum
> performance.  Would you agree that having a functioning system is better than
> a reboot loop or hang on a panic?  It's also easier to reconfigure a
> system that's booting.

Absolutely. The thing is that I am not even sure that the hugetlb
problem is real. A hugetlb reservation made via a boot command line
parameter is easily fixable (just update the boot command line from the
boot loader). From my experience, init time hugetlb initialization
usually tries to be portable and as such configures a certain
percentage of the available memory for hugetlb (sometimes even on a per
NUMA node basis). Even if somebody uses hard coded values, this is
something that is fixable during recovery.
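
To illustrate the percentage-based approach, here is a minimal sketch of
how an init script might size the pool from whatever memory is present
at boot. It is not taken from any particular distribution; the function
names and the 25% figure are hypothetical. It relies only on the
MemTotal and Hugepagesize fields of /proc/meminfo.

```python
import re


def parse_meminfo_kb(meminfo_text, field):
    """Return the value (in kB) of a /proc/meminfo field such as 'MemTotal'."""
    pattern = rf"^{re.escape(field)}:\s+(\d+)\s+kB"
    match = re.search(pattern, meminfo_text, re.MULTILINE)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


def hugepages_for_percent(meminfo_text, percent):
    """Hypothetical helper: how many default-sized huge pages fit in
    `percent` of total memory, rounded down."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    total_kb = parse_meminfo_kb(meminfo_text, "MemTotal")
    page_kb = parse_meminfo_kb(meminfo_text, "Hugepagesize")
    return (total_kb * percent) // (100 * page_kb)
```

A script could pass the contents of /proc/meminfo to
hugepages_for_percent() and write the result to
/proc/sys/vm/nr_hugepages (as root). Because the count is derived from
MemTotal at boot, it shrinks automatically when a failed module has been
removed from the pool, which is why such setups need no manual fix.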

That being said, I am not sure you are focusing on a real problem, while
the solution you are proposing might break existing userspace. Please
try to play with your memory recovery feature some more with real
hugetlb usecases (Oracle DB is a heavy user AFAIR) and see what real
life problems arise. We can then revisit potential solutions with
more data in hand.
-- 
Michal Hocko
SUSE Labs
