Re: [PATCH] mm/hugetlb: Warn the user when issues arise on boot due to hugepages

* Michal Hocko <mhocko@xxxxxxxx> [170606 02:01]:
> On Tue 06-06-17 07:49:17, Michal Hocko wrote:
> > On Mon 05-06-17 11:15:41, Liam R. Howlett wrote:
> > > * Michal Hocko <mhocko@xxxxxxxx> [170605 00:57]:
> > > > On Fri 02-06-17 20:54:13, Liam R. Howlett wrote:
> > > > > When the user specifies too many hugepages or an invalid
> > > > > default_hugepagesz, the only communication to the user is implicit
> > > > > in the allocation message.  This patch adds a warning when the
> > > > > desired page count is not allocated and prints an error when
> > > > > default_hugepagesz is invalid on boot.
> > > > 
> > > > We do not warn when doing echo $NUM > nr_hugepages, so why should we
> > > > behave any differently during boot?
> > > 
> > > During boot, hugepages are allocated until either the request is
> > > satisfied or the remaining memory is smaller than one hugepage.  When
> > > memory is exhausted this way, the system will most likely fail later,
> > > with the OOM killer finding little or nothing to kill (unless you are
> > > using very large hugepages, in the hundreds of MB or in the GBs).  The
> > > user will most likely see the OOM messages much later in the boot
> > > sequence than the allocation message that only implicitly states the
> > > problem.  Worse yet, you may get an OOM report for each processor,
> > > which produces many pages of OOM output on modern systems.  Although
> > > these warnings will be printed well before the OOM messages, they at
> > > least highlight the configuration as the issue.  I'm trying to point
> > > the user in the right direction by stating clearly what is failing.
> > 
> > Well, an OOM report will tell us how much memory is eaten by hugetlb, so
> > you would get a clue that something is misconfigured.

Absolutely; however, this again only implicitly tells the user why the
system is failing to boot.  Multiple users have already spent a lot of
time finding out what went wrong.

> 
> And just to be clearer: I do not _object_ to the warning, I just
> _think_ it is not very useful.  If somebody misconfigures so badly
> that hugetlb allocations fail during boot, it will very likely be
> visible anyway.  But if somebody misconfigures slightly less, so that
> boot does not fail, the system is very likely not to work properly and
> there will be no warning pointing at this as the source of the
> problems.  So is it worth adding more code of such limited usefulness?

I think telling the user that something failed is very useful.  This
obviously does not cover all failure cases, as you have pointed out,
but it is certainly better than silently continuing, which is what
happens today.

Are you suggesting that the error message be provided if the failure
happens after boot as well?

Thanks,
Liam