Re: [PATCH] mm/hugetlb: Warn the user when issues arise on boot due to hugepages

* Michal Hocko <mhocko@xxxxxxxx> [170612 13:49]:
> On Mon 12-06-17 13:28:30, Liam R. Howlett wrote:
> > * Michal Hocko <mhocko@xxxxxxxx> [170606 02:01]:
> [..]
> > > And just to be more clear. I do not _object_ to the warning I just
> > > _think_ it is not very useful actually. If somebody misconfigures so
> > > badly that hugetlb allocations fail during the boot then it will be
> > > very likely visible. But if somebody misconfigures slightly less to not
> > > fail the system is very likely to not work properly and there will be no
> > > warning that this might be the source of problems. So is it worth adding
> > > more code with that limited usefulness?
> > 
> > I think telling the user that something failed is very useful.  This
> > obviously does not cover off all failure cases as you have pointed out,
> > but it is certainly better than silently continuing as is the case
> > today.
> > 
> > Are you suggesting that the error message be provided if the failure
> > happens after boot as well?
> 
> No, I am just suggesting that the warning as proposed is not useful and
> it is not worth the additional (albeit little) code. It doesn't cover many
> other misconfigurations which might be even more serious because there
> would be still _some_ memory left while the system would crawl to death.

There is already some memory left as long as the huge page size doesn't
work out to be exactly the amount of free pages.  This is why it's so
annoying as the OOM kicks in much later in the boot process and leaves
it up to the user to debug a kernel dump with zero error or warning
messages about what happened before things went bad.  Worse yet, I've
seen several pages of OOMs scroll by as each processor takes turns
telling the user it is out of memory.  If there's no message stating any
configuration issue, then many admins would probably think something is
seriously broken and it's not just a simple typo of K vs M.

Even though this doesn't catch all errors, I think it's a worthwhile
change since this is currently a silent failure which results in a
system crash.
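
As a rough illustration of the kind of check being discussed, here is a
userspace sketch. It is not the code from the patch: the function name,
its parameters, and the message text are all hypothetical, and it only
assumes that the requested and actually reserved page counts are both
known once boot-time allocation has finished.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): build a warning when fewer
 * huge pages were reserved at boot than the user asked for.
 *
 * Returns 1 and fills buf if there was a shortfall, otherwise returns 0
 * and leaves buf as an empty string.
 */
static int hugepage_shortfall_warning(char *buf, size_t len,
                                      unsigned long requested,
                                      unsigned long allocated,
                                      unsigned long page_size_kb)
{
    if (allocated >= requested) {
        /* Request fully satisfied: nothing to report. */
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }

    /* Shortfall: say what was asked for and what was actually reserved. */
    snprintf(buf, len,
             "HugeTLB: requested %lu pages of %lu kB, only allocated %lu",
             requested, page_size_kb, allocated);
    return 1;
}
```

Called with requested = 4096, allocated = 1500 and page_size_kb = 2048,
this produces "HugeTLB: requested 4096 pages of 2048 kB, only allocated
1500", which is the sort of single line that would point an admin at the
configuration before the OOM reports start.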

> 
> My objections are not hard enough to give a right NAK I just think this
> is pointless code which won't help the current situation much.

I'm open to other suggestions on how to make this better, but
providing the user with more information seems like a good start.  I
think it's reasonable to think many end users would not know what to do
with a kernel oops when no errors or warnings have occurred.

Thanks,
Liam
