On Mon 02-03-15 17:18:14, Mike Kravetz wrote:
> On 03/02/2015 03:10 PM, Andrew Morton wrote:
> >On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >
> >>hugetlbfs allocates huge pages from the global pool as needed. Even if
> >>the global pool contains a sufficient number of pages for the filesystem
> >>size at mount time, those global pages could be grabbed for some other
> >>use. As a result, filesystem huge page allocations may fail due to lack
> >>of pages.
> >
> >Well OK, but why is this a sufficiently serious problem to justify
> >kernel changes? Please provide enough info for others to be able
> >to understand the value of the change.
> >
>
> Thanks for taking a look.
>
> Applications such as a database want to use huge pages for performance
> reasons. hugetlbfs filesystem semantics with ownership and modes work
> well to manage access to a pool of huge pages. However, the application
> would like some reasonable assurance that allocations will not fail due
> to a lack of huge pages. Before starting, the application will ensure
> that enough huge pages exist on the system in the global pools. What
> the application wants is exclusive use of a pool of huge pages.
>
> One could argue that this is a system administration issue. The global
> huge page pools are only available to users with root privilege.
> Therefore, exclusive use of a pool of huge pages can be obtained by
> limiting access. However, many applications are installed to run with
> elevated privilege to take advantage of resources like huge pages. It
> is quite possible for one application to interfere with another, especially
> in the case of something like huge pages where the pool size is mostly
> fixed.
>
> Suggestions for other ways to approach this situation are appreciated.
> I saw the existing support for "reservations" within hugetlbfs and
> thought of extending this to cover the size of the filesystem.
Maybe I do not understand your use case properly, but wouldn't hugetlb
cgroup (CONFIG_CGROUP_HUGETLB) help to guarantee the same? Just
configure limits for different users/applications (inside different
groups) so that they never overcommit the existing pool. Would that work
for you?
-- 
Michal Hocko
SUSE Labs
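The cgroup suggestion above amounts to partitioning a fixed huge page pool among hugetlb cgroups so that the per-group limits add up to no more than the pool. A minimal Python sketch of that arithmetic follows; it only plans the values and touches nothing on the system. The helper names, the group names and the /sys/fs/cgroup/hugetlb mount point are assumptions for illustration; the file name pattern follows the cgroup v1 hugetlb controller's hugetlb.<size>.limit_in_bytes convention (e.g. hugetlb.2MB.limit_in_bytes).

```python
# Hypothetical planning helper: split a fixed huge page pool among hugetlb
# cgroups so that the per-group limits never sum to more than the pool.
# Assumption: a cgroup v1 hugetlb hierarchy mounted at /sys/fs/cgroup/hugetlb.

CGROUP_ROOT = "/sys/fs/cgroup/hugetlb"  # assumed mount point, not queried


def size_suffix(page_size_bytes: int) -> str:
    """Format a huge page size the way hugetlb cgroup file names do (2MB, 1GB, 64KB)."""
    kb = 1024
    if page_size_bytes >= kb ** 3:
        return f"{page_size_bytes // kb ** 3}GB"
    if page_size_bytes >= kb ** 2:
        return f"{page_size_bytes // kb ** 2}MB"
    return f"{page_size_bytes // kb}KB"


def plan_limits(pool_pages: int, page_size_bytes: int,
                requests: dict[str, int]) -> dict[str, int]:
    """Map each group to a byte limit; refuse any plan that overcommits the pool."""
    total = sum(requests.values())
    if total > pool_pages:
        raise ValueError(f"requested {total} pages but the pool holds only {pool_pages}")
    return {group: pages * page_size_bytes for group, pages in requests.items()}


def limit_writes(limits: dict[str, int], page_size_bytes: int) -> list[tuple[str, str]]:
    """Return (path, value) pairs an administrator would write as root."""
    name = f"hugetlb.{size_suffix(page_size_bytes)}.limit_in_bytes"
    return [(f"{CGROUP_ROOT}/{group}/{name}", str(limit))
            for group, limit in sorted(limits.items())]


if __name__ == "__main__":
    two_mb = 2 * 1024 * 1024
    # Example: a 1024-page pool of 2 MB pages split between a database and a batch job.
    plan = plan_limits(1024, two_mb, {"db": 768, "batch": 256})
    for path, value in limit_writes(plan, two_mb):
        print(path, value)
```

Writing the printed values into those files (as root, after creating the groups and moving each application's tasks into its group) would apply the plan. The check only keeps the limits from overcommitting the pool; it does not reserve any pages, so the global pool still has to be sized beforehand.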