Hi Mike, On Tue, 29 May 2018, Mike Kravetz wrote: > > When charging to a hugetlb_cgroup fails, alloc_huge_page() returns > > ERR_PTR(-ENOSPC) which will cause VM_FAULT_SIGBUS to be returned to the > > page fault handler. > > > > Instead, return the proper error code, ERR_PTR(-ENOMEM), so VM_FAULT_OOM > > is handled correctly. This is consistent with failing mem cgroup charges > > in the non-hugetlb fault path. > > Apologies for the late reply. > > I am not %100 sure we want to make this change. When hugetlb cgroup support > was added by Aneesh, the intention was for the application to get SIGBUS. > > commit 2bc64a204697 > https://lwn.net/Articles/499255/ > > Since the code has always caused SIGBUS when exceeding cgroup limit, there > may be applications depending on this behavior. I would be especially > concerned with HPC applications which were the original purpose for adding > the feature. > > Perhaps, the original code should have returned ENOMEM to be consistent as > in your patch. That does seem to be the more correct behavior. But, do we > want to change behavior now (admittedly undocumented) and potentially break > some application? > > I echo Michal's question about the reason for the change. If there is a > real problem or issue to solve, that makes more of a case for making the > change. If it is simply code/behavior cleanup for consistency then I would > suggest not making the change, but rather documenting this as another > hugetlbfs "special behavior". > Yes, I mentioned the backwards compatibility issue and I'm not sure there is a likely way around it. But it's rather unfortunate that applications can become constrained in such a way that SIGBUS may be unavoidable if alloc_buddy_huge_page_with_mpol() cannot allocate from surplus and/or the hugetlb_cgroup limit is reached. Not only are both racy, but applications prior to hugetlb_cgroup was introduced may have avoided SIGBUS by checking global hstate and are now limited to hugetlb_cgroup constraints unknowingly. It's also not possible to avoid the SIGBUS by trying to terminate a lower priority process that has hugetlb reservations. I'm not sure there is a path forward that can make this more deterministic. We have customers who have reported receiving SIGBUS deep in their allocation stack using MAP_HUGETLB and were checking global hstate but were unaware of any hugetlb_cgroup restriction. Andrew, please drop the patch. I'd like to know if anybody has any ideas on how this can be more userspace friendly, however.