On Mon, Oct 2, 2023 at 8:25 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Mon, Oct 02, 2023 at 05:08:34PM +0200, Michal Hocko wrote:
> > On Mon 02-10-23 10:50:26, Johannes Weiner wrote:
> > > On Mon, Oct 02, 2023 at 03:43:19PM +0200, Michal Hocko wrote:
> > > > On Wed 27-09-23 17:57:22, Nhat Pham wrote:
> > [...]
> > > > - memcg limit reclaim doesn't assist hugetlb pages allocation when
> > > >   hugetlb overcommit is configured (i.e. pages are not consumed from the
> > > >   pool) which means that the page allocation might disrupt workloads
> > > >   from other memcgs.
> > > > - failure to charge a hugetlb page results in SIGBUS rather
> > > >   than memcg oom killer. That could be the case even if the
> > > >   hugetlb pool still has pages available and there is
> > > >   reclaimable memory in the memcg.
> > >
> > > Are these actually true? AFAICS, regardless of whether the page comes
> > > from the pool or the buddy allocator, the memcg code will go through
> > > the regular charge path, attempt reclaim, and OOM if that fails.
> >
> > OK, I should have been more explicit. Let me expand. Charges are
> > accounted only _after_ the actual allocation is done. So the actual
> > allocation is not constrained by the memcg context. It might reclaim
> > from the memcg at that time but the disruption could have already
> > happened. Not really any different from regular memory allocation
> > attempt but much more visible with GB pages and one could reasonably
> > expect that memcg should stop such a GB allocation if the local reclaim
> > would be hopeless to free up enough from its own consumption.
> >
> > Makes more sense?
>
> Yes, that makes sense.
>
> This should be fairly easy to address by having hugetlb do the split
> transaction that charge_memcg() does in one go, similar to what we do
> for the hugetlb controller as well. IOW,
>
> alloc_hugetlb_folio()
> {
>         if (mem_cgroup_hugetlb_try_charge())
>                 return ERR_PTR(-ENOMEM);
>
>         folio = dequeue();
>         if (!folio) {
>                 folio = alloc_buddy();
>                 if (!folio)
>                         goto uncharge;
>         }
>
>         mem_cgroup_hugetlb_commit_charge();
> }

Ah actually, I like this better.
If I do this I can circumvent all the redo_reserve bogus as well!
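
The ordering in the quoted pseudocode (charge first, allocate second, roll the charge back on failure) can be modeled in plain userspace C. This is only a rough sketch of the idea, not kernel code: every toy_* name below is hypothetical, the "memcg" is just a counter with a limit, no reclaim is attempted, and the commit step is left as a comment because there is no folio to bind the charge to in this model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg: a usage counter with a hard limit. */
struct toy_memcg {
	size_t usage;	/* pages currently charged */
	size_t limit;	/* maximum pages that may be charged */
};

/* Hypothetical stand-in for the hugetlb pool plus the buddy fallback. */
struct toy_pool {
	size_t free_pool_pages;		/* what dequeue() could hand out */
	size_t free_buddy_pages;	/* what alloc_buddy() could hand out */
};

/* Phase 1: reserve the charge before any page is taken. */
static bool toy_try_charge(struct toy_memcg *cg, size_t nr_pages)
{
	if (nr_pages > cg->limit - cg->usage)
		return false;
	cg->usage += nr_pages;
	return true;
}

/* Roll back a reservation made by toy_try_charge(). */
static void toy_cancel_charge(struct toy_memcg *cg, size_t nr_pages)
{
	cg->usage -= nr_pages;
}

/* Returns 0 on success, -1 if either the charge or the allocation fails. */
static int toy_alloc_hugetlb(struct toy_memcg *cg, struct toy_pool *pool,
			     size_t nr_pages)
{
	/* Over the limit: fail before touching the pool or the buddy side. */
	if (!toy_try_charge(cg, nr_pages))
		return -1;

	if (pool->free_pool_pages >= nr_pages) {
		pool->free_pool_pages -= nr_pages;	/* dequeue() */
	} else if (pool->free_buddy_pages >= nr_pages) {
		pool->free_buddy_pages -= nr_pages;	/* alloc_buddy() */
	} else {
		toy_cancel_charge(cg, nr_pages);	/* "goto uncharge" */
		return -1;
	}

	/* Phase 2: a commit step would attach the charge to the folio here. */
	return 0;
}
```

The point of the split is visible in the first branch: a request that cannot be charged never reaches the allocator, so it cannot disturb pages that other groups depend on.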