On Mon, Jan 06, 2025 at 02:48:12PM +0000, Ackerley Tng wrote: > Peter Xu <peterx@xxxxxxxxxx> writes: > > > On Sat, Dec 28, 2024 at 12:06:34AM +0000, Ackerley Tng wrote: > >> > > >> > - /* If this allocation is not consuming a reservation, charge it now. > >> > + /* > >> > + * If this allocation is not consuming a per-vma reservation, > >> > + * charge the hugetlb cgroup now. > >> > */ > >> > - deferred_reserve = map_chg || cow_from_owner; > >> > - if (deferred_reserve) { > >> > + if (map_chg) { > >> > ret = hugetlb_cgroup_charge_cgroup_rsvd( > >> > idx, pages_per_huge_page(h), &h_cg); > >> > >> Should hugetlb_cgroup_charge_cgroup_rsvd() be called when map_chg == MAP_CHG_ENFORCED? > > > > This looks like a pretty niche use case, though I would say yes. > > > > I don't think I take a lot of consideration here when drafting the patch, > > as the change here should have kept the old behavior: map_chg grows into > > the tristate so that we can drop deferred_reserve, OTOH nothing should > > change from such behavior of cgroup charging. > > > > When it happens, it means the owner process CoWed a private hugetlb folio > > which will enforce bypassing the vma reservation. Here bypassing the vma > > check makes sense to me, because the new to-be-cowed folio X will replace > > another folio Y, which should have consumed the private vma resv at this > > specific index. So there's no way the to-be-cowed folio X can have anything > > to do with the vma reservation.. > > > > Besides the vma reservation, I don't see why this folio allocation needs to > > be any more special. IOW, it should still go through all rest checks and > > fail the process properly if the check fails, that should include any form > > of cgroups (either hugetlb or memcg), IMHO. > > > > Do you have any specific thought on this path? > > I re-read the code, and I hope this understanding is right: > > When a user sets "rsvd.max_usage_in_bytes" to X, the user is saying that > within this cgroup, the maximum memory that can be reserved in the vma > reservation is X. Right, and the allocation may or may not attach to a vma reservation at all. In this case it skips the vma reservation however will still need to be accounted; there should have other similar cases where vma resv doesn't count, e.g. MAP_NORESERVE. For those we do accounting on reservations only until allocation time. > > Hence even when this CoW is performed, this should count towards the > cgroup's "rsvd.max_usage_in_bytes" and so yes, it should be charged. > > I think I misunderstood the context on cgroup charging earlier and hence > I thought it shouldn't be charged, but I agree with you after > re-reading. Thanks. I'll hold another 1-2 days then I'll respin. -- Peter Xu