Hi,
On 14.01.25 16:58, Michal Koutný wrote:
On Tue, Jan 14, 2025 at 04:39:12PM +0100, Friedrich Vock <friedrich.vock@xxxxxx> wrote:
If the current css doesn't contain any pool that is a descendant of
the "pool" (i.e. when found_descendant == false), then "pool" will
point to some unrelated pool. If the current css has a child, we'll
overwrite parent_pool with this unrelated pool on the next iteration.
Could this be verified with more idiomatic way with
cgroup_is_descendant()? (The predicate could be used between pools [1]
if they pin respective cgroups).
I'm not sure if I'm missing something, but I don't think
cgroup_is_descendant is really related to this issue. Each css can
contain some amount of "pools" representing memory from different
devices (e.g. with the current DRM implementation, one pool corresponds
to VRAM of a specific GPU). These pools are allocated on-demand, so if a
cgroup has not made any allocations for a specific device, there will be
no pool corresponding to that device's memory. Pools have a hierarchy of
their own (that is, for a given cgroup's pool corresponding to some
device, the "parent pool" refers to the parent cgroup's pool
corresponding to the same device).
In dmem_cgroup_calculate_protection, we're trying to update the
protection values for the entire pool hierarchy between
limit_pool/test_pool (with the end goal of having accurate
effective-protection values for test_pool). Since pools only store
parent pointers to establish that hierarchy, to find child pools given
only the parent pool, we iterate over the pools of all child cgroups and
check if the parent pointer matches with our current "parent pool" pointer.
The bug happens when some cgroup doesn't have any pool in the hierarchy
we're iterating over (that is, we iterate over all pools but don't find
any pool whose parent matches our current "parent pool" pointer). The
cgroup itself is part of the (cgroup) hierarchy, so the result of
cgroup_is_descendant is obviously true - but because of how we allocate
pools on-demand, it's still possible there is no pool that is part of
the (pool) hierarchy we're iterating over.
Thanks,
Friedrich
Thanks,
Michal
[1] https://lore.kernel.org/all/uj6railxyazpu6ocl2ygo6lw4lavbsgg26oq57pxxqe5uzxw42@fhnqvq3tia6n/