On 6/5/22 21:39, Ming Lei wrote:
On Sun, Jun 05, 2022 at 07:15:27PM -0400, Waiman Long wrote:
On 6/3/22 23:58, Ming Lei wrote:
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..8c4f204dbf5b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -622,7 +622,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
unsigned int x;
- cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
+ cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id(), NULL);
x = __this_cpu_add_return(stats_updates, abs(val));
if (x > MEMCG_CHARGE_BATCH) {
I think the rstat set of functions are doing that already. So flush will
only call CPUs that have called cgroup_rstat_updated() before. However, one
Yeah, I guess the detail is in cgroup_rstat_cpu_pop_updated(), but the
percpu lock(raw_spin_lock_irqsave) is still required, and cgroup_rstat_cpu_pop_updated()
is still called even through there isn't any update on this CPU.
Yes, I think we may need to add a bitmask of what controllers have
updates in cgroup_rstat_cpu structure.
deficiency that I am aware of is that there is no bitmap of which controller
have update. The problem that I saw in cgroup v2 is that in a cgroup with
both memory controller and block controller enabled, a
cgroup_rstat_updated() call from memory cgroup later causes the rstat
function to call into block cgroup flush method even though there is no
update in the block controller. This is an area that needs improvement.
Your code does allow the block controller to be aware of that and avoid
further action, but I think it has to be done in the rstat code to be
applicable to all controllers instead of just specific to block controller.
I guess it can be done by adding one percpu variable to 'struct cgroup'.
There is another problem that this approach. Suppose the system have 20
block devices and one of them has an IO operation. Now the flush method
still needs to iterate all the 20 blkg's to do an update. The block
controller is kind of special that the number of per-cgroup IO stats depends
on the number of block devices present. Other controllers just have one set
of stats per cgroup.
Yeah, and this one is really blkio specific issue, and your patch does
cover this one. Maybe you can add one callback to
cgroup_rstat_updated(), so the "blkg_iostat_set" instance is added into
percpu list under percpu lock of cgroup_rstat_cpu_lock, then the lockless
list isn't needed.
The rstat API is generic. It may not be a good idea to put controller
specific information into it. Yes, cgroup_rstat_cpu_lock is taken at the
read side (flush). It may not taken on the write side (update). So it
may not be easy to rely on this lock for synchronization between the
read and write side.
Cheers,
Longman