On Mon, Oct 03, 2022 at 08:44:35PM +0800, Hillf Danton wrote:
> On 26 Sep 2022 10:20:04 +0100 Aaron Tomlin <atomlin@xxxxxxxxxx> wrote:
> > On Sun 2022-09-25 09:05 +0800, Hillf Danton wrote:
> > > On 24 Sep 2022 16:24:41 +0100 Aaron Tomlin <atomlin@xxxxxxxxxx> wrote:
> > > >
> > > > In the context of the idle task and an adaptive-tick mode/or a nohz_full
> > > > CPU, quiet_vmstat() can be called: before stopping the idle tick,
> > > > entering an idle state and on exit. In particular, for the latter case,
> > > > when the idle task is required to reschedule, the idle tick can remain
> > > > stopped and the timer expiration time endless i.e., KTIME_MAX. Now,
> > > > indeed before a nohz_full CPU enters an idle state, CPU-specific vmstat
> > > > counters should be processed to ensure the respective values have been
> > > > reset and folded into the zone specific 'vm_stat[]'. That being said, it
> > > > can only occur when: the idle tick was previously stopped, and
> > > > reprogramming of the timer is not required.
> > > >
> > > > A customer provided some evidence which indicates that the idle tick was
> > > > stopped; albeit, CPU-specific vmstat counters still remained populated.
> > > > Thus one can only assume quiet_vmstat() was not invoked on return to the
> > > > idle loop.
> > >
> > > Why did housekeeping CPUs fail to do their works, with this assumption
> > > put aside?
> >
> > Hi Hillf,
> >
> > I'm not sure I understand your question.
> >
> > In this context, when tick processing is stopped, delayed work is not going
> > to be handled until the CPU exits idle.
>
> Given work canceled because per-CPU pages can be freed remotely from
> housekeeping CPUs (see patch 3/5), what is added here is not needed.
>
> IOW which one is incorrect?
>
> BTW given delayed work is not going to be handled until the CPU exits idle,

Hi Hillf,

The comment in the codebase now is:

void quiet_vmstat(void)
{
	if (system_state != SYSTEM_RUNNING)
		return;

	if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
		return;

	if (!need_update(smp_processor_id()))
		return;

	/*
	 * Just refresh counters and do not care about the pending delayed
	 * vmstat_update. It doesn't fire that often to matter and canceling
	 * it would be too expensive from this path.
	 * vmstat_shepherd will take care about that for us.
	 */
	refresh_cpu_vm_stats(false);
}

However, this is incorrect. The pending delayed work is only cancelled
when it is executed and not requeued from:

static void vmstat_update(struct work_struct *w)
{
	if (refresh_cpu_vm_stats(true)) {
		/*
		 * Counters were updated so we expect more updates
		 * to occur in the future. Keep on running the
		 * update worker thread.
		 */
		queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
				this_cpu_ptr(&vmstat_work),
				round_jiffies_relative(sysctl_stat_interval));
	}
}

Since this patchset changes the synchronization to happen at return to
userspace or when entering idle, we do want to cancel that work (which,
after synchronization, is no longer necessary).

> canceling work is noop in 3/5, despite what the vmstat shepherd does depends
> not on tick.

Canceling work is not a noop in 3/5: if the work is not cancelled (if 3/5
is dropped), there will be pending work to be executed from the kworker
thread on an isolated CPU, which is undesired for a fully isolated CPU
that should see no interruptions.
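
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of the two behaviours. Everything in it is an assumption made
for illustration: struct cpu_model and the model_*() helpers are
hypothetical names, the system_state check is left out, and
model_quiet_vmstat_cancel() only approximates the intent of 3/5 rather
than reproducing the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU's vmstat state; not a kernel structure. */
struct cpu_model {
	int  pending_diffs;	/* per-CPU counter deltas not yet folded */
	bool work_pending;	/* stands in for delayed_work_pending() */
};

/* Stands in for refresh_cpu_vm_stats(): fold deltas, report any change. */
static bool model_refresh(struct cpu_model *cpu)
{
	assert(cpu != NULL);

	bool changed = cpu->pending_diffs != 0;

	cpu->pending_diffs = 0;
	return changed;
}

/* Current behaviour: counters are folded, the delayed work stays queued. */
static void model_quiet_vmstat(struct cpu_model *cpu)
{
	if (!cpu->work_pending)
		return;
	if (cpu->pending_diffs == 0)
		return;
	model_refresh(cpu);
}

/* Assumed intent of 3/5: fold the counters and cancel the delayed work. */
static void model_quiet_vmstat_cancel(struct cpu_model *cpu)
{
	model_refresh(cpu);
	cpu->work_pending = false;
}

/*
 * The delayed work timer expires. Returns true if a kworker had to run on
 * this CPU. The work is requeued only when counters actually changed.
 */
static bool model_timer_fires(struct cpu_model *cpu)
{
	if (!cpu->work_pending)
		return false;

	cpu->work_pending = false;		/* work leaves the queue when run */
	if (model_refresh(cpu))
		cpu->work_pending = true;	/* requeue, as vmstat_update() does */
	return true;
}
```

In this model, model_quiet_vmstat() leaves work_pending set, so a later
model_timer_fires() still reports a kworker run on the isolated CPU even
though there is nothing left to fold. With model_quiet_vmstat_cancel() the
timer never fires.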