On Wed, May 24, 2023 at 02:51:55PM +0200, Michal Hocko wrote:
> [Sorry for a late response but I was conferencing last two weeks and now
> catching up]
>
> On Mon 15-05-23 15:00:15, Marcelo Tosatti wrote:
> [...]
> > v8
> > - Add summary of discussion on -v7 to cover letter
>
> Thanks this is very useful! This helps to frame the further discussion.
>
> I believe the most important question to answer is this in fact
> > I think what needs to be done is to avoid new queue_work_on()
> > users from being introduced in the tree (the number of
> > existing ones is finite and can therefore be fixed).
> >
> > Agree with the criticism here, however, I can't see other
> > options than the following:
> >
> > 1) Given an activity, which contains a sequence of instructions
> > to execute on a CPU, to change the algorithm
> > to execute that code remotely (therefore avoid interrupting a CPU),
> > or to avoid the interruption somehow (which must be dealt with
> > on a case-by-case basis).
> >
> > 2) To block that activity from happening in the first place,
> > for the sites where it can be blocked (that return errors to
> > userspace, for example).
> >
> > 3) Completely isolate the CPU from the kernel (off-line it).
>
> I agree that a reliable cpu isolation implementation needs to address
> queue_work_on problem. And it has to do that _reliably_. This cannot be
> achieved by an endless whack-a-mole and chasing each new instance. There
> must be a more systematic approach. One way would be to change the
> semantic of schedule_work_on and fail call for an isolated CPU. The
> caller would have a way to fallback and handle the operation by other
> means. E.g. vmstat could simply ignore folding pcp data because an
> imprecision shouldn't really matter. Other callers might choose to do the
> operation remotely. This is a lot of work, no doubt about that, but it
> is a long term maintainable solution that doesn't give you new surprises
> with any new released kernel. There are likely other remote interfaces
> that would need to follow that scheme.
>
> If the cpu isolation is not planned to be worth that time investment
> then I do not think it is also worth reducing a highly optimized vmstat
> code. These stats are invoked from many hot paths and per-cpu
> implementation has been optimized for that case.

It is exactly the same code, but now with a "LOCK" prefix on the CMPXCHG
instruction. That should not cost much due to cache locking (these are
per-CPU variables anyway).

> If your workload would
> like to avoid that as disturbing then you already have a quiet_vmstat
> precedence so find a way how to use it for your workload instead.
>
> --
> Michal Hocko
> SUSE Labs

OK, so an alternative solution is to completely disable vmstat updates
for isolated CPUs. Are you OK with that?
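
For reference, the "fail the call for an isolated CPU and let the caller
fall back" semantic quoted above could look roughly like the following.
This is only a userspace model under stated assumptions, not kernel code:
try_schedule_work_on(), cpu_isolated[], fold_pcp() and
refresh_all_counters() are hypothetical names invented for illustration,
and the work is run inline instead of being queued.

```c
/* Userspace model only: none of these names are existing kernel APIs. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

typedef void (*work_fn)(int cpu);

/* Hypothetical isolation mask: true means the CPU must not be disturbed. */
static bool cpu_isolated[NR_CPUS] = { false, false, true, true };

/* Per-CPU deltas and the global counter they are folded into. */
static long pcp_delta[NR_CPUS];
static long global_count;

/* Fold one CPU's pending delta into the global counter. */
static void fold_pcp(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	global_count += pcp_delta[cpu];
	pcp_delta[cpu] = 0;
}

/*
 * Model of the proposed semantic: refuse to run work on an isolated CPU
 * and report that to the caller instead of interrupting the CPU.
 * Work for non-isolated CPUs is executed inline to keep the model small.
 */
static bool try_schedule_work_on(int cpu, work_fn fn)
{
	if (cpu < 0 || cpu >= NR_CPUS || cpu_isolated[cpu])
		return false;
	fn(cpu);
	return true;
}

/*
 * vmstat-like caller: fold every CPU it is allowed to touch and accept
 * the imprecision for the rest. Returns the number of CPUs skipped.
 */
static int refresh_all_counters(void)
{
	int skipped = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!try_schedule_work_on(cpu, fold_pcp))
			skipped++;
	return skipped;
}
```

In this model the deltas of the isolated CPUs simply stay unfolded, which
is the imprecision the quoted text says vmstat could tolerate. A caller
that cannot tolerate it would have to take a different fallback, such as
performing the operation remotely.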