On Tue, 10 Jan 2023, Marcelo Tosatti wrote:

> > The basic primitives add a lot of weight.
>
> Can't see any alternative given the necessity to avoid interruption
> by the work to sync per-CPU vmstats to global vmstats.

this_cpu operations are designed to operate on a *single* value (a
counter) and can be run on an arbitrary cpu. No preemption or interrupt
disabling is required, since the counters of all cpus will be added up
at the end.

You want *two* values (the counter and the dirty flag) to be modified
together, and you want to use the counter/flag to identify the cpu where
these events occurred. this_cpu_xxx operations are not suitable for that
purpose. You would need a way to ensure that both operations occur on
the same cpu.

> >
> > And the per cpu atomic updates operations require the modification
> > of multiple values. The operation
> > cannot be "atomic" in that sense anymore and we need some other form of
> > synchronization that can
> > span multiple instructions.
>
> So use this_cpu_cmpxchg() to avoid the overhead. Since we can no longer
> count on preemption being disabled we still have some minor issues.
> The fetching of the counter thresholds is racy.
> A threshold from another cpu may be applied if we happen to be
> rescheduled on another cpu. However, the following vmstat operation
> will then bring the counter again under the threshold limit.
>
> Those small issues are gone, OTOH.

Well, you could use this_cpu_cmpxchg128 to update a 64-bit counter and a
flag at the same time. Otherwise you will have to disable preemption or
interrupts while incrementing the counter and updating the dirty flag, and
then you do not really need the this_cpu operations anymore. It would be
best to use a preempt_disable() section and plain C operators: ++ for the
counter and a regular assignment for the flag.
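The cmpxchg suggestion above rests on one idea: if the counter and the dirty flag live in the same machine word, a single compare-and-swap updates both, so nobody can observe the new count without the flag. The following is a minimal userspace sketch of that idea using C11 <stdatomic.h>. It is not the kernel's this_cpu_cmpxchg128 API. The names packed_stat, stat_init, stat_add, stat_fold, DIRTY_BIT and COUNT_MASK are hypothetical, and so is the layout (a 63-bit counter plus the flag in bit 63 of a 64-bit word, instead of a 64-bit counter plus a flag in 128 bits). It does not model per-CPU placement, migration or preemption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: bit 63 holds the dirty flag, bits 0..62 the counter. */
#define DIRTY_BIT  ((uint64_t)1 << 63)
#define COUNT_MASK (DIRTY_BIT - 1)

/* Hypothetical stand-in for one cpu's counter plus its dirty flag. */
typedef struct {
    _Atomic uint64_t word;
} packed_stat;

static void stat_init(packed_stat *s)
{
    atomic_init(&s->word, 0);
}

/* Add delta to the counter and set the dirty flag with one compare-and-swap,
 * so the new count and the flag always become visible together. */
static void stat_add(packed_stat *s, uint64_t delta)
{
    assert(delta <= COUNT_MASK);
    uint64_t old = atomic_load_explicit(&s->word, memory_order_relaxed);
    uint64_t desired;
    do {
        /* Recompute both fields from the same snapshot; the count wraps mod 2^63. */
        desired = ((old + delta) & COUNT_MASK) | DIRTY_BIT;
        /* On failure, old is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(&s->word, &old, desired,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
}

/* Fold step (the sync work): take the count and clear the flag in one exchange. */
static uint64_t stat_fold(packed_stat *s, int *was_dirty)
{
    uint64_t old = atomic_exchange_explicit(&s->word, 0, memory_order_relaxed);
    *was_dirty = (old & DIRTY_BIT) != 0;
    return old & COUNT_MASK;
}
```

A single atomic fetch-and-add cannot replace the loop here. Adding the flag bit would toggle it rather than set it. Separate add and "set flag" operations would reopen the window between the two updates that this approach is meant to close. The preempt_disable() alternative avoids atomics entirely, but it has no meaningful userspace equivalent, so it is not sketched.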