One thought on the v3 delta that I missed earlier:

On Thu, Jan 24, 2019 at 01:15:18PM -0800, Suren Baghdasaryan wrote:
> +/*
> + * psi_update_work represents slowpath accounting part while psi_group_change
> + * represents hotpath part. There are two potential races between them:
> + * 1. Changes to group->polling when slowpath checks for new stall, then hotpath
> + *    records new stall and then slowpath resets group->polling flag. This leads
> + *    to the exit from the polling mode while monitored state is still changing.
> + * 2. Slowpath overwriting an immediate update scheduled from the hotpath with
> + *    a regular update further in the future and missing the immediate update.
> + * Both races are handled with a retry cycle in the slowpath:
> + *
> + *    HOTPATH:                         |    SLOWPATH:
> + *                                     |
> + * A) times[cpu] += delta              | E) delta = times[*]
> + * B) start_poll = (delta[poll_mask] &&|    if delta[poll_mask]:
> + *      cmpxchg(g->polling, 0, 1) == 0)| F)   polling_until = now + grace_period
> + *    if start_poll:                   |    if now > polling_until:
> + * C)   mod_delayed_work(1)            |      if g->polling:

With the polling flag being atomic now, this "if g->polling" line
isn't accurate anymore. Since this diagram is specifically about
memory ordering, this should move the g->polling load up to where
delta is read and then refer to unordered local variables down here.
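
To illustrate the ordering I mean, here is a rough userspace sketch,
not the patch itself. It assumes C11 atomics in place of the kernel's
atomic_t helpers; struct group_sketch, slowpath_step() and
hotpath_start_poll() are made-up names, the delta is reduced to a
single bool, and the retry cycle from the patch is left out. The only
point is that the flag is loaded once, next to the delta read, and the
later check uses the local copy:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the psi group; only what the diagram needs. */
struct group_sketch {
	atomic_int polling;	/* 0 = not polling, 1 = polling */
	uint64_t polling_until;
};

/*
 * Slowpath: sample the polling flag at the same point the deltas are
 * read (step E), then decide on plain locals only.
 * Returns true if this call left polling mode.
 */
static bool slowpath_step(struct group_sketch *g, bool delta_in_poll_mask,
			  uint64_t now, uint64_t grace_period)
{
	/* E) one ordered load up front; 'polling' is a local from here on */
	int polling = atomic_load(&g->polling);

	/* F) a new stall in a monitored state extends the polling window */
	if (delta_in_poll_mask)
		g->polling_until = now + grace_period;

	/* The late check refers to the local, not to g->polling */
	if (now > g->polling_until && polling) {
		atomic_store(&g->polling, 0);
		return true;
	}
	return false;
}

/* Hotpath step B: only the caller that flips 0 -> 1 starts polling. */
static bool hotpath_start_poll(struct group_sketch *g, bool delta_in_poll_mask)
{
	int expected = 0;

	return delta_in_poll_mask &&
	       atomic_compare_exchange_strong(&g->polling, &expected, 1);
}
```

In the comment diagram that would correspond to adding a
"polling = g->polling" style line at E and writing "if polling:" at
the bottom, so every ordered access shows up in one place.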