Hello, Linus.

On Wed, Oct 14, 2015 at 12:16:48PM -0700, Linus Torvalds wrote:
> On Wed, Oct 14, 2015 at 12:02 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> >
> > But wasn't add_timer() always CPU-local at the time?  add_timer()
> > allowing cross-cpu migrations came way after that.
>
> add_timer() has actually never been "local CPU only" either.
>
> What add_timer() does is to keep the timer on the old CPU timer wheel
> if it was active, and if it wasn't, put it on the current CPU timer
> wheel.

Doesn't seem that way.  This is from 597d0275736d^ - right before
TIMER_NOT_PINNED was introduced.  add_timer() eventually calls into
__mod_timer().

  static inline int __mod_timer(struct timer_list *timer,
				unsigned long expires,
				bool pending_only)
  {
	...
	base = lock_timer_base(timer, &flags);
	...
	/* the timer base of the CPU we're running on right now */
	new_base = __get_cpu_var(tvec_bases);

	if (base != new_base) {
		...
		/*
		 * Migrate the timer to the local CPU's base unless
		 * its callback is currently running elsewhere.
		 */
		if (likely(base->running_timer != timer)) {
			/* See the comment in lock_timer_base() */
			timer_set_base(timer, NULL);
			spin_unlock(&base->lock);
			base = new_base;
			spin_lock(&base->lock);
			timer_set_base(timer, base);
		}
	}
	...
	internal_add_timer(base, timer);
	...
  }

It looks like the timers for work items will be reliably queued on the
local CPU.

> So again, by pure *accident*, if you don't end up ever modifying an
> already-active timer, then yes, it ended up being local.  But even
> then, things like suspend/resume can move timers around, afaik, so
> even then it has never been a real guarantee.
>
> And I see absolutely no sign that the local cpu case has ever been
> intentional.

Heh, I don't think much in this area is intended.  It's mostly all
historical accidents and failures to get things cleaned up in time.

> Now, obviously, that said there is obviously at least one case that
> seems to have relied on it (ie the mm/vmstat.c case), but I think we
> should just fix that.
>
> If it turns out that there really are *lots* of cases where
> "schedule_delayed_work()" expects the work to always run on the CPU
> that it is scheduled on, then we should probably take your patch just
> because it's too painful not to.
>
> But I'd like to avoid that if possible.

So, the following two things bother me about this.

* Given that this is the first reported case of breakage, I don't
  think this is gonna cause lots of critical issues; however, the only
  thing it indicates is that there simply haven't been enough cases
  where timers actually migrate.  If we end up migrating timers more
  actively in the future, it's possible that we'll see more breakages,
  which will likely be subtler.

* This makes queue_delayed_work() behave differently from
  queue_work(), and when I checked years ago the local queueing
  guarantee was definitely being depended upon by some users.

I do want to get rid of the local queueing guarantee for all work
items.  That said, I don't think this is the right way to do it.
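For illustration, here's a minimal sketch of the kind of user I mean,
modeled loosely on what mm/vmstat.c does.  The identifiers (my_work,
my_pcpu_counter and so on) are made up; only the workqueue and percpu
APIs are real.  The work item updates per-cpu state and then rearms
itself with schedule_delayed_work(), which is only correct as long as
it keeps running on the CPU it was originally queued on.

  #include <linux/cpumask.h>
  #include <linux/init.h>
  #include <linux/jiffies.h>
  #include <linux/percpu.h>
  #include <linux/workqueue.h>

  static DEFINE_PER_CPU(struct delayed_work, my_work);
  static DEFINE_PER_CPU(unsigned long, my_pcpu_counter);

  static void my_work_fn(struct work_struct *work)
  {
	/* safe only while we're on the CPU this work was queued on */
	this_cpu_inc(my_pcpu_counter);

	/*
	 * Rearm "locally".  If the underlying timer has migrated,
	 * this runs - and requeues itself - on another CPU and
	 * silently updates the wrong counter from then on.
	 */
	schedule_delayed_work(this_cpu_ptr(&my_work), HZ);
  }

  /* cpu hotplug handling omitted for brevity */
  static int __init my_init(void)
  {
	int cpu;

	for_each_online_cpu(cpu) {
		INIT_DELAYED_WORK(per_cpu_ptr(&my_work, cpu), my_work_fn);
		schedule_delayed_work_on(cpu, per_cpu_ptr(&my_work, cpu),
					 HZ);
	}
	return 0;
  }

Note that nothing warns when the timer, and with it the work item,
ends up on a different CPU, which is why I'd expect such breakages to
be subtle.

Thanks.

-- 
tejun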