Hello, Linus.

On Wed, Oct 14, 2015 at 12:16:48PM -0700, Linus Torvalds wrote:
> On Wed, Oct 14, 2015 at 12:02 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> >
> > But wasn't add_timer() always CPU-local at the time?  add_timer()
> > allowing cross-cpu migrations came way after that.
>
> add_timer() has actually never been "local CPU only" either.
>
> What add_timer() does is to keep the timer on the old CPU timer wheel
> if it was active, and if it wasn't, put it on the current CPU timer
> wheel.

Doesn't seem that way.  This is from 597d0275736d^ - right before
TIMER_NOT_PINNED was introduced.  add_timer() eventually calls into
__mod_timer().

  static inline int __mod_timer(struct timer_list *timer,
				unsigned long expires,
				bool pending_only)
  {
	...
	base = lock_timer_base(timer, &flags);
	...
	/* the timer base of the CPU we're running on right now */
	new_base = __get_cpu_var(tvec_bases);

	if (base != new_base) {
		...
		/*
		 * Migrate the timer to the local CPU's base unless
		 * its callback is currently running elsewhere.
		 */
		if (likely(base->running_timer != timer)) {
			/* See the comment in lock_timer_base() */
			timer_set_base(timer, NULL);
			spin_unlock(&base->lock);
			base = new_base;
			spin_lock(&base->lock);
			timer_set_base(timer, base);
		}
	}
	...
	internal_add_timer(base, timer);
	...
  }

It looks like the timers for work items will be reliably queued on the
local CPU.

> So again, by pure *accident*, if you don't end up ever modifying an
> already-active timer, then yes, it ended up being local.  But even
> then, things like suspend/resume can move timers around, afaik, so
> even then it has never been a real guarantee.
>
> And I see absolutely no sign that the local cpu case has ever been
> intentional.

Heh, I don't think much in this area is intended.  It's mostly all
historical accidents and failures to get things cleaned up in time.

> Now, obviously, that said there is obviously at least one case that
> seems to have relied on it (ie the mm/vmstat.c case), but I think we
> should just fix that.
>
> If it turns out that there really are *lots* of cases where
> "schedule_delayed_work()" expects the work to always run on the CPU
> that it is scheduled on, then we should probably take your patch just
> because it's too painful not to.
>
> But I'd like to avoid that if possible.

So, the following two things bother me about this.

* Given that this is the first reported case of breakage, I don't
  think this is gonna cause lots of critical issues; however, the only
  thing it indicates is that there simply haven't been enough cases
  where timers actually migrate.  If we end up migrating timers more
  actively in the future, it's possible that we'll see more breakages,
  which will likely be subtler.

* This makes queue_delayed_work() behave differently from
  queue_work(), and when I checked years ago the local queueing
  guarantee was definitely being depended upon by some users.

I do want to get rid of the local queueing guarantee for all work
items.  That said, I don't think this is the right way to do it.
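For illustration, here's a minimal sketch of the kind of user I mean,
modeled loosely on what mm/vmstat.c does.  The identifiers (my_work,
my_pcpu_counter and so on) are made up; only the workqueue and percpu
APIs are real.  The work item updates per-cpu state and then rearms
itself with schedule_delayed_work(), which is only correct as long as
it keeps running on the CPU it was originally queued on.

  #include <linux/cpumask.h>
  #include <linux/init.h>
  #include <linux/jiffies.h>
  #include <linux/percpu.h>
  #include <linux/workqueue.h>

  static DEFINE_PER_CPU(struct delayed_work, my_work);
  static DEFINE_PER_CPU(unsigned long, my_pcpu_counter);

  static void my_work_fn(struct work_struct *work)
  {
	/* safe only while we're on the CPU this work was queued on */
	this_cpu_inc(my_pcpu_counter);

	/*
	 * Rearm "locally".  If the underlying timer has migrated,
	 * this runs - and requeues itself - on another CPU and
	 * silently updates the wrong counter from then on.
	 */
	schedule_delayed_work(this_cpu_ptr(&my_work), HZ);
  }

  /* cpu hotplug handling omitted for brevity */
  static int __init my_init(void)
  {
	int cpu;

	for_each_online_cpu(cpu) {
		INIT_DELAYED_WORK(per_cpu_ptr(&my_work, cpu), my_work_fn);
		schedule_delayed_work_on(cpu, per_cpu_ptr(&my_work, cpu),
					 HZ);
	}
	return 0;
  }

Note that nothing warns when the timer, and with it the work item,
ends up on a different CPU, which is why I'd expect such breakages to
be subtle.

Thanks.

-- 
tejun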