Hello, Linus.

On Wed, Oct 14, 2015 at 10:36:50AM -0700, Linus Torvalds wrote:
> > For both delayed and !delayed work items on per-cpu workqueues,
> > queueing without specifying a specific CPU always meant queueing on
> > the local CPU.
>
> That's just not true, as far as I can tell.
>
> I went back in the history to 2012, and it does
>
> 	if (unlikely(cpu != WORK_CPU_UNBOUND))
> 		add_timer_on(timer, cpu);
> 	else
> 		add_timer(timer);
>
> so this whole WORK_CPU_UNBOUND means "add_timer()" without specifying
> a CPU has been true for at least the last several years.
>
> So the documentation, the code, and the name all agree:

But wasn't add_timer() always CPU-local at the time?  add_timer()
allowing cross-CPU migrations came way after that.

  commit 597d0275736dad9c3bda6f0a00a1c477dc0f37b1
  Author: Arun R Bharadwaj <arun@xxxxxxxxxxxxxxxxxx>
  Date:   Thu Apr 16 12:13:26 2009 +0530

      timers: Framework for identifying pinned timers

> WORK_CPU_UNBOUND does *not* mean that it's guaranteed to be the local
> CPU.  The documentation says "preferred", the code clearly doesn't
> specify the CPU, and the name says "not bound to a particular CPU".

The name and documentation were mostly me being too hopeful and then
being lazy after realizing it wouldn't work.

> > I wish this were an ad-hoc thing but this has been the guaranteed
> > behavior all along.  Switching the specific call-site to
> > queue_work_on() would still be a good idea tho.
>
> I really don't see how you say that it has been guaranteed behavior
> all along.

The behavior has been like that from the beginning.  Workqueues and
timers were always CPU-affine, and we never properly distinguished CPU
affinity required for correctness from affinity used as an
optimization.

> It clearly has not at all been guaranteed behavior.  The fact that you
> had to change the code to do that should have made it clear.

It wasn't like that before the timer change.

> The code has *always* done that non-cpu-specific "add_timer()", as far
> as I can tell.  Even back when that non-bound CPU was indicated by a
> negative CPU number, and the code did
>
> 	if (unlikely(cpu >= 0))
> 		add_timer_on(timer, cpu);
> 	else
> 		add_timer(timer);
>
> (that's from 2007, btw).

At the time, that was just an alternative way of writing

	if (cpu < 0)
		cpu = raw_smp_processor_id();
	add_timer_on(timer, cpu);

because add_timer() was CPU-local back then.

> So I really don't see your "guaranteed behavior" argument.  It seems to
> be downright pure bullshit.  The lack of a specific CPU has _always_
> (where "always" means "at least since 2007") meant "non-specific cpu",
> rather than "local cpu".
>
> If some particular interface ended up then actually using "local cpu"
> instead, that was neither guaranteed nor implied - it was just a
> random implementation detail, and shouldn't be something we guarantee
> at all.
>
> We strive to maintain user-space ABI even in the face of unintentional
> bugs and misfeatures.  But we do *not* keep broken random in-kernel
> interfaces alive.  We fix breakage and clean up code rather than say
> "some random in-kernel user expects broken behavior".
>
> And it seems very clear that WORK_CPU_UNBOUND does *not* mean "local
> cpu", and code that depends on it meaning local cpu is broken.
>
> Now, there are reasons why the *implementation* might want to choose
> the local cpu for things - avoiding waking up other cpu's
> unnecessarily with cross-cpu calls etc - but at the same time I think
> it's quite clear that mm/vmstat.c is simply broken in using a
> non-bound interface and then expecting/assuming a particular CPU.

I don't think I'm confused about the timer behavior.  At any rate, for
!delayed work items, the failure to distinguish binding needed for
correctness from binding used as an optimization has been there for
quite a while.  I'd be happy to ditch that, but I don't think it'd be a
good idea to do it at -rc5 without auditing the existing users or
adding debug mechanisms to detect errors.
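For the call-sites which actually do depend on locality, the conversion
would be to the explicit *_on() variants.  Roughly the following - just
a sketch with made-up names, not the actual mm/vmstat.c code:

	#include <linux/smp.h>
	#include <linux/workqueue.h>

	static struct delayed_work my_work;	/* made-up name, not vmstat_work */

	/*
	 * Unbound queueing: the CPU ends up as WORK_CPU_UNBOUND.  Running
	 * on the local CPU is at best a preference of the current
	 * implementation, not something the interface promises.
	 */
	static void requeue_unbound(unsigned long delay)
	{
		schedule_delayed_work(&my_work, delay);
	}

	/*
	 * Bound queueing: explicitly pinned to the local CPU's pool, which
	 * is what folding per-cpu state actually needs for correctness.
	 * Assumes the caller's CPU is stable (preemption disabled or a
	 * CPU-bound kthread).
	 */
	static void requeue_bound(unsigned long delay)
	{
		schedule_delayed_work_on(smp_processor_id(), &my_work, delay);
	}

smp_processor_id() there doubles as a cheap debug check - with
CONFIG_DEBUG_PREEMPT it complains if the caller can actually migrate.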
> >> (That said, it's not obvious to me why we don't just specify the cpu
> >> in the work structure itself, and just get rid of the "use different
> >> functions to schedule the work" model.  I think it's somewhat fragile
> >> how you can end up using the same work in different "CPU boundedness"
> >> models like this).
> >
> > Hmmm... you mean associating a delayed work item with the target
> > pool_workqueue on queueing and sharing the queueing paths for both
> > delayed and !delayed work items?
>
> I wouldn't necessarily even go that far.  That's more of an
> implementation detail.
>
> I just wonder if we perhaps should add the CPU boundedness to the init
> stage, and hide it away in the work structure.  So if you just want to
> do work (delayed or not), you'd continue to do
>
> 	INIT_DELAYED_WORK(&work, ...);
>
> 	schedule_delayed_work(&work, delay);
>
> but if you want a percpu thing, you'd do
>
> 	INIT_DELAYED_WORK_CPU(&work, cpu, ...);
>
> 	schedule_delayed_work(&work, delay);
>
> rather than have that "..work_on(cpu, &work, )" interface.  The actual
> *implementation* could stay the same, we'd just hide the CPU bound in
> the work struct and use it as an implicit argument.
>
> But it's not a big deal.  I'm just saying that the current interface
> obviously allows that confusion about whether a work is percpu or not,
> on a per-call-site basis.

I was thinking more along the lines of introducing
queue_work_on_local() and friends (a rough sketch is appended at the
end of this mail), but yeah, we definitely need to explicitly
distinguish affinity required for correctness from affinity used for
performance.

Thanks.

-- 
tejun
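The sketch referenced above - purely illustrative, nothing like this
exists in the tree; it's just a thin wrapper over the existing *_on()
interfaces:

	#include <linux/smp.h>
	#include <linux/workqueue.h>

	/*
	 * Sketch only: make "must run on the queueing CPU" explicit at the
	 * call-site instead of relying on WORK_CPU_UNBOUND happening to
	 * pick the local CPU.  smp_processor_id() complains under
	 * CONFIG_DEBUG_PREEMPT if the caller can migrate, which is exactly
	 * the kind of misuse we'd want flagged.
	 */
	static inline bool queue_work_on_local(struct workqueue_struct *wq,
					       struct work_struct *work)
	{
		return queue_work_on(smp_processor_id(), wq, work);
	}

	/* a delayed-work sibling would follow the same pattern */
	static inline bool queue_delayed_work_on_local(struct workqueue_struct *wq,
						       struct delayed_work *dwork,
						       unsigned long delay)
	{
		return queue_delayed_work_on(smp_processor_id(), wq, dwork, delay);
	}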