On Tue, Jan 05, 2016 at 10:34:04PM +0000, Mathieu Desnoyers wrote:
> ----- On Jan 5, 2016, at 4:47 PM, Paul E. McKenney paulmck@xxxxxxxxxxxxxxxxxx wrote:
>
> > On Tue, Jan 05, 2016 at 05:40:18PM +0000, Russell King - ARM Linux wrote:
> >> On Tue, Jan 05, 2016 at 05:31:45PM +0000, Mathieu Desnoyers wrote:
> >> > For instance, an application could create a linked list or hash map
> >> > of thread control structures, which could contain the current CPU
> >> > number of each thread.  A dispatch thread could then traverse or
> >> > look up this structure to see on which CPU each thread is running
> >> > and make work-queue dispatch or scheduling decisions accordingly.
> >>
> >> So, what happens if the linked list is walked from thread X, and we
> >> discover that thread Y is allegedly running on CPU1?  We decide that
> >> we want to dispatch some work on that thread due to it being on CPU1,
> >> so we send an event to thread Y.
> >>
> >> Thread Y becomes runnable, and the scheduler decides to schedule the
> >> thread on CPU3 instead of CPU1.
> >>
> >> My point is that the above idea is inherently racy.  The only case
> >> where it isn't racy is when thread Y is bound to CPU1, and so can't
> >> move - but then you'd know that thread Y is on CPU1 and there
> >> wouldn't be a need for the inherent complexity suggested above.
> >>
> >> The behaviour I've seen on ARM from the scheduler (on a quad-CPU
> >> platform, observing the system activity with top reporting the last
> >> CPU number used by each thread) is that threads often migrate
> >> between CPUs - especially in the case of (e.g.) one or two threads
> >> running on a quad-CPU system.
> >>
> >> Given that, I'm really not sure what the use of reading and making
> >> decisions on the current CPU number would be within a program -
> >> unless the thread is bound to a particular CPU or group of CPUs,
> >> it seems that you can't rely on still being on the reported CPU by
> >> the time the system call returns.
> >
> > As I understand it, the idea is -not- to eliminate synchronization
> > like we do with per-CPU variables in the kernel, but rather to
> > reduce the average cost of synchronization.  For example, there
> > might be a separate data structure per CPU, each structure guarded
> > by its own lock.  A thread could sample the current running CPU,
> > acquire that CPU's corresponding lock, and operate on that CPU's
> > structure.  This would work correctly even if there were an
> > arbitrarily high number of preemptions/migrations, but would have
> > improved performance (compared to a single global lock) in the
> > common case where there were no preemptions/migrations.
> >
> > This approach can also be used in conjunction with Paul Turner's
> > per-CPU atomics.
> >
> > Make sense, or am I missing your point?
>
> Russell's point is more about accessing a given thread's cpu_cache
> variable from other threads/cores, which is beyond what is needed
> for restartable critical sections.

Fair enough!

> Independently of the usefulness of reading other threads' cpu_cache
> to see their current CPU, I would advocate for checking the cpu_cache
> natural alignment, and returning EINVAL if it is not aligned.  Even for
> thread-local reads, we care about ensuring there is no load tearing
> when reading this variable.  The behavior of the kernel updating this
> variable read by a user-space thread is very similar to having a
> variable updated by a signal handler nested on top of a thread.  This
> makes it simpler and reduces the testing state space.

Makes sense to me!
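
For concreteness, a minimal user-space sketch of that per-CPU-lock
pattern might look something like the code below.  Note that
sched_getcpu() is only a stand-in for a read of the proposed cpu_cache
variable, the structure and function names are made up for illustration,
and the array initializer uses a GNU extension:

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sched.h>
	#include <stdint.h>

	#define NR_CPUS_MAX 64		/* arbitrary upper bound for this sketch */

	struct percpu_counter {
		pthread_mutex_t lock;
		uint64_t count;
	} __attribute__((aligned(64)));	/* keep each CPU's data on its own cache line */

	/* GNU range-designator initializer; a pthread_mutex_init() loop works too. */
	static struct percpu_counter counters[NR_CPUS_MAX] = {
		[0 ... NR_CPUS_MAX - 1] = { .lock = PTHREAD_MUTEX_INITIALIZER },
	};

	static void counter_inc(void)
	{
		/*
		 * Sample the current CPU.  sched_getcpu() stands in for a
		 * naturally aligned, tear-free read of the proposed cpu_cache
		 * variable; the thread may migrate right after this read.
		 */
		int cpu = sched_getcpu();

		if (cpu < 0 || cpu >= NR_CPUS_MAX)
			cpu = 0;	/* fall back to a default bucket */

		/*
		 * Correct even if we have already been migrated: we simply
		 * take the lock of the CPU we sampled.  In the common
		 * no-migration case, threads on different CPUs contend on
		 * different locks, which is much cheaper than one global lock.
		 */
		pthread_mutex_lock(&counters[cpu].lock);
		counters[cpu].count++;
		pthread_mutex_unlock(&counters[cpu].lock);
	}

The same reasoning applies to the alignment point above: if cpu_cache
itself is naturally aligned, the kernel's update and the thread's read
behave like single-copy-atomic accesses, much as with a variable shared
with a signal handler.
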
							Thanx, Paul