Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)

On Mon, Jul 13, 2015 at 05:36:32PM +0000, Mathieu Desnoyers wrote:
> ----- On Jul 13, 2015, at 7:17 AM, Ben Maurer bmaurer@xxxxxx wrote:
> 
> > At Facebook we already use getcpu in folly, our base C++ library, to provide
> > high performance concurrency algorithms. Folly includes an abstraction called
> > AccessSpreader which helps engineers write abstractions which shard themselves
> > across different cores to prevent cache contention
> > (https://github.com/facebook/folly/blob/master/folly/detail/CacheLocality.cpp).

Could you contribute your improvements and tips to libc? If they help a
C++ mutex, they would also improve the C mutex.
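
For readers who have not looked at the folly code, the sharding idea in
the quoted text can be sketched in plain C roughly as follows. This is
not folly's AccessSpreader and not a proposed libc interface: the names
sharded_add, sharded_sum, shard_index and NSHARDS are made up for
illustration, the 64-byte cache line size is an assumption, and glibc's
sched_getcpu() stands in for whatever fast "current CPU" source is
available.

```c
#define _GNU_SOURCE          /* must come before any include to expose sched_getcpu() */
#include <sched.h>
#include <stdatomic.h>

/* Provided by <sched.h> under _GNU_SOURCE; repeated so the sketch still
 * compiles if it is pasted below other headers. */
int sched_getcpu(void);

#define NSHARDS 64           /* assumed shard count, illustration only */

/* One counter per shard, padded to an assumed 64-byte cache line so that
 * updates on different CPUs do not bounce the same line. */
struct shard {
    _Alignas(64) atomic_long value;
};

static struct shard shards[NSHARDS];

/* Map the CPU the caller is currently running on to a shard index. */
unsigned shard_index(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)             /* sched_getcpu() can fail; fall back to shard 0 */
        cpu = 0;
    return (unsigned)cpu % NSHARDS;
}

/* Writers touch only the shard for their current CPU. */
void sharded_add(long n)
{
    atomic_fetch_add_explicit(&shards[shard_index()].value, n,
                              memory_order_relaxed);
}

/* Readers pay the cost of walking every shard. */
long sharded_sum(void)
{
    long total = 0;
    for (unsigned i = 0; i < NSHARDS; i++)
        total += atomic_load_explicit(&shards[i].value,
                                      memory_order_relaxed);
    return total;
}
```

The thread may migrate between the sched_getcpu() call and the update;
the atomics keep the result correct in that case, the CPU number is only
a locality hint. The cheaper that hint is to obtain, the more this kind
of structure pays off.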

> > We have used this primitive to create faster reader writer locks
> > (https://github.com/facebook/folly/blob/master/folly/SharedMutex.h), as well as
> > in an abstraction that powers workqueues
> > (https://github.com/facebook/folly/blob/master/folly/IndexedMemPool.h). This
> > would be a great perf improvement for these types of abstractions and probably
> > encourage us to use the idea more widely.
> > 
Since libc rwlocks are currently slow, they would get a speedup from that.
The main problem is that lock elision will give you bigger speedups than
that.

Also, from the description you have the wrong rwlock use case. The main
application is to avoid blocking: when two readers hold the lock for a long
time, having one wait for the other would be terrible.

> > One quick comment on the approach -- it'd be really great if we had a method
> > that didn't require users to register each thread. This can often lead to
> > requiring an additional branch in critical code to check if the appropriate
> > caches have been initialized. Also, one of the most interesting potential
> > applications of the restartable sequences concept is in malloc. Having a brief
> > period at the beginning of the life of a thread where malloc didn't work would
> > be pretty tricky to program around.
> 
> If we invoke this per-thread registration directly in the glibc NPTL implementation,
> in start_thread, do you think it would fit your requirements?
>
A generic solution would be to add eager initialization of thread_local
variables, which would fix more performance problems than just this one.
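
To make the cost being discussed concrete, here is a rough C sketch of
the two access patterns. Everything in it is hypothetical:
register_cpu_cache() only stands in for the registration step of the RFC
(it performs no system call and just stores a placeholder value), and
current_cpu_lazy() and current_cpu_eager() are names invented for this
illustration.

```c
#include <stdbool.h>

/* Hypothetical per-thread state standing in for the registered cache. */
static _Thread_local bool cache_registered;
static _Thread_local int cpu_cache = -1;   /* -1 means "not registered yet" */

/* Hypothetical stand-in for per-thread registration. A real version would
 * hand the address of cpu_cache to the kernel; here it only stores a
 * placeholder so the sketch can run anywhere. */
static void register_cpu_cache(void)
{
    cpu_cache = 0;
    cache_registered = true;
}

/* Lazy scheme: every call pays for the "is it set up yet?" branch. */
int current_cpu_lazy(void)
{
    if (!cache_registered)
        register_cpu_cache();
    return cpu_cache;
}

/* Eager scheme: a plain load, valid only if thread startup has already
 * performed the registration before any user code runs. */
int current_cpu_eager(void)
{
    return cpu_cache;
}
```

With eager initialization done at thread start, the hot path shrinks to
the single load in current_cpu_eager(), and there is no window in which
code such as malloc could observe an unregistered thread.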

A second option would be to write a libc patch adding a function,
pthread_create_add_hook_np, that registers a function to be run after
each thread creation.
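
pthread_create_add_hook_np does not exist in libc today. As a rough
user-space approximation of the idea, a wrapper around pthread_create
can run registered hooks in the new thread before its start routine.
All names below (thread_hook_add, hooked_pthread_create, MAX_HOOKS and
the demo helpers) are invented for this sketch, and unlike a real libc
hook it only covers threads created through the wrapper.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_HOOKS 8                     /* assumed fixed table size */

typedef void (*thread_hook_fn)(void);

static thread_hook_fn hooks[MAX_HOOKS];
static int nhooks;
static pthread_mutex_t hooks_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical: register fn to run at the start of every thread created
 * later via hooked_pthread_create(). Returns 0, or -1 if the table is full. */
int thread_hook_add(thread_hook_fn fn)
{
    int rc = -1;
    pthread_mutex_lock(&hooks_lock);
    if (nhooks < MAX_HOOKS) {
        hooks[nhooks++] = fn;
        rc = 0;
    }
    pthread_mutex_unlock(&hooks_lock);
    return rc;
}

struct start_args {
    void *(*start)(void *);
    void *arg;
};

/* Runs in the new thread: call the hooks, then the real start routine. */
static void *trampoline(void *p)
{
    struct start_args args = *(struct start_args *)p;
    thread_hook_fn local[MAX_HOOKS];
    int n;

    free(p);
    pthread_mutex_lock(&hooks_lock);    /* snapshot the table under the lock */
    n = nhooks;
    for (int i = 0; i < n; i++)
        local[i] = hooks[i];
    pthread_mutex_unlock(&hooks_lock);

    for (int i = 0; i < n; i++)
        local[i]();
    return args.start(args.arg);
}

/* Same signature and return convention as pthread_create(). */
int hooked_pthread_create(pthread_t *t, const pthread_attr_t *attr,
                          void *(*start)(void *), void *arg)
{
    struct start_args *a = malloc(sizeof *a);
    if (a == NULL)
        return ENOMEM;
    a->start = start;
    a->arg = arg;
    int rc = pthread_create(t, attr, trampoline, a);
    if (rc != 0)
        free(a);                        /* thread never started, so free here */
    return rc;
}

/* Demo: the hook sets a thread-local flag, the thread reports it back. */
static _Thread_local int hook_ran;
static void mark_hook(void) { hook_ran = 1; }
static void *report(void *unused)
{
    (void)unused;
    return (void *)(intptr_t)hook_ran;
}

/* Returns 1 if the hook ran in the new thread, -1 on setup failure. */
int demo_hook_runs(void)
{
    pthread_t t;
    void *result = NULL;
    if (thread_hook_add(mark_hook) != 0)
        return -1;
    if (hooked_pthread_create(&t, NULL, report, NULL) != 0)
        return -1;
    if (pthread_join(t, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

A hook registered this way is where the per-thread registration could
be done, so that user code in the thread never sees an uninitialized
cache. Doing it inside libc would additionally cover threads created by
code that calls pthread_create directly.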