Hi,

I recently ran into a performance issue with GCC when accessing thread-local storage. I posted a question on SO with the details of the problem, which has unfortunately not been answered yet:

https://stackoverflow.com/questions/67894898/why-doesnt-gcc-eliminate-successive-calls-to-tls-get-addr

In short, GCC calls __tls_get_addr() on every access to a TLS variable inside a function, even if the variable is local. I would expect the result of the first call to be cached and reused from the second access onwards. This is also what clang does with "-O3".

Someone on SO pointed me to this discussion on the GCC mailing list from 2012:

https://gcc.gnu.org/legacy-ml/gcc/2012-10/msg00024.html

What would it take to implement this optimization given the current state of GCC? Are there any thread-safety reasons I'm missing why it can't be done?

Thanks,
Roy Jacobson
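
P.S. For reference, here is a minimal sketch of the access pattern in question. It is not the exact code from the SO post; the variable and function names are made up for illustration, and I am assuming position-independent code (e.g. "-O3 -fPIC"), where TLS accesses typically go through __tls_get_addr(). The second function shows the manual workaround of taking the address once and reusing it.

```c
/* Hypothetical thread-local variable, used only for illustration. */
static __thread int counter;

/* Accesses the TLS variable by name each time. With -fPIC, each
   access may be compiled into its own __tls_get_addr() call. */
int sum_direct(int n)
{
    counter = 0;
    for (int i = 0; i < n; i++)
        counter += i;
    return counter;
}

/* Manual workaround: resolve the TLS address once, then go through
   the cached pointer for every later access. */
int sum_cached(int n)
{
    int *p = &counter;
    *p = 0;
    for (int i = 0; i < n; i++)
        *p += i;
    return *p;
}
```

Both functions compute the same result; comparing their output from "gcc -O3 -fPIC -S" should show whether the lookups are being merged.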