On Thu, May 23, 2019 at 01:51:47PM -0300, Matheus Tavares Bernardino wrote:

> As one of my first tasks in GSoC, I'm looking to protect the global
> states in sha1-file.c for future parallelization. Currently, I'm
> analyzing how to deal with the cached_objects array, which is a small
> set of in-memory objects that read_object_file() is able to return
> although they don't really exist on disk. The only current user of
> this set is git-blame, which adds a fake commit containing
> non-committed changes.
>
> As it is now, if we start parallelizing blame, cached_objects won't be
> a problem, since it is written to only once, at the beginning, and
> read from a couple of times later, with no possible race conditions.
>
> But should we make these operations thread-safe for future uses that
> could involve parallel writes and reads too?
>
> If so, we have two options:
> - Make the array thread-local, which would oblige us to replicate data, or
> - Protect it with locks, which could impact sequential performance.
>   We could have a macro here, to skip locking in single-threaded use
>   cases. But we don't know, a priori, the number of threads that would
>   want to use the pack access code.

It seems like a lot of the sha1-reading code is 99% read-only, but very
occasionally will require a write (e.g., refreshing the packed_git list
when we fail a lookup, or manipulating the set of cached mmap windows).

I think pthreads has read/write locks, where many readers can hold the
lock simultaneously but a writer blocks readers (and other writers).
Then in the common case we'd only pay the price to take the lock, and
not deal with contention.

I don't know how expensive it is to take such a read lock; it's
presumably just a few instructions, but it implies a memory barrier.
Maybe it's worth timing?

-Peff
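
P.S. A rough sketch of the rwlock idea (untested, and the helper names
lookup_in_packs()/refresh_packed_git() are made up, not the real API):
readers take the lock shared; on a miss we drop it, take the write
lock, re-check, and only then refresh. Note that pthreads has no
read-to-write upgrade, so the re-check after reacquiring matters,
since another thread may have refreshed in the meantime.

	#include <pthread.h>

	/* Hypothetical global lock for the mostly-read-only pack state. */
	static pthread_rwlock_t pack_lock = PTHREAD_RWLOCK_INITIALIZER;

	/* Stand-ins for the real lookup/refresh routines. */
	extern void *lookup_in_packs(const unsigned char *sha1);
	extern void refresh_packed_git(void);

	void *read_object_locked(const unsigned char *sha1)
	{
		void *obj;

		/* Common case: many threads can hold the read lock at once. */
		pthread_rwlock_rdlock(&pack_lock);
		obj = lookup_in_packs(sha1);
		pthread_rwlock_unlock(&pack_lock);
		if (obj)
			return obj;

		/*
		 * Miss: take the write lock and look again before
		 * refreshing, in case somebody else beat us to it.
		 */
		pthread_rwlock_wrlock(&pack_lock);
		obj = lookup_in_packs(sha1);
		if (!obj) {
			refresh_packed_git();
			obj = lookup_in_packs(sha1);
		}
		pthread_rwlock_unlock(&pack_lock);
		return obj;
	}

As for the cost of the uncontended read lock, a trivial loop like the
one below (single-threaded, so it only measures the no-contention fast
path; build with -pthread) should give a ballpark number:

	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	static pthread_rwlock_t lk = PTHREAD_RWLOCK_INITIALIZER;

	int main(void)
	{
		struct timespec s, e;
		long i, n = 100000000;

		clock_gettime(CLOCK_MONOTONIC, &s);
		for (i = 0; i < n; i++) {
			pthread_rwlock_rdlock(&lk);
			pthread_rwlock_unlock(&lk);
		}
		clock_gettime(CLOCK_MONOTONIC, &e);

		printf("%.1f ns per rdlock/unlock pair\n",
		       ((e.tv_sec - s.tv_sec) * 1e9 +
			(e.tv_nsec - s.tv_nsec)) / n);
		return 0;
	}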