On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
<christian.couder@xxxxxxxxx> wrote:
> > Git has a very optimized mechanism to compactly store
> > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > created by[3]:
> >
> > 1. listing objects;
> > 2. sorting the list with some good heuristics;
> > 3. traversing the list with a sliding window to find similar objects in
> > the window, in order to do delta decomposing;
> > 4. compress the objects with zlib and write them to the packfile.
> >
> > What we are calling pack access code in this document, is the set of
> > functions responsible for retrieving the objects stored at the
> > packfiles. This process consists, roughly speaking, in three parts:
> >
> > 1. Locate and read the blob from packfile, using the index file;
> > 2. If the blob is a delta, locate and read the base object to apply the
> > delta on top of it;
> > 3. Once the full content is read, decompress it (using zlib inflate).
> >
> > Note: There is a delta cache for the second step so that if another
> > delta depends on the same base object, it is already in memory. This
> > cache is global; also, the sliding windows, are global per packfile.
>
> Yeah, but the sliding windows are used only when creating pack files,
> not when reading them, right?

These windows are actually for reading. We used to just mmap the whole
pack file in the early days but that was impossible for 4+ GB packs on
32-bit platforms, which was one of the reasons, I think, that sliding
windows were added, to map just the parts we want to read.

> > # Points to work on
> >
> > * Investigate pack access call chains and look for non-thread-safe
> > operations on then.
> > * Protect packfile.c read-and-write global variables, such as
> > pack_open_windows, pack_open_fds and etc., using mutexes.
>
> Do you want to work on making both packfile reading and packfile
> writing thread safe? Or just packfile reading?

Packfile writing is probably already thread-safe, or pretty close to it
(at least the main writing code path in git-pack-objects; for the code
that streams blobs to a pack, I'm not so sure).
--
Duy
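
To illustrate the point about reading through windows instead of mapping
the whole pack, here is a minimal sketch in C. It is not git's actual
code: struct window, window_open(), window_at(), window_close(),
open_windows and window_lock are hypothetical names made up for this
example (open_windows only stands in for a global like
pack_open_windows). It assumes a POSIX system with mmap and pthreads.
The idea is to map a page-aligned slice of the file that covers the
offset we want, and to update the shared counter under a mutex.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a global such as pack_open_windows. */
static unsigned int open_windows;
static pthread_mutex_t window_lock = PTHREAD_MUTEX_INITIALIZER;

/* One mapped slice of a larger file (not git's struct pack_window). */
struct window {
	unsigned char *base; /* start of the mapping */
	off_t offset;        /* file offset of base, page-aligned */
	size_t len;          /* number of mapped bytes */
};

/*
 * Map only the part of fd that covers "want" plus at least min_len
 * bytes after it, clamped to the end of the file.
 * Returns 0 on success, -1 on failure.
 */
static int window_open(struct window *w, int fd, off_t want, size_t min_len)
{
	struct stat st;
	long page = sysconf(_SC_PAGESIZE);
	off_t start, lead, avail;
	size_t len;
	void *map;

	if (page <= 0 || min_len == 0 || fstat(fd, &st) < 0)
		return -1;
	if (want < 0 || want >= st.st_size)
		return -1;

	start = want - (want % page);  /* mmap offsets must be page-aligned */
	lead = want - start;           /* bytes mapped before "want" */
	avail = st.st_size - start;    /* bytes from window start to EOF */
	if (min_len > (size_t)(avail - lead))
		len = (size_t)avail;
	else
		len = (size_t)lead + min_len;

	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, start);
	if (map == MAP_FAILED)
		return -1;

	w->base = map;
	w->offset = start;
	w->len = len;

	/* The counter is shared by all threads, so guard the update. */
	pthread_mutex_lock(&window_lock);
	open_windows++;
	pthread_mutex_unlock(&window_lock);
	return 0;
}

/* Translate a file offset into a pointer inside the window. */
static const unsigned char *window_at(const struct window *w, off_t pos)
{
	assert(pos >= w->offset && (size_t)(pos - w->offset) < w->len);
	return w->base + (pos - w->offset);
}

/* Unmap the window and decrement the shared counter under the lock. */
static void window_close(struct window *w)
{
	munmap(w->base, w->len);
	w->base = NULL;
	pthread_mutex_lock(&window_lock);
	open_windows--;
	pthread_mutex_unlock(&window_lock);
}
```

A caller would do window_open(&w, fd, offset, needed), read through
window_at(&w, offset), then window_close(&w). The sketch locks only the
counter; reusing and evicting windows across threads would need a wider
critical section than this.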