On Mon, Apr 8, 2019 at 5:32 AM Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>
> On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> >
> > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
> > <christian.couder@xxxxxxxxx> wrote:
> > > > Git has a very optimized mechanism to compactly store
> > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > > > created by[3]:
> > > >
> > > > 1. listing objects;
> > > > 2. sorting the list with some good heuristics;
> > > > 3. traversing the list with a sliding window to find similar objects in
> > > > the window, in order to do delta decomposing;
> > > > 4. compress the objects with zlib and write them to the packfile.
> > > >
> > > > What we are calling pack access code in this document, is the set of
> > > > functions responsible for retrieving the objects stored at the
> > > > packfiles. This process consists, roughly speaking, in three parts:
> > > >
> > > > 1. Locate and read the blob from packfile, using the index file;
> > > > 2. If the blob is a delta, locate and read the base object to apply the
> > > > delta on top of it;
> > > > 3. Once the full content is read, decompress it (using zlib inflate).
> > > >
> > > > Note: There is a delta cache for the second step so that if another
> > > > delta depends on the same base object, it is already in memory. This
> > > > cache is global; also, the sliding windows, are global per packfile.
> > >
> > > Yeah, but the sliding windows are used only when creating pack files,
> > > not when reading them, right?
> >
> > These windows are actually for reading. We used to just mmap the whole
> > pack file in the early days but that was impossible for 4+ GB packs on
> > 32-bit platforms, which was one of the reasons, I think, that sliding
> > windows were added, to map just the parts we want to read.
>
> To clarify (I think I see why you mentioned pack creation now), there
> are actually two window concepts. core.packedGitWindowSize is about
> reading pack files. pack.window is for generating pack files. The
> second window should already be thread-safe since we do all the
> heuristics to find best base object candidates in threads.

Yeah, it is not very clear in the proposal which windows it is talking
about as I think a window is first mentioned when describing the steps
to create a packfile in:

"3. traversing the list with a sliding window to find similar objects in
the window, in order to do delta decomposing;"

Also the proposal plans to "Protect packfile.c read-and-write global
variables ..." which made me wonder if it was also about improving
thread safety when generating pack files.

Thanks for clarifying!
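
For readers following along, here is a rough sketch of the reading-side
idea discussed above: map only a window of a large file instead of the
whole thing, which is the role core.packedGitWindowSize plays for pack
files. This is not Git's code. It is written in Python rather than C for
brevity, and the function name read_via_window and its window_size
parameter are made up for illustration. It does not model pack.window,
which is the unrelated delta-search window used when generating packs.

```python
import mmap
import os


def read_via_window(path, offset, length, window_size=None):
    """Return `length` bytes starting at `offset`, mapping one window at a time.

    Hypothetical helper for illustration only; `window_size` loosely mirrors
    the idea behind core.packedGitWindowSize and is not Git's implementation.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    if window_size is None:
        window_size = gran
    # Map offsets must be aligned, so round the window up to a multiple
    # of the allocation granularity (and to at least one unit).
    window_size = max(gran, -(-window_size // gran) * gran)

    file_size = os.path.getsize(path)
    if offset < 0 or length < 0 or offset + length > file_size:
        raise ValueError("requested range is outside the file")

    out = bytearray()
    with open(path, "rb") as f:
        while length > 0:
            # Start the window at the aligned position at or before `offset`.
            win_start = (offset // gran) * gran
            win_len = min(window_size, file_size - win_start)
            with mmap.mmap(f.fileno(), win_len,
                           access=mmap.ACCESS_READ,
                           offset=win_start) as win:
                rel = offset - win_start
                # Slicing stops at the end of the window, so a read that
                # crosses a window boundary continues in the next iteration.
                chunk = win[rel:rel + length]
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
    return bytes(out)
```

Only one window is mapped at any time, so the address space needed is
bounded by the window size rather than by the file size, which is the
property that matters for 4+ GB packs on 32-bit platforms. Git itself
keeps several windows per packfile and reuses them, which this sketch
does not attempt.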