I wanted to explore the idea of exploiting knowledge about previous repacks to help speed up future repacks. I had various ideas that seemed like they might be good places to start, but things quickly got away from me. Mainly I wanted to focus on reducing, and sometimes even eliminating, reachability calculations, since that seems to be the one major unsolved slow piece during repacking.

My first line of thinking goes like this: "After a full repack, reachability of the current refs is known. Exploit that knowledge for future repacks." There are some very simple scenarios where, if we could figure out how to identify them reliably, I think we could avoid reachability calculations entirely and yet end up with the same repacked files as if we had done them. Let me outline some to see if they make sense as a starting place for further discussion.

-------------

* Setup 1: Do a full repack. All loose and packed objects are added to a single pack file (assumes the git repack config options do not create multiple packs).

* Scenario 1: Start with Setup 1. Nothing has changed in the repo contents (no new objects/packs, refs all the same), but the repacking config options have changed (for example, the compression level).

* Scenario 2: Start with Setup 1. Add one new pack file that was pushed to the repo along with a new ref (existing refs did not change).

* Scenario 3: Start with Setup 1. Add one new pack file that was pushed to the repo via a fast-forward update of an existing ref.

* Scenario 4: Start with Setup 1. Add some loose objects to the repo via a local fast-forward ref update (I am assuming this is possible without adding any new unreferenced objects?).

In all 4 scenarios, I believe we should be able to skip history traversal and simply grab all objects and repack them into a new file?

-------------

Of the 4 scenarios above, it seems like #3 and #4 are very common operations (#2 is perhaps even more common for Gerrit)? If these scenarios can be reliably identified somehow, then perhaps they could be used to reduce repacking time for these scenarios, and later used as building blocks to reduce repacking time for other related but slightly more complicated scenarios (with reduced history walking instead of none)?

For example, to identify scenario 1, what if we kept a copy of all the refs and their shas used during a full repack along with the newly repacked file? A simplistic approach would store them in the same format as the packed-refs file, as pack-<sha>.refs (a rough sketch of this is appended at the end of this mail). During repacking, if none of the refs have changed and there are no new objects, we know we are in scenario 1 and can skip the traversal. Then, if none of the refs have changed and there are new objects, we can just throw the new objects away? ...

I am going to stop here because this email is long enough, and I wanted to get some feedback on the ideas first before offering more solutions.

Thanks,

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
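
P.S. To make the pack-<sha>.refs idea a little more concrete, here is a rough shell sketch of recording the refs at full-repack time and later testing for the scenarios above. The file name, format, and helper names are only assumptions for illustration, not a worked-out design:

  #!/bin/sh
  # Sketch only: "pack-<sha>.refs" holds the refs (sha + name) that existed
  # when that pack was written, one per line, similar to packed-refs.

  packdir=.git/objects/pack

  # Called right after a successful full repack; $1 = the new pack file.
  record_refs_for_pack() {
      git for-each-ref --format='%(objectname) %(refname)' >"${1%.pack}.refs"
  }

  # Scenario 1: exactly one pack, no loose objects, and the refs are
  # byte-identical to the snapshot taken when that pack was written.
  # If this holds, the pack could be rewritten with the new config
  # without any reachability walk.
  is_scenario_1() {
      set -- "$packdir"/pack-*.pack
      [ $# -eq 1 ] && [ -f "${1%.pack}.refs" ] || return 1
      [ "$(git count-objects | cut -d' ' -f1)" -eq 0 ] || return 1
      git for-each-ref --format='%(objectname) %(refname)' |
          cmp -s - "${1%.pack}.refs"
  }

  # Scenarios 2-4: every ref in the snapshot is either unchanged or was
  # fast-forwarded, and any new refs are pure additions.  Assuming, as
  # above, that the new packs/loose objects contain no unreferenced
  # objects, everything in the repo could then be repacked into one file
  # without a traversal.
  only_adds_and_fast_forwards() {
      snapshot=$1
      while read old ref; do
          new=$(git rev-parse --verify -q "$ref") || return 1   # ref was deleted
          [ "$old" = "$new" ] && continue
          git merge-base --is-ancestor "$old" "$new" || return 1  # non-ff update
      done <"$snapshot"
  }

Obviously this glosses over how such checks would hook into "git repack" itself; it is only meant to show the kind of bookkeeping I have in mind.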