Dana How wrote:
> This patch complicates git-repack.sh quite a bit and
> I'm unclear on what _problem_ you're addressing.

The problem is simple, and it is partially in the eye of the beholder.
That is:

 1. Without repacking, you get a lot of loose objects:
    - unnecessary disk space usage
    - bad performance on many OSes
 2. Repack takes too long to run very regularly; it's an occasional
    command.
 3. There is a perception that git repositories are not maintenance free.

What I'm aiming for is something light enough that it might even win
back the performance lost to 1), and that solves the perception problem
of 3).

Much as users who don't like automatic database maintenance turn it off
and run it at the best time, advanced git users will want to disable
this feature in ~/.gitconfig and run repack themselves when it suits
them, or via cron or whatever. Or it's disabled by default and users
who whine get told to turn it on; it really doesn't matter. I can
already do it with a commit hook, so I'm quite happy.

> The recent LRU preferred pack patch
> reduces much of the value in keeping a repository tidy
> ("tidy" == "few pack files").

Great, that is a good thing.

Pack files are a concept almost indistinguishable from database
partitions. Seen that way, the scaling problems of having lots of
partitions can certainly be managed. For instance, with database
partitioning you would expect your query planner (in this case,
read_packed_sha1()) to select the right partition (pack) to look in
first, to avoid excessive index lookups. Git having a strategy that
picks the right pack quickly N% of the time is an excellent way to
reduce the impact of a large number of pack files. That said, I think
you would probably still find measurable wins by keeping the total
number of packs limited.
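To make the "query planner" analogy concrete, here is a minimal sketch of a preferred-pack lookup, roughly in the spirit of the LRU preferred pack patch mentioned above. It is not git's read_packed_sha1() code: it assumes each pack can be modelled as a plain set of object names, and the class name PackSet and its probes counter are hypothetical, introduced only for illustration. The lookup tries the pack that answered the previous request first and only scans the other packs on a miss.

```python
from typing import Dict, Optional, Set


class PackSet:
    """Hypothetical model: each pack is a name mapped to a set of object ids."""

    def __init__(self, packs: Dict[str, Set[str]]) -> None:
        self.packs = packs
        self.last_hit: Optional[str] = None  # pack that answered the previous lookup
        self.probes = 0  # number of pack indexes consulted so far (for illustration)

    def find(self, oid: str) -> Optional[str]:
        """Return the name of a pack containing oid, or None if no pack has it."""
        # Try the most recently successful pack first.
        if self.last_hit is not None:
            self.probes += 1
            if oid in self.packs[self.last_hit]:
                return self.last_hit
        # On a miss, fall back to scanning every other pack in turn.
        for name, objects in self.packs.items():
            if name == self.last_hit:
                continue
            self.probes += 1
            if oid in objects:
                self.last_hit = name  # remember it for the next lookup
                return name
        return None
```

When consecutive lookups tend to hit the same pack, each costs a single index probe no matter how many packs exist. A miss still costs one probe per pack, which is why keeping the total number of packs limited can still pay off.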
Consider that I'm thinking of running this generational repack from
somewhere such as a commit hook, whenever it finds more than 100 loose
objects, so that the first-generation repack is very quick and doesn't
annoy me; the second-generation repack will similarly be fairly quick,
as many deltas will already have been computed. The exact behaviour
will probably need tuning to strike a good balance between good delta
computation and minimal interruption to the commit flow. Someone on IRC
floated the idea of making the first generation do no delta computation
at all, to make it lightning fast.

Note that if you had 3 pack generations, only the first two levels
would ever be repacked; you'd end up with an unlimited number of
third-generation packs, which would also end up in LRP* order.

> Already git-gc calls git-repack -a -d. How do you plan to change this?
> I wonder if you should be making git-gc more intelligent instead.
>
> Also, you introduce a new pack properties file (.gen) which seems
> awkward to me.

This implementation is a simple demonstration of the logic, intended to
communicate the idea and stimulate discussion. I think the logic could
probably live elsewhere too, and yes, the new file is a bit of a hack.

It might be better to base the "generation" assessment on the actual
size of the pack, for instance. That is, instead of the number of loose
objects, use the size of the loose objects, and call a pack under 1MB
1st generation, one under 5MB 2nd generation, and so on. When the
combined size of the 1st-generation packs grows above 5MB, that
generation is full and a new 2nd-generation pack is made. Then no state
file is required.

> Perhaps something like this would be useful on a huge repository
> under active use. But delta re-use makes full repacking quite quick for
> a reasonably-sized repository already, and I don't see this being very useful
> for a repository which is large due to large objects.
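A minimal sketch of the size-based generation policy described above, assuming the 100-loose-object trigger and the 1MB/5MB bounds from the discussion as tunable constants. The function names are hypothetical and none of this is in git-repack.sh. One deliberate difference from the numbers above: here a generation counts as full once its packs together reach that generation's own bound, so that the merged pack normally lands in a higher generation. With two bounds there are three generations, and only the first two are ever repacked.

```python
from typing import Dict, List, Optional, Sequence, Tuple

MB = 1024 * 1024

# Illustrative thresholds from the discussion; all are assumptions to be tuned.
LOOSE_OBJECT_TRIGGER = 100            # pack loose objects once there are more than this
GENERATION_BOUNDS = (1 * MB, 5 * MB)  # gen 1: < 1MB, gen 2: < 5MB, gen 3: the rest


def needs_first_generation_pack(loose_objects: int) -> bool:
    """True when a hook should pack the loose objects into a new gen-1 pack."""
    return loose_objects > LOOSE_OBJECT_TRIGGER


def generation_of(size: int, bounds: Sequence[int] = GENERATION_BOUNDS) -> int:
    """Classify a pack by its size alone, so no .gen state file is needed."""
    for gen, bound in enumerate(bounds, start=1):
        if size < bound:
            return gen
    return len(bounds) + 1  # final generation: never repacked


def full_generation(
    packs: Dict[str, int], bounds: Sequence[int] = GENERATION_BOUNDS
) -> Optional[Tuple[int, List[str]]]:
    """Return (generation, pack names) for the lowest full generation, or None.

    A generation is full once its packs together reach its own size bound;
    combining them then yields (ignoring extra delta savings) a larger,
    higher-generation pack. The final generation has no bound, so its
    packs simply accumulate.
    """
    members: Dict[int, List[str]] = {}
    for name, size in packs.items():
        members.setdefault(generation_of(size, bounds), []).append(name)
    for gen, bound in enumerate(bounds, start=1):
        names = sorted(members.get(gen, []))
        if sum(packs[n] for n in names) >= bound:
            return gen, names
    return None
```

For example, with packs of 700KB and 400KB plus larger ones, full_generation() reports generation 1 as full and names those two packs as the ones to combine. Deriving the generation from pack size is what removes the need for the .gen file. The cost is that these thresholds, like the loose-object trigger, are guesses.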
I agree with your point of view; however, I think that if the feature
is out there but disabled by default, its usefulness can be established
through experience. As you can see, all of the elements needed to
implement it are already there, and as you mention, combining packs is
already quick.

Sam.

* Last Recently Packed ;)