On Fri, Jan 29, 2021 at 03:01:48PM -0500, Taylor Blau wrote:

> On Fri, Jan 29, 2021 at 02:19:25PM -0500, Jeff King wrote:
> > The overall goal here is being able to roll up loose objects and smaller
> > packs without having to pay the cost of a full reachability traversal
> > (which can take several minutes on large repositories). Another
> > very-different direction there is to just enumerate those objects
> > without respect to reachability, stick them in a pack, and then delete
> > the originals. That does imply something like "repack -k", though, and
> > interacts weirdly with letting unreachable objects age out via their
> > mtimes (we'd constantly suck them back into fresh packs).
>
> As I mentioned in an earlier response to Junio, this was the original
> approach that I took when implementing this, but ultimately decided
> against it because it means that we'll never let unreachable objects age
> out (as you note).

Right. But that's no different than using "-k" most of the time, and
then occasionally doing a more careful repack with short expiration
times and a full reachability check. As you know, this is basically what
we do at GitHub.

So it may be reasonable to go that direction, which is really defining a
totally separate strategy from git-gc's "repack, and occasionally
objects age out". Especially if we find that the
assume-kept-packs-closed route is too risky (i.e., has too many cases
where it's possible to cause corruption if our assumptions aren't met).

I'm not convinced either way at this point, but just thinking out loud
on the options (and trying to give some context to the list).

> I wonder if we need our assumption that the union of kept packs is
> closed under reachability to be specified as an option. If the option is
> passed, then we stop the traversal as soon as we hit an object in the
> frozen packs. If not passed, then we do a full traversal but pass
> --honor-pack-keep to drop out objects in the frozen packs after the
> fact.
>
> Thoughts?

I'm confused. I thought the whole idea was to pass it as an option (the
user telling Git "I know these packs are supposed to be closed; trust
me")?

> > I think it would want to be "the set of all .keep packs is closed". In a
> > "roll all into one" scenario like above, there is only one .keep pack.
> > But in a geometric progression, that single pack which constitutes your
> > base set could be multiple packs (the last whole "git repack -ad", but
> > then a sequence of roll-ups that came on top of it).
>
> I don't think having a roll-up strategy of "all-except-one" simplifies
> things. Or, if it does, then I don't understand it. Isn't this the exact
> same thing as a geometric repack which decides to keep only one pack?
>
> ISTM that you would be susceptible to the same problems in this case,
> too.

I wasn't trying to argue that all-except-one avoids any problems. I was
saying that the example I gave above was an all-into-one, but if you
want to extend the concept to multiple packs, it has to cover the whole
set. I.e., answering Junio's:

>   is it OK for objects in one kept pack to refer to another object in
>   the other kept pack?

with "yes".

-Peff
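
To make "the set of all .keep packs is closed" concrete, here is a toy
sketch in Python. It is not Git's implementation: the object graph is a
plain dict, packs are sets of made-up object ids, and every function
name below is hypothetical. It only illustrates why a traversal that
stops at kept objects is safe when the union of kept packs is closed,
and what it can miss when it is not.

```python
from typing import Dict, Iterable, List, Set

# Toy model (an assumption for illustration): object id -> ids it refers to.
Graph = Dict[str, List[str]]


def union(packs: Iterable[Set[str]]) -> Set[str]:
    """Return every object id stored in any of the given packs."""
    result: Set[str] = set()
    for pack in packs:
        result |= pack
    return result


def refs_escaping_kept(graph: Graph, kept_packs: List[Set[str]]) -> Set[str]:
    """Objects referenced from a kept pack but absent from all kept packs.

    An empty result means the union of kept packs is closed under
    reachability. References between two different kept packs are fine.
    """
    kept = union(kept_packs)
    return {ref for obj in kept for ref in graph.get(obj, []) if ref not in kept}


def walk_stopping_at_kept(
    graph: Graph, tips: List[str], kept_packs: List[Set[str]]
) -> Set[str]:
    """Collect objects reachable from tips, not descending into kept objects."""
    kept = union(kept_packs)
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj in seen or obj in kept:
            continue  # trust that anything below a kept object is also kept
        seen.add(obj)
        stack.extend(graph.get(obj, []))
    return seen


# Three commits, each with one tree; c3 is the newest.
GRAPH: Graph = {
    "c3": ["c2", "t3"],
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t1": [], "t2": [], "t3": [],
}

# Closed: c2 (second kept pack) refers to c1 (first kept pack), which is OK.
closed_kept = [{"c1", "t1"}, {"c2", "t2"}]

# Not closed: t2 lives outside the kept packs (say, as a loose object).
open_kept = [{"c1", "t1"}, {"c2"}]

escaping_closed = refs_escaping_kept(GRAPH, closed_kept)
escaping_open = refs_escaping_kept(GRAPH, open_kept)
rolled_up = walk_stopping_at_kept(GRAPH, ["c3"], open_kept)
```

In the non-closed case the walk stops at c2 and never sees t2, so t2
would not land in the new pack. If the loose originals were then
deleted, the repository would be missing an object that c2 needs, which
is the kind of corruption the assumption has to rule out.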