On Fri, Dec 2, 2011 at 11:45 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Fri, Dec 02, 2011 at 09:35:52AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@xxxxxxxx> writes:
>>
>> > When the objects become unreferenced, we eject them from the pack into
>> > loose form again. If they don't become referenced in the 2-week window,
>> > they get pruned then. So yes, you drop the age information, but they do
>> > eventually go away.
>>
>> If you update gc/repack -A to put them in a separate pack, then you would
>> never be able to get rid of them, no? You pack, then eject (which gives
>> them a fresher timestamp), then notice that you are within the 2-week window
>> and pack them again,...
>
> But we shouldn't be packing totally unreferenced objects. Barring bugs,
> the life cycle of such an object should be something like:
>
>  1. Object X is created on branch 'foo'.
>
>  2. Branch 'foo' is deleted, but its commits are still in the HEAD
>     reflog, referencing X.
>
>  3. 90 days pass (actually, I think this might be the 30-day
>     expire-unreachable time)
>
>  4. "git gc" runs "git repack -Ad", which will eject X from the pack
>     into a loose form (because it is not becoming part of the new pack
>     we are writing).

Actually, it is right here when the newly loosened unreferenced
objects will be deleted. Objects ejected from a pack _are_ given the
timestamp of the pack they were ejected from. So, if the pack is
older than two weeks (90 days in your example), then so will be the
loosened objects, and git prune will delete them when called by git
gc.

>  5. Two weeks pass.
>
>  6. "git gc" runs "git prune --expire=2.weeks.ago", which removes the
>     object.
>
> "gc" runs between (4) and (6) will not re-pack the object, because it
> remains unreferenced.

Correct with the recognition that loose objects get pack mtime, so
step 5 may be less than two weeks.

> I think things might be slowed somewhat by "gc --auto", which will not
> do a "repack -A" until we have too many packs.
> So steps (3) and (4) are
> really more like "gc runs git-repack without -A" 50 times, and then we
> finally run "git repack -A".

This is correct. This should have the effect of increasing the age of
unreferenced objects when they are finally loosened and make it more
likely that they are pruned during the same git gc operation that
loosens them.

Linus's scenario of fetching a lot of stuff that never actually makes
it into the reflogs is still a valid problem. I'm not sure that
people who don't know what they are doing are going to run into this
problem though. Since he fetches a lot of stuff without ever checking
it out or creating a branch from it, potentially many objects become
unreferenced every time FETCH_HEAD changes. If he does this many
times in a short period of time, he could reach the gc.autopacklimit
and trigger gc --auto and produce more than gc.auto loose objects that
are younger than gc.pruneExpire. Decreasing gc.pruneExpire as you
suggested should make it much less likely to run into this problem.

I wonder if it is worth trying to limit how often gc --auto is run to
not be more often than gc.pruneExpire or something. If we modified
the timestamp that is assigned to fetched packs, maybe we could use
the pack timestamps as an indicator of how recently git gc has run.

-Brandon
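
To make the timing argument concrete, here is a minimal sketch of the
decision described above. It is a toy model and not git's actual code:
the function name pruned_in_same_gc is hypothetical, the two-week
default is just the window discussed in this thread, and the only rule
it encodes is that a loosened object inherits the mtime of the pack it
was ejected from.

```python
from datetime import datetime, timedelta

# Assumption: the two-week prune window discussed in this thread.
PRUNE_EXPIRE = timedelta(weeks=2)


def pruned_in_same_gc(pack_mtime, now, prune_expire=PRUNE_EXPIRE):
    """Toy model: would an object loosened from a pack be pruned right away?

    The loosened object is assumed to inherit the pack's mtime, and prune
    is assumed to remove unreferenced loose objects whose mtime is at or
    before the expiry cutoff.
    """
    loose_mtime = pack_mtime          # ejected objects get the pack's timestamp
    cutoff = now - prune_expire       # e.g. "2.weeks.ago"
    return loose_mtime <= cutoff


now = datetime(2011, 12, 2)

# A pack that sat around for 90 days: its loosened objects are already
# older than the window, so the same gc run prunes them.
print(pruned_in_same_gc(now - timedelta(days=90), now))

# A pack from a fetch three days ago: its loosened objects are too
# young and survive as loose objects until the window has passed.
print(pruned_in_same_gc(now - timedelta(days=3), now))
```

The second case is the fetch-heavy scenario: packs are recent, so the
objects they shed stay loose. Passing a shorter prune_expire to the
sketch mimics lowering gc.pruneExpire.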