On Tue, Jul 17 2018, Jeff King wrote:

> On Tue, Jul 17, 2018 at 10:59:50AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> That doesn't make sense to me. Just because git itself is happily
>> ignoring the exit code I don't think that should mean there shouldn't
>> be a meaningful exit code. Why don't you just ignore the exit code in
>> the repo tool as well?
>>
>> Now e.g. automation I have to see if git-gc --auto is having issues
>> can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet
>> of servers, but will need to start caring if stderr was emitted to.
>
> If you're daemonizing gc, though, there are a number of cases where the
> exit code is not propagated. If you really care about the outcome,
> you're probably better off either:

In theory a lot of the other stuff can fail, but in practice I've only
seen it get stuck due to running out of disk space (easily detected in
other ways), and because of having too many loose objects (e.g. due to,
but not limited to,
https://public-inbox.org/git/87fu6bmr0j.fsf@xxxxxxxxxxxxxxxxxxx/).

>  - doing synchronous gc's, which will still return a meaningful code
>    after Jonathan's patches

(Replying to this and to "we've found at GitHub..." later in your mail.)

I'm targeting clients (dev machines, laptops, staging boxes) which have
clones of repos, some put in place by automation, some manually cloned
by users in ~. So it's much easier to rely on shipping a /etc/gitconfig
than setting gc.auto=0 and having some system-wide job that goes and
hunts for local git repos to manually gc.

It's also a big advantage that it's driven by user activity, because
it's an implicit guard of only_do_this_if_the_user_is_active_here()
before "git-gc" on a fleet of machines, which, when some of those get
their disk space from shared NFS resources, cuts down on overall I/O.

>  - inspecting the log yourself. I know that comes close to the un-unixy
>    stderr thing. But in this case, the presence of a non-empty log is
>    literally the on-disk bit for "the previous run failed". And doing
>    so catches all of the daemonized cases, even the "first one" that
>    you'd miss by paying attention to the immediate exit code.
>
>    This will treat the zero-exit-code "warning" case as an error. I'm
>    not against propagating the true original error code, if somebody
>    wants to work on it. But I think Jonathan's patches are a strict
>    improvement over the current situation, and a patch to propagate
>    could come on top.

Yeah, I can check gc.log. If others are of a different opinion about
this #3 case, then at the end of the day I don't mind if it returns 0;
it just doesn't make any conceptual sense to me.
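I.e. for my monitoring use-case the fleet-wide check would presumably
become something like the sketch below, keying off a non-empty gc.log
instead of the exit code (just a sketch; the repo paths are made up for
illustration):

    # Rough sketch: flag repos whose last daemonized "gc --auto" left a
    # non-empty gc.log behind, instead of looking at the exit code.
    for repo in ~/src/*/.git /srv/checkouts/*/.git
    do
        if test -s "$repo/gc.log"
        then
            echo "gc trouble in $repo @ $(hostname): $(cat "$repo/gc.log")"
        fi
    done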
>> I think you're conflating two things here in a way that (if I'm
>> reading this correctly) produces a pretty bad regression for users:
>>
>> a) If we have something in the gc.log we keep yelling at users until
>>    we reach the gc.logExpiry; this was the subject of my thread back
>>    in January. This sucks, and should be fixed somehow.
>>
>>    Maybe we should only emit the warning once (e.g. creating a
>>    .git/gc.log.wasemitted marker, and as long as its ctime is later
>>    than the mtime on .git/gc.log we don't emit it again).
>>
>>    But that also creates the UX problem that it's easy to miss this
>>    message in the middle of some huge "fetch" output. Perhaps we
>>    should just start emitting this as part of "git status" or
>>    something (or solve the root cause, as Peff notes...).
>
> I kind of like that "git status" suggestion. Of course many users run
> "git status" more often than "git commit", so it may actually spam them
> more!

Maybe piggy-backing on the advice facility ...

>> b) We *also* use this presence of a gc.log as a marker for "we failed
>>    too recently, let's not try again until after a day".
>>
>> The semantics of b) are very useful, and just because they're tied up
>> with a) right now, let's not throw out b) just because we're trying to
>> solve a).
>
> Yeah. In general my concern was the handling of (b), which I think this
> last crop of patches is fine with. As far as showing the message
> repeatedly or not, I don't have a strong opinion (it could even be
> configurable, once it's split from the "marker" behavior).
>
>> Right now one thing we do right is we always try to look at the actual
>> state of the repo with too_many_packs() and too_many_loose_objects().
>>
>> We don't assume the state of your repo hasn't drastically changed
>> recently. That means that we do the right thing and gc when the repo
>> needs it, not just because we GC'd recently enough.
>>
>> With these proposed semantics we'd skip a needed GC (potentially for
>> days, depending on the default) just because we recently ran one.
>
> Yeah, I agree that deferring repeated gc's based on time is going to
> run into pathological corner cases. OTOH, what we've found at GitHub is
> that "gc --auto" is quite insufficient for scheduling maintenance
> anyway (e.g., if a machine gets pushes to 100 separate repositories in
> quick succession, you probably want to queue and throttle any
> associated gc).

I'm sure you have better solutions for this at GitHub, but as an aside
it might be interesting to add some sort of gc flock + retry setting for
this use-case, i.e. even if you had 100 concurrent gc's due to
too_many_*(), they'd wait + retry until they could get the flock on a
given file (see the sketch at the end of this mail for roughly what I
mean). Then again this is probably too specific, and could be done with
a pre-auto-gc hook too.

> But there are probably more mid-size sites that are big enough to have
> corner cases, but not so big that "gc --auto" is hopeless. ;)

FWIW I don't deal with gc on the server at all these days (that's our
internal GitLab instance's problem to solve). I just mentioned the edge
case of a growing number of pack files on the server upthread as
something that we'd be shipping with git if we had time-based backoff
semantics.
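(Re the flock aside above: the semantics I have in mind are more or less
what you already get by driving gc under flock(1) by hand, just built
into git as a knob; a rough sketch, with a made-up lock path and
timeout:)

    # Rough sketch of the semantics only: each triggered gc waits up to
    # 10 minutes for the same host-wide lock, so concurrent gc's across
    # repos run one at a time instead of all at once.  $repo stands for
    # whichever repository tripped the too_many_*() limits.
    flock --wait 600 /var/lock/git-auto-gc.lock git -C "$repo" gc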