On Wed, Jun 21, 2023 at 8:45 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Tue, Jun 20, 2023 at 09:12:24PM +0200, Tao Klerks wrote:
>
> > I'm back to begging for any hints here: Any idea how I can determine
> > whether a given commit object exists locally, *without causing it to
> > be fetched by the act of checking for it?*
>
> This is not very efficient, but:
>
>   git cat-file --batch-check='%(objectname)' --batch-all-objects --unordered |
>     grep $some_sha1
>
> will tell you whether we have the object locally.
>

Thanks so much for your help! On Windows (msys or Git Bash) this is
still very slow in my repo with 6,500,000 local objects, around 60s,
but on Linux with the same repo it's quite a lot faster, at 5s. A
large proportion of my users are on Windows, though, so I don't think
this will be "good enough" for my purposes, given that I often need to
check for the existence of dozens or even hundreds of commits.

> I don't work with partial clones often, but it feels like being able to
> say:
>
>   git --no-partial-fetch cat-file ...
>
> would be a useful primitive to have.

It feels that way to me, yes! On the other hand, I find very little
demand for it when I search "the internet", or perhaps I just don't
know how to search for it.
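
When dozens or hundreds of commits need checking, most of the cost of the cat-file approach is presumably the enumeration of every local object, so it may help to enumerate once and match all wanted hashes in a single pass instead of running the pipeline once per commit. A minimal sketch, assuming full (unabbreviated) hashes listed one per line in a file; the function name `present_among` is hypothetical:

```shell
# present_among FILE
# Hypothetical helper: print each object ID listed in FILE (one full hash
# per line) that already exists in the local object store. The object
# store is enumerated only once, however many hashes FILE contains.
# grep -F matches fixed strings, -x requires a whole-line match, and
# -f reads all the patterns from FILE.
present_among() {
    git cat-file --batch-check='%(objectname)' --batch-all-objects --unordered |
        grep -F -x -f "$1"
}
```

For example, `present_among wanted.txt` would print the subset of the hashes in the (hypothetical) file `wanted.txt` that are already present locally. This does not make the enumeration itself any faster on Windows; it only avoids paying for it once per commit.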

> The implementation might start
> something like this:
>
> diff --git a/object-file.c b/object-file.c
> index 7c1af5c8db..494cdd7706 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1555,6 +1555,14 @@ void disable_obj_read_lock(void)
>
>  int fetch_if_missing = 1;
>
> +static int allow_lazy_fetch(void)
> +{
> +	static int ret = -1;
> +	if (ret < 0)
> +		ret = git_env_bool("GIT_PARTIAL_FETCH", 1);
> +	return ret;
> +}
> +
>  static int do_oid_object_info_extended(struct repository *r,
>  				       const struct object_id *oid,
>  				       struct object_info *oi, unsigned flags)
> @@ -1622,6 +1630,7 @@ static int do_oid_object_info_extended(struct repository *r,
>
>  	/* Check if it is a missing object */
>  	if (fetch_if_missing && repo_has_promisor_remote(r) &&
> +	    allow_lazy_fetch() &&
>  	    !already_retried &&
>  	    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
>  		promisor_remote_get_direct(r, real, 1);
>
> and then have git.c populate the environment variable, similar to how we
> handle --literal-pathspecs, etc.
>
> That fetch_if_missing kind of does the same thing, but it's mostly
> controlled by programs themselves which try to handle missing remote
> objects specially.

Thanks, I will play with this if I get the chance. That said, I don't
control my users' distributions of Git, so on a purely practical basis
I'm looking for something that will work from Git 2.39 up to whatever
future version introduces such a capability. (Before 2.39, the
"set remote to False" hack works.)

> It does seem like you might be able to bend it to
> your will here, though. I think without any patches that:
>
>   git rev-list --objects --exclude-promisor-objects $oid
>
> will tell you whether we have the object or not (since it turns off
> fetch_if_missing, and thus will either succeed, printing nothing, or
> bail if the object can't be found).

This behaves in a way that I don't understand: in the repo I'm working
in, this command runs successfully *without fetching*, but it takes a
*very* long time (300+ seconds), much longer than even the
"inefficient" cat-file-based listing of all 6.5M local object IDs that
you proposed above. I haven't attempted to understand what's going on
in there (beyond running with GIT_TRACE2_PERF, which showed nothing
interesting), but the idea that Git would have to work super-hard to
find an object by its ID runs counter to everything I know about it.
Would there be value in my trying to understand and reproduce this in
a shareable repo, or is there already an explanation for why this
command could or should ever do non-trivial work, even in the largest
partial repos?

> It feels like --missing=error should
> function similarly, but it seems to still lazy-fetch (I guess since it's
> the default, the point is to just find truly unavailable objects). Using
> --missing=print disables the lazy-fetch, but it seems to bail
> immediately if you ask it about a missing object (I didn't dig, but my
> guess is that --missing is mostly about objects we traverse, not the
> initial tips).

Whoa, "--missing=print" seems to work!!! The following gives me the
commit hash if I have it locally, and an error otherwise, without
fetching and without crazy CPU churn. It behaves consistently across
Linux and Windows, with Git versions 2.41, 2.39, 2.38, and 2.36:

  git rev-list --missing=print -1 $oid

Thank you thank you thank you!

I feel like I should try to work something into the docs about this,
but I'm not sure how to express it: "--missing=error is the default,
but it doesn't actually error out when you're explicitly asking about
a missing commit; it fetches it instead. --missing=print, on the other
hand, *does* error out if you explicitly ask about a missing commit"
seems like a strange thing to be saying.

Thanks again for finding me an efficient working strategy here!
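
Since the goal was to check dozens or hundreds of commits, the working `--missing=print` command can be wrapped in a small helper. This is a sketch only: `have_commit` and `filter_local_commits` are hypothetical names, full 40-character commit hashes are assumed, and the extra output check is a precaution based on an assumption that later Git releases may report a missing tip as a `?<oid>` line and exit successfully, rather than erroring out as the versions tested above do:

```shell
# have_commit OID
# Hypothetical helper: succeed if the full commit hash OID is present
# locally, fail otherwise. --missing=print disables the lazy fetch, so a
# missing commit is reported rather than fetched from the promisor remote.
have_commit() {
    local out
    out=$(git rev-list --missing=print -1 "$1" 2>/dev/null) || return 1
    # Assumption: newer Git may print "?<oid>" for a missing tip and exit 0,
    # so also require the hash itself to appear as a whole output line.
    printf '%s\n' "$out" | grep -qxF -e "$1"
}

# filter_local_commits
# Hypothetical helper: read full commit hashes from stdin, one per line,
# and print only those that are available locally.
filter_local_commits() {
    while IFS= read -r oid; do
        [ -n "$oid" ] || continue
        if have_commit "$oid"; then
            printf '%s\n' "$oid"
        fi
    done
}
```

For example, `filter_local_commits < wanted.txt` would print the locally available commits from a (hypothetical) list file. This still starts one `git` process per commit, so its cost grows with the number of commits checked rather than with the size of the object store.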