Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 12 Jul 2013 12:30:07 +0200

On 07/12/2013 11:22 AM, Jeff King wrote:
> Yet another option is to consider what the check is doing, and
> accomplish the same thing in a different way. The real pain is that we
> are individually trying to resolve each object by hitting the filesystem
> (and doing lots of silly checks on the refname format, when we know it
> must be valid).
> 
> We don't actually care in this case if the ref list is up to date (we
> are not trying to update or read a ref, but only know if it exists, and
> raciness is OK). IOW, could we replace the dwim_ref call for the warning
> with something that directly queries the ref cache?

I think it would be quite practical to add an API something like

    struct ref_snapshot *get_ref_snapshot(const char *prefix)
    void release_ref_snapshot(struct ref_snapshot *)
    int lookup_ref(struct ref_snapshot *, const char *refname,
                   unsigned char *sha1, int *flags)

where prefix is the part of the refs tree that you want included in the
snapshot (e.g., "refs/heads") and ref_snapshot is probably opaque
outside of the refs module.

Symbolic refs, which are currently not stored in the ref_cache, would
have to be added because otherwise we would have to do all of the
lookups anyway.

I think this would be a good step to take for many reasons, including
because it would be another useful step in the direction of ref
transactions.

But with particular respect to "git cat-file", I see problems:

1. get_ref_snapshot() would have to read all loose and packed refs
within the specified subtree, because loose refs have to be read before
packed refs.  So the call would be expensive if there are a lot of loose
refs.  And DWIM wouldn't know in advance where the references might be,
so it would have to set prefix="".  If many refs are looked up, then it
would presumably be worth it.  But if only a couple of lookups are done
and there are a lot of loose refs, then using a cache would probably
slow things down.

The slowdown could be ameliorated by adding some more intelligence, for
example only populating the loose refs cache after a certain number of
lookups have already been done.

2. A "git cat-file --batch" process can be long-lived.  What guarantees
would users expect regarding its lookup results?  Currently, its ref
lookups reflect the state of the repo at the moment the commit
identifier is written into the pipe.  Using a cache like this would mean
that ref lookups would always reflect the snapshot taken at the start of
the "git cat-file" run, regardless of whether the script using it might
have added or modified some references since then.  I think this would
have to be considered a regression.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html