Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail

Jens Lehmann <Jens.Lehmann@xxxxxx> · Mon, 16 Nov 2015 22:42:09 +0100

Am 16.11.2015 um 20:25 schrieb Stefan Beller:
On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@xxxxxx> wrote:
Am 14.11.2015 um 01:10 schrieb Stefan Beller:
Thanks for pointing out that we already have some kind of server support.

I wonder if we should add an additional way to make fetching only some
sha1s possible. ("I don't want users to fetch any sha1, but only those
where superprojects point{ed} to", even if you force push a superproject,
you want to want to only allow fetching all sha1s which exist in the
current
superprojects branch.)

Me thinks the restrictions for sha1-fetching could come from the branches
these sha1s are found in the upstream submodule: if the client is allowed
to fetch a branch, it should be able to fetch any sha1 on that branch.

I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
set, is not sufficient though.

To fetch an arbitrary sha1, you would need to check if that sha1 is part
of the history of any advertised branch and then allow fetching serverside,
which sounds like some work for the server, which we may want to avoid
by having smarter data structures there.

Instead of having to search all branches for the requested sha1, we could have
some sort of data structure to make it not an O(n) operation (n being
all objects
in the repo).

Maybe I overestimate the work which needs to be done, because the server has
bitmaps nowadays.

Maybe a lazy reverse-pointer graph can be established on the serverside.
So I guess when we add the feature to fetch arbitrary sha1s, reachable from
any branch, people using submodules will make use of the feature. (such as with
git fetch --recurse --depth 1 or via a new `git fetch --recursive
--up-to-submodule-tip-only`)

So once the server is asked for a certain sha1, it will do the
reachability check,
which takes some effort, but then stores the result in the form:
"If ${current tip sha} of ${branch} is reachable, so is requested $sha1."

So when the next fetch request for $sha1 arrives, the server only needs to
check for ${current tip sha} to be part of $branch, which is expected to be
a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap could
just tell you or at least shorten the walk even more)
If the ${branch} has changed, the next evaluation for $sha1 can update
the cache,
such that the reverse lookup is not expensive on expectation.

Makes sense, although I do not know enough about the server side to tell if
it would need such an optimization or will cope with the load just fine.

But even if we'd enable such a feature without having to set an extra config
option, a submodule fetch asking for certain sha1s would have to fall back
to a simple "fetch all" like we do now when the server doesn't support that
for backwards compatibility. But maybe that's just obvious.

I assume this will mostly be used with submodules, so only a few sha1s need
this caching.

I won't bet on that, some of the submodules at $DAYJOB are rather busy and
see almost the same traffic as their superprojects ;-)

Maybe our emails crossed, but in the other mail I pointed out we could use
some sort of hidden ref (refs/superprojects/*) for that, which are
allowed to mark
any sort of sha1, which are allowed in the superproject/submodule context
to be fetched.

So whenever you push to a superproject (a project that has a gitlink),
we would need to check serverside if that submodule is at us and mark the
correct sha1s in the submodule. Then you can disallow fetching most of the
sha1s
but still could have a correctly working submodule update mechanism.

And what happens if the submodule isn't at us? Involving the serverside of
a superproject in submodule fetching sounds wrong to me. Me thinks that
the upstream of the submodule should always control if a sha1 is allowed
to be fetched. Or did I understand you wrong?

Yes and no.
The serverside submodule repository should be responsible for the ultimate
decision if you are allowed to fetch that sha1. But maybe on pushing the
superproject, we can store a hint in the submodule, that this sha1 is legit.
Although I may be missguided in my thinking here as the superproject
should have no influence on the submodule.

Submodules should never be aware of their superproject. But a superproject
does know its submodules, so I don't think the influence you describe here
is a problem per se. It's just looking like a corner case to me, as in a
lot of scenarios submodules do not live on the same server. And even if
they do, a superproject has no canonical way of finding their submodule's
repos (except for submodules that use relative URLs). So I'd rather like
to see a generic solution first, before we think about adding an optimized
version for certain setups later ;-)

The only real itch I have with the "superproject declaring submodule sha1s
fetchable on the server" approach is that it smells like a security problem.
The access rights of superprojects are often different from those of the
submodules it contains and this feels like a privilege escalation waiting
to happen.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html