Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail

On 16.11.2015 23:56, Stefan Beller wrote:
On Mon, Nov 16, 2015 at 1:42 PM, Jens Lehmann <Jens.Lehmann@xxxxxx> wrote:
On 16.11.2015 20:25, Stefan Beller wrote:

On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@xxxxxx> wrote:

On 14.11.2015 01:10, Stefan Beller wrote:

Thanks for pointing out that we already have some kind of server
support.

I wonder if we should add an additional way to make fetching only some
sha1s possible. ("I don't want users to fetch any sha1, but only those
the superproject points (or pointed) to"; even if you force-push the
superproject, you only want to allow fetching sha1s which exist in the
current superproject's branch.)



Methinks the restrictions for sha1 fetching could come from the branches
these sha1s are found on in the upstream submodule: if the client is allowed
to fetch a branch, it should be able to fetch any sha1 on that branch.
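The rule described here can be sketched with a toy commit graph (the commit names, parent map, and helper below are made up for illustration; a real server would walk actual commit objects):

```python
from collections import deque

# Toy commit graph: each "sha1" maps to the list of its parents.
PARENTS = {
    "c3": ["c2"],
    "c2": ["c1"],
    "c1": [],
    "x1": [],          # a commit not reachable from the branch
}

def is_reachable(tip, target, parents=PARENTS):
    """Return True if `target` is an ancestor of (or equal to) `tip`."""
    queue, seen = deque([tip]), set()
    while queue:
        commit = queue.popleft()
        if commit == target:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return False

# The proposed policy: allow fetching a sha1 iff it is reachable
# from some advertised branch tip.
branch_tips = {"master": "c3"}
assert is_reachable(branch_tips["master"], "c1")      # on the branch: allowed
assert not is_reachable(branch_tips["master"], "x1")  # unreachable: refused
```

On the command line the equivalent server-side question is what git merge-base --is-ancestor answers.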


I'd agree with that. Even with uploadpack.allowTipSHA1InWant set, the
server side is not sufficient, though.

To fetch an arbitrary sha1, the server would need to check whether that
sha1 is part of the history of any advertised branch before allowing the
fetch. That sounds like some work for the server, which we may want to
avoid by having smarter data structures there.

Instead of having to search all branches for the requested sha1, we could
have some sort of data structure that makes it not an O(n) operation
(n being the number of objects in the repo).

Maybe I overestimate the work which needs to be done, because the server
has bitmaps nowadays.

Maybe a lazy reverse-pointer graph can be established on the server side.
So I guess when we add the feature to fetch arbitrary sha1s reachable from
any branch, people using submodules will make use of the feature (such as
with "git fetch --recurse --depth 1" or via a new "git fetch --recursive
--up-to-submodule-tip-only").

So once the server is asked for a certain sha1, it will do the
reachability check, which takes some effort, but then store the result
in the form:
"If ${current tip sha} of ${branch} is reachable, so is the requested $sha1."

So when the next fetch request for $sha1 arrives, the server only needs to
check that ${current tip sha} is part of $branch, which is expected to be
a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap
could just tell you, or at least shorten the walk even more.)
If ${branch} has changed, the next evaluation for $sha1 can update the
cache, so that the reverse lookup is not expensive in expectation.
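The lazy cache described above can be sketched like this (the commit ids, parent map, and all names are made up; a real implementation would live inside upload-pack):

```python
from collections import deque

# Toy history: "sha1" -> list of parent "sha1"s.
PARENTS = {"c4": ["c3"], "c3": ["c2"], "c2": ["c1"], "c1": []}
BRANCH_TIPS = {"master": "c4"}   # mutable: tips move as the branch advances

def walk_back(tip, target):
    """Full revwalk: is `target` an ancestor of (or equal to) `tip`?"""
    queue, seen = deque([tip]), set()
    while queue:
        c = queue.popleft()
        if c == target:
            return True
        if c not in seen:
            seen.add(c)
            queue.extend(PARENTS.get(c, []))
    return False

# cache[sha1] = (branch, tip_at_last_check), meaning:
# "if tip_at_last_check is reachable from the branch's current tip,
#  then sha1 is still reachable too."
cache = {}

def may_fetch(sha1, branch="master"):
    tip = BRANCH_TIPS[branch]
    if sha1 in cache:
        _, old_tip = cache[sha1]
        if walk_back(tip, old_tip):      # short walk near the tip
            return True
    ok = walk_back(tip, sha1)            # expensive full check
    if ok:
        cache[sha1] = (branch, tip)      # remember for next time
    return ok

assert may_fetch("c1")          # first request: full walk, result cached
PARENTS["c5"] = ["c4"]          # the branch advances by one commit
BRANCH_TIPS["master"] = "c5"
assert may_fetch("c1")          # answered by the short tip-to-old-tip walk
```

If the branch is rewound (the old tip is no longer reachable), the cached entry simply fails the short check and the full walk runs again, refreshing the cache.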


Makes sense, although I do not know enough about the server side to tell
whether it would need such an optimization or would cope with the load
just fine.

But even if we'd enable such a feature without requiring an extra config
option, a submodule fetch asking for certain sha1s would have to fall back
to a simple "fetch all" like we do now when the server doesn't support it,
for backwards compatibility. But maybe that's just obvious.

It's not obvious to me.  Say you run the command:

     git clone --recursive --depth=1 ...

Currently the depth argument for the submodules is ignored, because it
doesn't work out conceptually: recursive fetches fetch the branch tips,
not the submodule-specified sha1.

If we want to make it work, we would need to think about what we want
to achieve here. Depth is usually used to reduce the transmission
time/bandwidth required.

So if the server tells us it does not allow fetching of arbitrary
sha1s, via its cryptic message:

     $ git fetch origin 6f963a895a97d720c909fcf4eb0544a272ef7c49:refs/heads/copy
     error: no such remote ref 6f963a895a97d720c909fcf4eb0544a272ef7c49

we have two choices: either error out with

     die(_("Server doesn't support cloning of arbitrary sha1s"))

or pretend we know how to fix it by cloning regularly with the whole
history attached and then presenting a tightened history by shallowing
after the clone. But that would defeat the whole point of the depth
argument in the first place; the time and bandwidth would already have
been wasted. So instead I'd rather have the user make the choice.

But for quite some time you'll have older servers out there that
don't support fetching a single sha1 or aren't configured to do so.
Wouldn't it be better to give the user an appropriate warning and
fall back to cloning everything for those submodules while using the
optimized new method for all others and the superproject? Otherwise
you won't be able to limit the depth if only a single submodule
server doesn't support fetching a single sha1.
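The per-submodule fallback suggested here could look roughly like the following sketch (the submodule list, the capability probe, and the fetch helpers are all hypothetical stand-ins, not actual git internals):

```python
import sys

def server_allows_sha1_fetch(url):
    """Stand-in for probing uploadpack.allowTipSHA1InWant-style support."""
    return url.endswith("new-server")

def fetch_submodule(name, url, sha1, depth):
    # Use the optimized single-sha1 shallow fetch where the server
    # supports it; otherwise warn and fall back to fetching everything.
    if server_allows_sha1_fetch(url):
        return f"shallow fetch of {sha1} from {name} (depth={depth})"
    print(f"warning: {name}: server does not support fetching single "
          f"sha1s; fetching full history instead", file=sys.stderr)
    return f"full fetch of {name}"

submodules = [("lib-a", "git://old-server", "abc123"),
              ("lib-b", "git://new-server", "def456")]
results = [fetch_submodule(n, u, s, depth=1) for n, u, s in submodules]
```

The point is that one old submodule server only degrades that one submodule's fetch, instead of forcing the user to give up --depth for the whole recursive clone.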
