Re: [PATCH] Allow hideRefs to match refs outside the namespace

Junio C Hamano <gitster@xxxxxxxxx> · Sat, 31 Oct 2015 10:31:23 -0700

Lukas Fleischer <lfleischer@xxxxxxx> writes:

>> If somebody is using namespaces and has "refs/frotz/" in the
>> hiderefs configuration, we hide refs/frotz/ no matter which
>> namespace is being accessed.  With this change, with the removal the
>> check from show_ref(), wouldn't such a repository suddenly see a
>> behaviour change?
>> [...]
>
> It would indeed. However, we cannot stay 100% backwards compatible when
> adding support for matching refs outside the current namespace without
> introducing new syntax. For example, if Git namespaces are in use (i.e.
> GIT_NAMESPACE is set), "refs/namespaces/foo/refs/bar" in hideRefs would
> not have hidden refs/namespaces/foo/refs/bar before the change but it
> does afterwards. You might argue that nobody would have added
> "refs/namespaces/foo/refs/bar" to hideRefs in the first place...

I won't.  To the current users, when they say they want to exclude
"refs/foo", they mean they do not want to advertise the fact that a
ref "refs/foo/*" exists in their repository (either the whole thing
if that is how it is accessed, or in the namespace being
accessed). and you can replace "foo" with any string, including the
ones that contain "/namespaces/", i.e. the user wanted to exclude
refs from nested ones.

I suspect what you wrote in the above is being a bit too defeatist,
though.  We only need to prevent regressions to user with existing
and valid configurations.

You earlier (re)discovered a good approach to introduce a new
feature without breaking settings of existing users when we
discussed a "whitelist".  Since setting the configuration to an
empty string did not do anything in the old code, an empty string
was an invalid and non-working setting.  By taking advantage of that
fact, you safely can say "if you start with an empty that would
match everything, we'll treat all the others differently from the
way we did before" if you wanted to.  I think you can follow the
same principle here.  For example, I can imagine that the rule for
the "ref-is-hidden" can be updated to:

 * it now takes refname and also the fullname before stripping the
   namespace;

 * hide patterns that is prefixed with '!' means negative, just as
   before;

 * (after possibly '!' is stripped), hide patterns that is prefixed
   with '^', which was invalid before, means check the fullname with
   namespace prefix, which is a new rule;

 * otherwise, check the refname after stripping the namespace.

Such an update would allow a new feature "we now allow you to write
a pattern that determines the match before stripping the namespace
prefix" without breaking the existing repositories, no?

After looking at the current code, I have to say that the way
ref-is-hidden and show_ref_cb() interact with each other is not very
well designed when namespaces are in use.  I suspect that this is
because the "namespace" stuff was bolted on to the system without
thinking things through.  For example, people may want to hide
refs/changes/* and with the current code, refs/changes/* from your
own namespace will be filtered out, but the corresponding hierarchy
from other namespaces will be included after getting turned into
".have".  And that cannot be a useful behaviour.  Tips of
refs/changes/* would be closely related to the corresponding
branches, which means that it would help reducing the object
transfer if they are included, and the fact that the user hides them
is that the user values it more to reduce the size of the initial
ref advertisement more than the potential reduction of the object
transfer cost.  If other pseudo repositories (aka namespaces) are
projects totally unrelated to yours, inluding their refs/changes/*
(or any of their refs, for that matter) would not help the later
object transfer cost, and including them in the initial ref
advertisement would not achieve anything.  Even if other namespaces
are projects that are closely related to yours, if you are excluding
refs/changes/* from your own, that is a strong sign that you do not
want their refs/changes/*, either.

Assuming other namespaces are forks of the same project as yours
(and otherwise the repository management strategy needs to be
rethought--using namespace for them is not gaining anything other
than making your repack more costly), it is likely that all of them
share a lot of refs that point at the same object (think "tags").
Do we end up sending a lot of ".have" for exactly the same object
number of times?  Even though we cannot dedup show_ref() lines that
talk about concrete refs (because they talk about what refs exist at
which value, and the sending side would use them to locally reject
non-ff pushes, for example), ".have" lines that talk about the same
object can be safely deduped.  This is not directly related to your
topic of "what should be included in the advertisement", but a
potentially good thing to fix, if it indeed turns out that we are
sending a lot of duplicate ".have"s.  A fix in that would make
things better for everybody (not just namespace users, but those who
show the ".have" lines from the refs in the repository we borrow
objects from).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html