Re: [PATCH 0/3] log: create tighter default decoration filter

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 26 Jul 2022 20:19:40 +0200

On Tue, Jul 26 2022, Derrick Stolee wrote:

> On 7/26/2022 10:44 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Jul 26 2022, Derrick Stolee via GitGitGadget wrote:
>> 
>>> This was previously reduced by adding the log.excludeDecoration config
>>> option and modifying that config in git maintenance's prefetch task (to hide
>>> refs/prefetch/*). I then followed that pattern again for the bundle URI
>>> feature [1], but this caught some reviewers by surprise as an unfortunate
>>> side-effect. This series is a way to roll back the previous decision to use
>>> log.excludeDecoration and instead use tighter filters by default.
>>>
>>> As noted in the last patch, the current design ignores the new filters if
>>> there are any previously-specified filters. This includes the
>>> log.excludeDecorations=refs/prefetch/ set by the git maintenance command.
>>> This means that users who ran that command in their repo will not get the
>>> benefits of the more strict filters. While we stop writing
>>> log.excludeDecorations, we don't remove existing instances of it.
>> 
>> Leaving aside the question of these magic refs, and if we need new ones
>> (e.g. refs/bundle/*) I have sometimes made use of out-of-standard
>> refspace refs.
>> 
>> E.g. when I build git I create refs/built-tags/* tag object refs
>> (i.e. not in refs/tags/*), which is a neat way to get "git tag -l" and
>> the like to ignore it.
>> 
>> But to still have it show decorated in logs (e.g. I'll see what my
>> "private" branch is at), and "for-each-ref --contains" still knows about
>> it.
>
> You also have the unfortunate UX of having the refs spelled out entirely
> ("refs/<special-place>/..." instead of "<special-place>/..." like how
> "refs/remotes/" is dropped from remote refs) and not having special color.
> But that's beside the point.

Whether that's unfortunate UX or not is debatable, since we expose the
"refs/" prefix explicitly in other parts of the UX, and e.g. "HEAD" or
"ORIG_HEAD" or whatever in in that "/"-relation to
"refs/heads/whatever", and not "heads/whatever".

E.g. try:

	mkdir .git/FOO &&
	git rev-parse origin/seen >.git/FOO/BAR &&
	git push origin FOO/BAR:refs/FOO/BAR

And then:

	mkdir .git/refs/FOO &&
	git rev-parse origin/next >.git/refs/FOO/BAR
	$ git rev-parse FOO/BAR
	warning: object-name.c:970: refname 'FOO/BAR' is ambiguous.
	[seen SHA1]
	$ git rev-parse refs/FOO/BAR
	[next SHA1]

I.e. we really do understand FOO/BAR as meaning .git/FOO/BAR, as opposed
to .git/refs/FOO/BAR. You really can't assume that you can strip a
"refs/" from a multi-level ref and not end up with ambiguities. It's
*not* the case that we just have the .git/<UPPER-CASE-HERE> and
.git/refs/* namespaces, and nothing else.

Are we *almost* there? Yes, I vaguely recall experimenting with trying
to get us 100% of the way (just locally, didn't make it on-list) a long
time ago, and there were some tricky edge cases, and we don't know who
in the wild is relying on it.

I think it would be interesting to explore getting there, but I don't
think eliding information on the display end like that would be the
right way to start.

Although to be fair e.g. "git for-each-ref" excludes these entirely, but
that's a ref-filter.c specific edge-case, per the above try it on
e.g. "git push", you'll find that you can push one or the other.

I think I was poking at this at the time because I wanted to have "git
show head" mean the same as "git show HEAD", or at least for us to start
enforcing the rule that thou shalt not use refnames that are head, HeAd
or whatever mixed-case version we have of our magical all-upper special
refs. This matters because you have commands like "git rev-parse head"
that'll work on case-insensitive FS's that promptly fail on
case-sensitive FS's.

>> Now, that's a rather obscure use-case, and I suspect other "special
>> refs" are similarly obscure (e.g. GitLab's refs/keep-around/* comes to
>> mind).
>> 
>> But I think this change is going about it the wrong way, let's have a
>> list of refs that Git knows about as magical, instead of assuming that
>> we can ignore everything that's not on a small list of things we're
>> including.
>> 
>> Wouldn't that give you what you want, and not exclude these sorts of
>> custom refs unexpectedly for users?
>
> Instead of keeping track of an ever-growing list of exclusions, instead
> making a clear list of "this is what most users will want for their
> decorations" is a better approach.
>
> Users who know how to create custom refs outside of this space have the
> capability to figure out how to show their special refs. My general ideas
> for designing these kinds of features is to have a default that is focused
> on the typical user while giving config options for experts to tweak those
> defaults.
>
> You're right that this series perhaps leaves something to be desired in
> that second part, since there isn't an easy _config-based_ way to enable
> all decorations (or a small additional subset).

Yes, but this is just side-stepping the issue. Your X-Y problem is that
you want to exclude certain refs that we're specifically creating.

I think that's fair enough, but I don't see why we're not specifically
excluding just those then.

It'll just be these bundle refs, filter refs, or whatever other magic we
want excluded. Why would we extend that to refs that we don't know about?

It seems like a safe assumption that if we don't know what it is we
should include it.

Both because that's what we do now (least chance of breaking workflows
in the wild), and because the entire point of the feature is to display
the log in relation to points on the ref namespace. If we don't *know*
that they're meaningless let's leave them out of the exclusion, no?

I'd also think that we'd want to consider this holistically. I haven't
checked, but aside from HEAD (which I think is the only special-case,
but maybe I'm wrong) isn't there a one-to-one mapping between what we'll
show in decorations, and what "git for-each-ref" diplays?

>>> I'm interested if anyone has another way around this issue, or if we
>>> consider adding the default filter as long as no --decorate=refs options are
>>> specified.
>> 
>> I think the resulting UX here is bad, in that we ship hardcoded list of
>> these if you don't specify the config in 2/3. So I can do:
>> 
>>       -c log.excludeDecoration=this-will-never-match
>> 
>> To "clear" the list, but not this:
>> 
>>       -c log.excludeDecoration=
>
> The thing that I forgot to do, but had considered was adding a
> --decorate-all option to allow clearing the default filters from the
> command line. You can already do "--decorate-refs=refs" to get everything
> (except HEAD).

...and some other exclusions, e.g. it won't show ORIG_HEAD even with
--decorate-refs=ORIG_HEAD, but perhaps I'm doing it the wrong way...

> As far as config goes, we could also create a log.includeDecoration key,
> but we'd want to consider it to populate the same part of the filtering
> algorithm. Similar to having any instance of log.excludeDecoration, this
> would clear the default list. To get all decorations again, you could add
> this to your config file:
>
> 	[log]
> 		includeDecoration = refs
> 		includeDecoration = HEAD

If we're considering new config I'd think anything that worked like this
instead would be *much less* confusing:

	some.config = refs/*
	some.config = HEAD

I.e. per the above your HEAD will presumably either match HEAD/BLAH, or
treat "refs" magically as "refs/" and not "refs".

Now, I don't think you can delete .git/HEAD and/or make .git/refs a file
and still have a functioning repository, but the same general rule
applies to "FOO" v.s. "FOO/BAR" and "custom-refs" or whatever.

We use that wildcard syntax unambiguously for the refspec matching, we
could just use the same machinery to match this.

> Alternatively, we could instead create a "filter default" option such as
> "log.decorateFilter = (core|all)" where "core" is the default set being
> considered by this series, and "all" is the empty filter.

Sure, for any key that takes N arguments we could have some way to say
"if you do this it's the same as enumerating these X values", as long as
it's not ambiguous whether you mean a special "aggregate label" or a
name of your own...