Re: [PATCH v2 01/12] fast-export: do anonymize the primary branch name

Elijah Newren <newren@xxxxxxxxx> · Thu, 18 Jun 2020 00:13:53 -0700

Hi Junio,

On Wed, Jun 17, 2020 at 11:30 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Elijah Newren <newren@xxxxxxxxx> writes:
>
> >> That leaves us in the interim with a "fast-export --anonymize" that is a
> >> little harder to use (you have no way to know which branch was which).
> >
> > Why does fast-export special case on "master" rather than on HEAD?
> > Isn't it more relevant to know the active branch than what _might_
> > have been the initial branch?  It kind of feels like a bug to me that
> > HEAD isn't the special case construct.
>
> I am torn on that one.  Surely HEAD is often the branch that has our
> current attention.  It may well be what we are exporting and we may
> want to see the topology formed by other refs relative to it.
>
> On the other hand, the current branch may not necessarily be what we
> are exporting.  Historically a project has a single branch that is
> the focus of most users' attention when they talk about the general
> state of the project's progress, so it is understandable to expect
> that the topology may want to be seen relative to that one central
> line of development.

I'm trying to understand here, but I feel like I'm missing something.
Let me try to explain what I understand and hopefully you can figure
out what I'm not seeing...

Regardless of what is mainline and whether or not it is important,
users probably trigger their bug when a certain branch is checked out.
Their bug may also trigger on other branches, but it at least triggers
on one, and some bugs will only trigger on one branch.  It seems
logical to me that we would want to have the same branch checked out
(it's the one most likely to trigger the same issue), and thus
identifying the HEAD branch is generally important.  (Mainline may be
too; I'm merely asserting that HEAD is important at this point.)

If users trigger their bug by providing various revision
specifications on the command line that compare multiple branches or
something, then we're already in the situation of needing to know how
to map more than one reference to anonymized ones in order to be able
to replicate their issue.  However, knowing the mainline might not
even help in this case; we instead need to know the anonymized form of
the references they are using, whatever those are, and mainline is
only useful if it happens to be one of them.

So, I think HEAD is always useful.  Additional references would
sometimes be useful, but it's not clear to me that mainline is one of
those additional references.  Maybe I'm just being dense, and I
apologize if so, but under what circumstances does knowing the
mainline help with debugging a user issue where an anonymized
fast-export is provided?

> > (Speaking as someone whose company a number of years ago had most
> > of their big repos and lots of little repos switch their main branch to
> > be named "develop", and in some of those repos deleted "master" but
> > didn't in others.  If I had needed some steps to reproduce a problem,
> > and hadn't been on the inside, any special casing from fast-export
> > would make more sense to me to apply to "develop" than to "master".)
>
> Yes, absolutely.  You either check "develop" out temporarily just to
> take anonymized export to make "develop" discoverable in the output,

That makes sense; if the bug triggers while they are on develop then
I'd expect them to be on develop when they export.  If it triggers on
some other branch, I'd expect them to stay on that other branch when
they export even if "develop" is the mainline.

> or you would have set core.primaryBranch to "develop" once sometime
> in the past to tell Git that "develop" is that special one, not
> "master", so you can take such an export from any branch.

This doesn't make sense to me.  The person who changed the primary
branch to "develop" for some repository did so years ago.  That
individual might not even still be at the company, and even if they
are, may well be working on a totally different project (and
repository) today.  Perhaps that individual set core.primaryBranch at
the time, but git-config settings aren't copied by fetch/clone/push,
so I don't see how this setting helps at all.  We could tell all future
developers who clone any of these repositories that they also need to
set core.primaryBranch when they clone the repo, but that seems super
lame to me, especially since the odds that any one of them will ever
need or benefit from it are approximately 0.  And yet, it'll be one of
these developers who joined the project long after the switchover who
runs into problems and provides fast-export dumps.
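
To make that concrete, here is a minimal sketch of the scenario I have
in mind.  It assumes core.primaryBranch is simply the setting proposed
in this thread (not an existing Git option; git-config accepts
arbitrary keys, so it can still be set), and "upstream" and "newcomer"
are made-up directory names:

```shell
#!/bin/sh
# Sketch: a config value set in one repository is not carried over by clone.
# core.primaryBranch is the setting proposed in this thread, not an existing
# Git option; "upstream" and "newcomer" are hypothetical directory names.
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q upstream
# Identity is set only so the commit below works in a clean environment.
git -C upstream config user.name "Example"
git -C upstream config user.email "example@example.com"
git -C upstream commit -q --allow-empty -m "initial"
git -C upstream branch -M develop
# Someone marked "develop" as special in this one repository, years ago.
git -C upstream config core.primaryBranch develop

# A developer who joins later just clones.
git clone -q upstream newcomer

in_upstream=$(git -C upstream config --local --get core.primaryBranch)
# --get exits non-zero when the key is unset, hence the "|| true".
in_clone=$(git -C newcomer config --local --get core.primaryBranch || true)
head_in_clone=$(git -C newcomer symbolic-ref --short HEAD)

echo "upstream core.primaryBranch: '$in_upstream'"
echo "clone core.primaryBranch:    '$in_clone'"
echo "clone HEAD:                  '$head_in_clone'"
```

The clone ends up with no value for the setting, while HEAD is still
there to be inspected in whatever repository the user exports from.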

It feels like I'm probably just missing something obvious, but I
really don't see how the mainline is special here.  Please do point
out what I'm missing.

Thanks,
Elijah