Re: [PATCH 00/15] [RFC] Upstreaming the Scalar command

Derrick Stolee <stolee@xxxxxxxxx> · Mon, 30 Aug 2021 20:51:32 -0400

On 8/30/21 5:34 PM, Johannes Schindelin via GitGitGadget wrote:
> tl;dr: This series contributes the Scalar command to the Git project. This
> command provides an opinionated way to create and configure repositories
> with a focus on very large repositories.

I want to give Johannes a big thanks for organizing this RFC. As you
can see from the authorship of the patches, this was an amazingly
collaborative effort, but Johannes led the way by creating a base that
the rest of us could work with, then finally he brought in all of the
gritty details to finish the effort.

> Background
> ==========

...

> The Scalar project
> was created to make that separation, refine the key concepts, and then
> extract those features into the new Scalar command.

When people have asked me how Scalar fits with the core Git client, I
point them to our "Philosophy of Scalar" document [1]. The most concise
summary of our goals since starting Scalar has been that Scalar aligns
with features already within Git that enable scale. I've said several
times that we are constantly making Scalar do less by making Git do more.

[1] https://github.com/microsoft/git/blob/HEAD/contrib/scalar/docs/philosophy.md

Here is an example: when our large, internal customer told us that they
required Linux support for Scalar, we looked at what it would take. We
could have done the necessary platform-specific things to convince .NET
Core to create a long-running process that launched Git maintenance tasks
at different intervals, creating a similar mechanism to the Windows and
macOS services that did those operations. But we also knew that the
existing system was stuck with architectural decisions from VFS for Git
that were not actually in service of how Scalar worked. Instead, we
decided to build background maintenance into Git itself and had our Linux
port of Scalar run "git maintenance start".
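
As a rough illustration of what that opt-in looks like from the Git
side (this is only a sketch, not what Scalar itself executes): the
throwaway HOME, the "demo_home" and "registered" variables, and the
"repo" directory name are assumptions made up for the demo. It uses
"git maintenance register", which only records the repository in the
global config, and leaves "git maintenance start" commented out because
that command also installs a scheduler entry for the current user.

```shell
#!/bin/sh
# Sketch: opt a repository into Git's built-in background maintenance.
set -e

# Hypothetical throwaway HOME so the demo never touches real global config.
demo_home=$(mktemp -d)
export HOME="$demo_home"
unset XDG_CONFIG_HOME

# Hypothetical repository used only for this demonstration.
git init --quiet "$demo_home/repo"
cd "$demo_home/repo"

# Record this repository under the global 'maintenance.repo' config key.
git maintenance register

# Show which repositories are now registered for maintenance.
registered=$(git config --global --get-all maintenance.repo)
echo "$registered"

# 'git maintenance start' performs the same registration and additionally
# sets up the platform scheduler; left commented out because it modifies
# the current user's scheduled jobs.
# git maintenance start
```

Undoing the registration is symmetric: "git maintenance unregister"
removes the config entry, and "git maintenance stop" removes the
scheduler entry without touching the list of registered repositories.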

Once the Linux port was proven out with Git's background maintenance, we
realized that the window where a user actually interacts with Scalar instead
of Git is extremely narrow: users run "scalar clone" or "scalar register"
and otherwise only run Git commands. The Scalar process does not need to
exist outside of that. (There are some other helpers that can be used in
a pinch to diagnose and fix problems, but they are rarely used. These
commands, such as 'scalar diagnose', can be contributed separately.)

It became clear that for our own needs it would be easier to ship one
installer that included the microsoft/git fork and the Scalar CLI, and
it would be simple to rewrite the Scalar CLI with all of the Git helper
APIs. We organized the code in a way that we thought would be amenable
to an upstream contribution (by placing it in contrib/ and using Git's
code style).

The thing about these commands is that they are _opinionated_. We rely
on these opinions for important internal users, but we realize that they
are not necessarily optimal for all users. Hence, we did not think it
wise to push those opinions onto the 'git' executable. Having 'scalar'
continue to live as a separate executable made sense to us.

I believe that by contributing Scalar to the full community, we
create opportunities for Git in the future. For one, users and Git
distributors can opt into compiling Scalar so it is more available
to users who are interested. Another hopeful idea is that maybe this
reinvigorates ideas of how to streamline Git clones for large repos
without users needing to learn each and every knob to twist to get
things working. Since the Scalar CLI is contributed under the full
license of the Git project, pieces of it can be adapted into Git
proper as needed.

I look forward to hearing your thoughts.

Thanks,
-Stolee