Re: [PATCH 2/2] scalar: convert README.md into a technical design doc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/29/2022 12:58 PM, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@xxxxxxxxxx>
> 
> Replace 'README.md' with 'technical/scalar.txt' (still in 'contrib/'). In
> addition to reformatting for asciidoc, elaborate on the background, purpose,
> and design choices that went into Scalar.
> 
> This document is intended to persist in the 'Documentation/technical/'
> directory after Scalar has been moved into the root of Git (out of

I wonder if it is a good idea to move this into Documentation/technical/ _now_,
so we can have this document as part of "core Git's roadmap" as much as we can
say that. Part of the roadmap is moving the Scalar code out of contrib/ and
finalizing how users (or distributors) can install the 'scalar' binary.

It can be helpful to include the details of what steps to take to compile and
test the 'scalar' executable. That documentation will then be updated when
Scalar moves out of contrib/.

> +Scalar
> +======
> +
> +Scalar is a built-in repository management tool that optimizes Git for use in

I think "built-in" is unnecessary here. It makes me think of Git builtins and
otherwise does not add much to the description.

> +large repositories. It accomplishes this by helping users to take advantage of
> +advanced performance features in Git. Unlike most other Git built-in commands,
> +Scalar is not executed as a subcommand of 'git'; rather, it is built as a
> +separate executable containing its own series of subcommands.

Good to have this distinction.

> +Scalar is a large enough project that it is being upstreamed incrementally,
> +living in `contrib/` until it is feature-complete. So far, the following patch
> +series have been accepted:
> +
> +- `scalar-the-beginning`: The initial patch series which sets up
> +  `contrib/scalar/` and populates it with a minimal `scalar` command that
> +  demonstrates the fundamental ideas.
> +
> +- `scalar-c-and-C`: The `scalar` command learns about two options that can be
> +  specified before the command, `-c <key>=<value>` and `-C <directory>`.
> +
> +- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
> +
> +Roughly speaking (and subject to change), the following series are needed to
> +"finish" this initial version of Scalar:

I like how you have these two sections. As you move forward in this roadmap,
the diff to the document should look like you are _moving_ the sentence above.

> +- Finish Scalar features: Enable the built-in FSMonitor in Scalar
> +  enlistments and implement `scalar help`. At the end of this series, Scalar
> +  should be feature-complete from the perspective of a user.

Wow, we're so close!

> +- Generalize features not specific to Scalar: In the spirit of
> +  making Scalar configure only what is needed for large repo performance, move

(nit: Strange wrapping on the first line)

> +  common utilities into other parts of Git. Some of this will be internal-only,
> +  but one major change will be generalizing `scalar diagnose` for use with any
> +  Git repository.
> +
> +- Move Scalar to toplevel: Make `scalar` a built-in component of Git by

Here, I would say "included" instead of built-in. Or is there another term we
should use to avoid confusion with "git <cmd>" builtins.

> +  moving it out of `contrib/` and into the root of `git`. The actual change will
> +  be relatively small, but this series will also contain expanded testing to
> +  ensure Scalar is stable and performant.

Perhaps "This change will also integrate Scalar into Git's test suite and CI
checks."

You mention "performant" which makes me think that performance tests are intended
to be part of this change. It makes me think it would be interesting to have our
existing performance tests create a mode where they compare a "vanilla" Git repo
to one registered with Scalar, but otherwise runs the same tests already in the
t/perf/ test scripts. This is a wide aside so feel free to ignore me.

> +Finally, there are two additional patch series that exist in Microsoft's fork of
> +Git, but there is no current plan to upstream them. There are some interesting
> +ideas there, but the implementation is too specific to Azure Repos and/or VFS
> +for Git to be of much help in general.
> +
> +These still exist mainly because the GVFS protocol is what Azure Repos has
> +instead of partial clone, while Git is focused on improving partial clone:
> +
> +- `scalar-with-gvfs`: The primary purpose of this patch series is to support
> +  existing Scalar users whose repositories are hosted in Azure Repos (which does
> +  not support Git's partial clones, but supports its predecessor, the GVFS
> +  protocol, which is used by Scalar to emulate the partial clone).
> +
> +  Since the GVFS protocol will never be supported by core Git, this patch series
> +  will remain in Microsoft's fork of Git.
> +
> +- `run-scalar-functional-tests`: The Scalar project developed a quite
> +  comprehensive set of integration tests (or, "Functional Tests"). They are the
> +  sole remaining part of the original C#-based Scalar project, and this patch
> +  adds a GitHub workflow that runs them all.
> +
> +  Since the tests partially depend on features that are only provided in the
> +  `scalar-with-gvfs` patch series, this patch cannot be upstreamed.

We've had a good track record of identifying when an upstream change causes a
failure in the Scalar Functional Tests and then porting that test into the Git test
suite. I expect that to continue, but it has also not happened in a while. This is
probably because the test coverage now has a significant amount of overlap. Many
of the tests were created specifically for VFS for Git and strangeness around the
virtual filesystem layer. Several of those have found interesting corner cases in
Git's sparse-checkout feature, but these have probably been resolved by now.

Thanks for doing this work, Victoria. I'm excited to see how the project progresses
with your new plan.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux