[PATCH 2/2] scalar: convert README.md into a technical design doc

From: Victoria Dye <vdye@xxxxxxxxxx>

Replace 'README.md' with 'technical/scalar.txt' (still in 'contrib/'). In
addition to reformatting for asciidoc, elaborate on the background, purpose,
and design choices that went into Scalar.

This document is intended to persist in the 'Documentation/technical/'
directory after Scalar has been moved into the root of Git (out of
'contrib/'). Before then, it will also contain a "Roadmap" section detailing
the remaining series needed to finish the initial version of Scalar. The
section will be removed once Scalar is moved to the repo root, but in the
meantime serves as a guide for readers to keep up with progress on the
feature.

Signed-off-by: Victoria Dye <vdye@xxxxxxxxxx>
---
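For readers unfamiliar with the layout described in the "Design" section of the
new document (a `cmd_<subcommand>()` function per subcommand, routed through a
`cmd_main()` function), here is a minimal sketch of that dispatch shape. It is
not Scalar's actual source: `sketch_cmd_main()`, `find_subcommand()`, the
`subcommands` table, and the two stub handlers are hypothetical names chosen
for illustration, and the stubs only print a placeholder message.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: one function per subcommand. */
typedef int (*subcommand_fn)(int argc, const char **argv);

/* Stub handlers (illustrative only); each one just prints a placeholder. */
static int cmd_list(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	printf("list: would print registered enlistments\n");
	return 0;
}

static int cmd_version(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	printf("version: would print version information\n");
	return 0;
}

/* Table mapping subcommand names to their handlers. */
static const struct subcommand {
	const char *name;
	subcommand_fn fn;
} subcommands[] = {
	{ "list", cmd_list },
	{ "version", cmd_version },
};

/* Return the handler registered under `name`, or NULL if there is none. */
subcommand_fn find_subcommand(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); i++)
		if (!strcmp(subcommands[i].name, name))
			return subcommands[i].fn;
	return NULL;
}

/*
 * Plays the role the document assigns to cmd_main(): treat argv[1] as the
 * subcommand name and hand the remaining arguments to its handler.
 */
int sketch_cmd_main(int argc, const char **argv)
{
	subcommand_fn fn;

	if (argc < 2) {
		fprintf(stderr, "usage: scalar <subcommand> [<options>]\n");
		return 1;
	}
	fn = find_subcommand(argv[1]);
	if (!fn) {
		fprintf(stderr, "unknown subcommand: %s\n", argv[1]);
		return 1;
	}
	/* The handler sees its own name as argv[0]. */
	return fn(argc - 1, argv + 1);
}
```

A program's `main()` would forward its arguments to `sketch_cmd_main()`. With
the arguments `scalar version` the call reaches the `version` stub and returns
0; an unknown or missing subcommand returns 1. Handling of the "global" options
mentioned in the document (`-c` and `-C`) is deliberately left out of this
sketch.
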
 contrib/scalar/README.md            |  82 ------------------
 contrib/scalar/technical/scalar.txt | 127 ++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+), 82 deletions(-)
 delete mode 100644 contrib/scalar/README.md
 create mode 100644 contrib/scalar/technical/scalar.txt

diff --git a/contrib/scalar/README.md b/contrib/scalar/README.md
deleted file mode 100644
index 634b5771ed3..00000000000
--- a/contrib/scalar/README.md
+++ /dev/null
@@ -1,82 +0,0 @@
-# Scalar - an opinionated repository management tool
-
-Scalar is an add-on to Git that helps users take advantage of advanced
-performance features in Git. Originally implemented in C# using .NET Core,
-based on the learnings from the VFS for Git project, most of the techniques
-developed by the Scalar project have been integrated into core Git already:
-
-* partial clone,
-* commit graphs,
-* multi-pack index,
-* sparse checkout (cone mode),
-* scheduled background maintenance,
-* etc
-
-This directory contains the remaining parts of Scalar that are not (yet) in
-core Git.
-
-## Roadmap
-
-The idea is to populate this directory via incremental patch series and
-eventually move to a top-level directory next to `gitk-git/` and to `git-gui/`. The
-current plan involves the following patch series:
-
-- `scalar-the-beginning`: The initial patch series which sets up
-  `contrib/scalar/` and populates it with a minimal `scalar` command that
-  demonstrates the fundamental ideas.
-
-- `scalar-c-and-C`: The `scalar` command learns about two options that can be
-  specified before the command, `-c <key>=<value>` and `-C <directory>`.
-
-- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
-
-- `scalar-and-builtin-fsmonitor`: The built-in FSMonitor is enabled in `scalar
-  register` and in `scalar clone`, for an enormous performance boost when
-  working in large worktrees. This patch series necessarily depends on Jeff
-  Hostetler's FSMonitor patch series to be integrated into Git.
-
-- `scalar-gentler-config-locking`: Scalar enlistments are registered in the
-  user's Git config. This usually does not represent any problem because it is
-  rare for a user to register an enlistment. However, in Scalar's functional
-  tests, Scalar enlistments are created galore, and in parallel, which can lead
-  to lock contention. This patch series works around that problem by re-trying
-  to lock the config file in a gentle fashion.
-
-- `scalar-extra-docs`: Add some extensive documentation that has been written
-  in the original Scalar project (all subject to discussion, of course).
-
-- `optionally-install-scalar`: Now that Scalar is feature (and documentation)
-  complete and is verified in CI builds, let's offer to install it.
-
-- `move-scalar-to-toplevel`: Now that Scalar is complete, let's move it next to
-  `gitk-git/` and to `git-gui/`, making it a top-level command.
-
-The following two patch series exist in Microsoft's fork of Git and are
-publicly available. There is no current plan to upstream them, not because I
-want to withhold these patches, but because I don't think the Git community is
-interested in these patches.
-
-There are some interesting ideas there, but the implementation is too specific
-to Azure Repos and/or VFS for Git to be of much help in general (and also: my
-colleagues tried to upstream some patches already and the enthusiasm for
-integrating things related to Azure Repos and VFS for Git can be summarized in
-very, very few words).
-
-These still exist mainly because the GVFS protocol is what Azure Repos has
-instead of partial clone, while Git is focused on improving partial clone:
-
-- `scalar-with-gvfs`: The primary purpose of this patch series is to support
-  existing Scalar users whose repositories are hosted in Azure Repos (which
-  does not support Git's partial clones, but supports its predecessor, the GVFS
-  protocol, which is used by Scalar to emulate the partial clone).
-
-  Since the GVFS protocol will never be supported by core Git, this patch
-  series will remain in Microsoft's fork of Git.
-
-- `run-scalar-functional-tests`: The Scalar project developed a quite
-  comprehensive set of integration tests (or, "Functional Tests"). They are the
-  sole remaining part of the original C#-based Scalar project, and this patch
-  adds a GitHub workflow that runs them all.
-
-  Since the tests partially depend on features that are only provided in the
-  `scalar-with-gvfs` patch series, this patch cannot be upstreamed.
diff --git a/contrib/scalar/technical/scalar.txt b/contrib/scalar/technical/scalar.txt
new file mode 100644
index 00000000000..d785a5c036a
--- /dev/null
+++ b/contrib/scalar/technical/scalar.txt
@@ -0,0 +1,127 @@
+Scalar
+======
+
+Scalar is a built-in repository management tool that optimizes Git for use in
+large repositories. It accomplishes this by helping users to take advantage of
+advanced performance features in Git. Unlike most other Git built-in commands,
+Scalar is not executed as a subcommand of 'git'; rather, it is built as a
+separate executable containing its own series of subcommands.
+
+Background
+----------
+
+Scalar was originally designed as an add-on to Git and implemented as a .NET
+Core application. It was built on lessons learned from the VFS for Git
+project (another application aimed at improving the experience of working with
+large repositories). As part of its initial implementation, Scalar relied on
+custom features in the Microsoft fork of Git that have since been integrated
+into core Git:
+
+* partial clone,
+* commit graphs,
+* multi-pack index,
+* sparse checkout (cone mode),
+* scheduled background maintenance,
+* etc.
+
+With the requisite Git functionality in place and a desire to bring the benefits
+of Scalar to the larger Git community, the Scalar application itself was ported
+from C# to C and integrated upstream.
+
+Features
+--------
+
+Scalar consists of two major pieces of functionality: automatically
+configuring built-in Git performance features and managing repository
+enlistments.
+
+The Git performance features configured by Scalar (see "Background" for
+examples) confer substantial performance benefits to large repositories, but are
+either too experimental to enable for all of Git yet or useful only in large
+repositories. As new features are introduced, Scalar should be updated
+accordingly to incorporate them. This will prevent the tool from becoming stale
+while also providing a path for more easily bringing features to the appropriate
+users.
+
+Enlistments are how Scalar knows which repositories on a user's system should
+use Scalar-configured features. Tracking them lets Scalar update performance
+settings when new ones are added to the tool, as well as centrally manage
+repository maintenance. The enlistment structure - a root directory with a
+`src/` subdirectory containing the cloned repository itself - is designed to
+encourage users to route build outputs outside of the repository to avoid the
+performance-limiting overhead of ignoring those files in Git.
+
+Design
+------
+
+Scalar is implemented in C and interacts with Git via a mix of child process
+invocations of Git and direct usage of `libgit.a`. Internally, it is structured
+much like other built-ins with subcommands (e.g., `git stash`), containing a
+`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
+function. Most options are unique to each subcommand, with `scalar` respecting
+some "global" `git` options (e.g., `-c` and `-C`).
+
+Because `scalar` is not invoked as a Git subcommand (like `git scalar`), it is
+built and installed as its own executable in the `bin/` directory, alongside
+`git`, `git-gui`, etc.
+
+Roadmap
+-------
+
+NOTE: this section will be removed once the remaining tasks outlined in this
+roadmap are complete.
+
+Scalar is a large enough project that it is being upstreamed incrementally,
+living in `contrib/` until it is feature-complete. So far, the following patch
+series have been accepted:
+
+- `scalar-the-beginning`: The initial patch series which sets up
+  `contrib/scalar/` and populates it with a minimal `scalar` command that
+  demonstrates the fundamental ideas.
+
+- `scalar-c-and-C`: The `scalar` command learns about two options that can be
+  specified before the command, `-c <key>=<value>` and `-C <directory>`.
+
+- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
+
+Roughly speaking (and subject to change), the following series are needed to
+"finish" this initial version of Scalar:
+
+- Finish Scalar features: Enable the built-in FSMonitor in Scalar
+  enlistments and implement `scalar help`. At the end of this series, Scalar
+  should be feature-complete from the perspective of a user.
+
+- Generalize features not specific to Scalar: In the spirit of
+  making Scalar configure only what is needed for large repo performance, move
+  common utilities into other parts of Git. Some of this will be internal-only,
+  but one major change will be generalizing `scalar diagnose` for use with any
+  Git repository.
+
+- Move Scalar to toplevel: Make `scalar` a built-in component of Git by
+  moving it out of `contrib/` and into the root of `git`. The actual change will
+  be relatively small, but this series will also contain expanded testing to
+  ensure Scalar is stable and performant.
+
+Finally, there are two additional patch series that exist in Microsoft's fork of
+Git, but there is no current plan to upstream them. There are some interesting
+ideas there, but the implementation is too specific to Azure Repos and/or VFS
+for Git to be of much help in general.
+
+These still exist mainly because the GVFS protocol is what Azure Repos has
+instead of partial clone, while Git is focused on improving partial clone:
+
+- `scalar-with-gvfs`: The primary purpose of this patch series is to support
+  existing Scalar users whose repositories are hosted in Azure Repos (which does
+  not support Git's partial clones, but supports its predecessor, the GVFS
+  protocol, which is used by Scalar to emulate the partial clone).
+
+  Since the GVFS protocol will never be supported by core Git, this patch series
+  will remain in Microsoft's fork of Git.
+
+- `run-scalar-functional-tests`: The Scalar project developed a fairly
+  comprehensive set of integration tests (or, "Functional Tests"). They are the
+  sole remaining part of the original C#-based Scalar project, and this patch
+  adds a GitHub workflow that runs them all.
+
+  Since the tests partially depend on features that are only provided in the
+  `scalar-with-gvfs` patch series, this patch cannot be upstreamed.
-- 
gitgitgadget


