Re: About GIT Internals

Konstantin Khomoutov <kostix@xxxxxxxx> · Mon, 6 Jun 2022 14:52:15 +0300

On Sat, Jun 04, 2022 at 08:54:10PM +0530, Aman wrote:

[...]
> > If you do also understand the latter - that is, understanding that Git is an
> > assortment of CLI tools combined into two layers called "plumbing" and
> > "porcelain", - then you should have no difficulty starting to read the code:
> > basically locate the source code of the entry point Git binary (which is,
> > well, "git", or "git.exe" on Windows) and start reading it.

(I have reversed the order of your questions below so that my comments follow
logically one after another.)

> What do you mean by the "entry point" of the git binary?

Well, porcelain Git commands (those supposed to be used by users to carry out
their day-to-day tasks) are all implemented as subcommands of a single
executable image file called "git" on all supported platforms (except Windows,
where it's called "git.exe"): for instance, you run "git init" to initialize a
repository, and your OS looks up the executable image file named "git"
somewhere in the list of directories containing such files (it's usually
contained in the environment variable named "PATH"), executes it and passes it
a single command-line argument - "init". The rest of the commands work the
same way. Therefore, that binary named "git" is an entry point of the Git
software package: the execution of most Git commands starts there (not *all*
Git commands, but let's not touch this yet).

> How do I do that?

Well, basically that's out of the scope of this list, but let's try...

Git is a complex software package mostly written in C (and POSIX shell).
Like many F/OSS projects written in C, it has a top-level Makefile which is a
file supposed to be processed by GNU Make; this file contains a set of rules
for generating files from other files (compiling C source code into object
files and linking those into libraries and executable image files is exactly
this - generating files from other files). So usually you start by reading
the Makefile to find where the binary file of interest is generated, and from
which source files.

The problem is that Git's Makefile is *complex.*
So let's save you some headache and cut straight to the point: the two files
of most interest to you are git.c and common-main.c. The former is exactly
what implements that top-level entry point program, "git". The latter
implements the function called "main", which is the entry point of any
program written in C that is supposed to be runnable standalone (as opposed
to becoming a library). The object file generated when compiling common-main.c
is linked with the compiled code of every Git command; its main() calls
cmd_main(), which the code of each of those commands is supposed to
implement.

The rest is basically just usual C stuff - source files and header files.
If you're not familiar with these basics, then, I'm afraid, Git may not be the
best project to dive into.

In any case, I find the idea proposed by Junio elsewhere in this thread to be
very smart: it should be quite enlightening to read the "early" Git code to
make yourself accustomed to its overall architecture before moving on to its
present - much more complicated - implementation which nevertheless still
maintains the same architecture.