Re: About GIT Internals

Hello everyone. I sent out an email here last week asking for a list
of resources so I could better understand the workings and design of
Git. I really appreciate everyone who shared links and advice.

I have been reading about Git for some time now and have looked at
almost all of those resources, plus some others. I think I can say I
now have a decent conceptual understanding of how Git works
internally.

(Also, I now understand the chapter about Git in the book I am
reading, The Architecture of Open Source Applications, Volume 2,
which I didn't understand at all before; that was the reason I
started this thread.) There must still be plenty of details and
subtle things I don't fully understand yet, though I loved learning
that branches are nothing but pointers to commits. Wow, by the way!
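
For example (a minimal sketch, and my own illustration rather than
anything from the book: it assumes a branch whose ref is still a
loose file under .git/refs/heads and hasn't been packed into
.git/packed-refs), you can read the commit a branch points to
straight off the disk:

    import os

    # A loose branch ref is just a tiny file holding the hex object
    # ID of the commit the branch currently points to.
    # "repo_path" and "branch_tip" are hypothetical names for this
    # illustration.
    def branch_tip(repo_path, branch):
        ref_file = os.path.join(repo_path, ".git", "refs", "heads", branch)
        with open(ref_file) as f:
            return f.read().strip()  # e.g. "e83c5163316f..."

    print(branch_tip(".", "master"))

(git rev-parse master does the same resolution, and also handles
packed refs.)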

Now, continuing this discussion and turning to the implementation
and engineering side of things, I want to ask another question and
would appreciate some advice.

Though I may understand the internal design and high-level
implementation of Git, I really want to know how it is actually
implemented and how it was made, which means reading the SOURCE CODE.

1. I don't know how absurd a quest this is; please enlighten me.
2. How do I do it? Where do I start? It's such a BIG repository, and
I am not expecting it to be easy.
3. Would someone advise, perhaps, having a look at an older version
of the source code rather than the latest one, and if so, why?


Again, I would really appreciate it if someone could give their
thoughts on this.

Thank you,

Regards,
Aman


On Mon, May 30, 2022 at 7:40 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
>
> On Mon, May 30 2022, Konstantin Khomoutov wrote:
>
> > On Mon, May 30, 2022 at 09:49:57AM +0000, Kerry, Richard wrote:
> >
> > [...]
> >> > > 1. I haven't had the experience of working with other (perhaps even
> >> > > older) version control systems, like Subversion. So when referring to
> >> > > the "control" aspect,
> >> >
> >> > The "control" aspect came from whoever was the 'manager' who limited
> >> > access to the version system (i.e. acting like a museum curator) and decided
> >> > if your masterpiece was worthy of inclusion as a significant example of your
> >> > craft, whether that was an engineering drawing or some software code.
> >>
> >> I'm not sure I get that idea.  I worked using server-based version control
> >> systems from the mid-80s until about 5 years ago, when the team moved from
> >> Subversion to Git.  There was never a "curator" who controlled what went
> >> into VC.  You did your work, developed files, and committed when you thought
> >> it necessary.  When a build was to be done there would then be some
> >> consideration of what from VC would go into the build.  That is all still
> >> there nowadays using a distributed system (i.e. Git).  Those doing open
> >> source work might operate a bit differently, as there is of necessity
> >> distribution of control over what gets into a release.  But those of us
> >> developing proprietary software are still going through the same sort of
> >> release process, even if there isn't actually a separate person actively
> >> manipulating the contents of a release; it's just up to you to do what's
> >> necessary (there are others involved in deciding what will be in, but in
> >> our case they don't actively manipulate a repository).
> >
> > I think the "inversion of control" brought in by DVCS-es brought about a
> > somewhat different set of things.
>
> Re the "I'm not sure I get that idea" from Richard, I think his point
> stands that some of the stories we carry around about VCS vs. DVCS
> in free/open source software were more particular to how things were done
> in those online communities, and not really about the implicit
> constraints of centralized VCS per se.
>
> Partly those two mix: it was quite common for free software projects not
> to have any public VCS (usually CVS) access at all. Some did, but it was
> enough of a hassle to set up, and so far outside your "normal" workflow
> (as opposed to setting up a hosted git repository, which everyone uses
> now), that many just didn't do it.
>
> > I would say it is connected to F/OSS and the way most projects were hosted
> > before the DVCS-es took over: usually each project had a single repository
> > (say, on SourceForge or elsewhere), and it was "truly central" in the sense
> > that if anyone were to decide to work on that project, they would need to
> > contact whoever was in charge of that project and ask them to set up
> > permissions allowing commits - maybe not to "the trunk", but anyway
> > commit access was required because in centralized VCS commits are made on the
> > server side.
>
> We may have tried this in different eras, but from what I recall it was
> a crapshoot whether there was any public VCS access at all. Some
> projects were quite good about it, and SourceForge managed to push that
> to more of them early on by making anonymous CVS access something you
> could get by default.
>
> But a lot of projects simply didn't have it at all; you'll still find
> some of them today, e.g. various bits of "infrastructure" code that the
> maintainers are (presumably) still manually managing with zip snapshots
> and manually applied patches.
>
> > (Of course, there were projects where you could mail your patchset to a
> > maintainer, but maintaining such a patchset was not convenient: you would
> > either need to host your own fully private VCS or use a tool like Quilt [1].
> > Also note that certain high-profile projects such as Linux and Git use mailing
> > lists for submission and review of patch series; this workflow coexists with
> > the concept of DVCS just fine.)
>
> I'd add though that this isn't really "co-existing" with DVCS so much as
> using patches on a ML as an indirect transport protocol for "git push".
>
> I.e. if you contributed to some similar projects "back in the day" you
> could expect to effectively send your patches into a black hole until the
> next release: the maintainer would apply them locally, but you wouldn't be
> able to pull them back down via the DVCS.
>
> Perhaps there would be development releases, but those could be weeks or
> even months apart, and a "real" release might be once every 1-2 years.
>
> Whereas both Junio and Linus (and other Linux maintainers) publish their
> version of the patches they do integrate fairly quickly.
>
> > [...] it also has possible
> > downsides; one of the more visible ones is that when an original project
> > becomes dormant for some reason, its users might have a hard time deciding
> > which of the competing forks to switch to, and there are cases where
> > multiple competing forks implement different features and bugfixes in
> > parallel.  One of the guys behind Subversion expressed his concerns about
> > this back when Git was in its relative infancy [2].
> >
> >  1. https://en.wikipedia.org/wiki/Quilt_(software)
> >  2. http://blog.red-bean.com/sussman/?p=20
>
> It's interesting that this aspect of DVCS that proponents of centralized
> VCS were fearful of turned out to be the exact
> opposite:
>
>     Notice what this user is now able to do: he wants to crawl off
>     into a cave, work for weeks on a complex feature by himself, then
>     present it as a polished result to the main codebase. And this is
>     exactly the sort of behavior that I think is bad for open source
>     communities.
>
> I.e. lowering the cost to publish early and often has had the effect
> that people are less likely to "crawl off into a cave" and work on
> something for a long time without syncing up with other parallel
> development.



