Re: Interest in Git's Global State Reduction Project - UC Berkeley CS Student

Hi Kelsey,

On Mon, Mar 10, 2025 at 03:17:21AM -0700, Kelsey Zhou wrote:
> Dear Patrick,

two asks from my side:

  - In the Git community we don't top-post. Instead, replies to an email
    go inline and relevant parts you want to quote go above your reply.

  - Please send your follow-up questions to the Git mailing list instead
    of only contacting me. This allows other mentors to chime in, as
    well.

I've put the Git mailing list back into Cc.

> Thank you for your detailed responses and insights into the global state
> refactoring project. I appreciate you taking the time to address my
> questions. I'll definitely follow your recommendation to explore the
> microprojects as a starting point to demonstrate my fit for the program.
> 
> From your explanations, I understand that while no single global variable
> presents a uniquely difficult challenge, the comprehensive nature of the
> refactoring and the volume of references (3,200+ for the_repository alone)
> make this a substantial undertaking. I'm particularly interested in the
> environment.c variables you mentioned, which require more thoughtful,
> case-by-case solutions.
> 
> A few follow-up questions I have:
> 1. For newcomers to the Git codebase, are there any specific microprojects
> you would recommend that might build relevant skills for this global state
> reduction effort?

I honestly wouldn't worry about that too much yet. The microprojects are
designed to get you up to speed with contributing to the Git project in
the first place. So I'd rather pick a microproject that looks easy, and
once you have finished it you can look for further work that is already
in the vicinity of what you want to do in the actual project.

> 2. Could you suggest some previously completed patches related to global
> state reduction that might serve as good examples to study?
> 
> I'm excited about the potential architectural improvements you described,
> particularly the possibility of better parallelization and reduced process
> spawning. These align well with my interests in systems optimization. Thank
> you again for your guidance. I look forward to contributing to the Git
> project.

The patch series at [1] would be one such example.
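
To give a rough idea of the pattern such a series follows, here is a
minimal sketch. It is not taken from the linked series: `struct demo_repo`,
`the_demo_repo` and both helper functions are hypothetical stand-ins that
only illustrate replacing an implicit global with an explicit parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a repository struct; not Git's real definition. */
struct demo_repo {
	const char *gitdir;
};

/* "Before": a process-wide global, similar in spirit to `the_repository`. */
static struct demo_repo demo_global_repo = { ".git" };
static struct demo_repo *the_demo_repo = &demo_global_repo;

/* "Before": the helper silently depends on the global variable. */
static int object_path_global(char *buf, size_t len, const char *hex)
{
	assert(strlen(hex) > 2);
	return snprintf(buf, len, "%s/objects/%.2s/%s",
			the_demo_repo->gitdir, hex, hex + 2);
}

/*
 * "After": the repository is plumbed through as an explicit parameter,
 * so the helper works for any repository the caller hands in.
 */
static int object_path(const struct demo_repo *repo, char *buf, size_t len,
		       const char *hex)
{
	assert(repo && repo->gitdir);
	assert(strlen(hex) > 2);
	return snprintf(buf, len, "%s/objects/%.2s/%s",
			repo->gitdir, hex, hex + 2);
}
```

Callers that used to call `object_path_global(buf, sizeof(buf), hex)` would
instead pass the repository they already hold, which in turn means their own
callers have to pass it down. Repeating that step layer by layer is what
makes the work mechanical but large.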

Thanks!

Patrick

[1]: https://lore.kernel.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@xxxxxx/

> Best regards,
> Kelsey Zhou
> 
> On Fri, Mar 7, 2025 at 12:51 AM Patrick Steinhardt <ps@xxxxxx> wrote:
> 
> > Hi Kelsey,
> >
> > On Thu, Mar 06, 2025 at 10:49:47PM -0800, Kelsey Zhou wrote:
> > > Hi,
> > >
> > > I hope this message finds you well! My name is Kelsey Zhou, a Computer
> > > Science and Data Science student at UC Berkeley, and I'm reaching out to
> > > express my genuine interest in the Git refactoring project focused on
> > > reducing global state.
> > >
> > > The architectural challenge of modernizing Git's environment handling
> > > immediately caught my attention. Having worked extensively with complex
> > > systems that required careful state management, I'm fascinated by the
> > > opportunity to contribute to such a foundational tool used by developers
> > > worldwide. The prospect of improving Git's maintainability while
> > > potentially enabling better multi-repository handling represents exactly
> > > the kind of meaningful technical challenge I'm eager to tackle.
> >
> > Thank you for your interest!
> >
> > > My background includes relevant experience that I believe would be
> > > valuable for this project:
> > >
> > > At GSK, I worked as a Data Engineer Intern developing pipelines that
> > > processed millions of data points, where I gained hands-on experience
> > > with environment management using Docker and Terraform. This work
> > > required meticulous attention to system architecture and careful
> > > handling of state across different components – skills directly
> > > applicable to refactoring Git's environment handling code.
> > >
> > > While my academic work has primarily focused on data structures,
> > > algorithms, and systems programming, I've developed a strong foundation
> > > in C programming through coursework and personal projects. My
> > > experience with database systems has also given me insight into
> > > managing state effectively across complex software systems.
> > >
> > > I'm particularly curious about:
> > >
> > >    1. Which specific global variables or components have proven most
> > >    challenging to refactor in previous attempts at reducing Git's global
> > >    state?
> >
> > I think for the most part there isn't really anything that is
> > _particularly_ challenging. It's more the sheer vastness of global state
> > that the Git project has that makes this an involved project, as every
> > dropped global variable is something that needs careful consideration.
> >
> > There are of course nuances.
> >
> >   - Projects like getting rid of the global `the_repository` variable
> >     are for the most part trivial, as it is merely about plumbing
> >     through the variable layer by layer. But a simple grep shows we've
> >     got 3200 references remaining to that variable, so it takes a lot of
> >     time to reduce our reliance on it.
> >
> >   - Other projects, like for example getting rid of global variables in
> >     "environment.c", require a lot more thought because there is no
> >     ready-made solution for each of those variables. Instead, we always
> >     have to think about how that variable is used and then decide on a
> >     specific solution for it.
> >
> > Another challenge in this context is that we must be careful to not
> > break existing behaviour during our refactorings.
> >
> > >    2. Beyond the architectural improvements, are there any performance
> > >    considerations or trade-offs you're anticipating with this refactoring
> > >    effort?
> >
> > Yes and no. We don't expect there to be a significant impact on
> > performance just due to the refactorings. But the architectural
> > improvements may lead to performance improvements down the road:
> >
> >   - We may be able to parallelize more work via multithreading.
> >
> >   - We may be able to perform some tasks without having to spawn a
> >     separate process.
> >
> >   - With proper, compartmentalized subsystems it may also become easier
> >     to refactor their internals more readily, thus unlocking further
> >     performance optimizations.
> >
> > > I would welcome the opportunity to discuss how I might contribute to this
> > > project and learn more about your expectations for GSoC participants.
> > > Thank you for considering my interest.
> >
> > I would strongly recommend reading through [1] and starting to work on
> > a microproject. This is a prerequisite for every student to get accepted
> > into Git's GSoC program so that we can assess whether we think that the
> > individual is a good fit.
> >
> > Patrick
> >
> > [1]: https://git.github.io/General-Microproject-Information/
> >



