Re: [GSoC][RFC][PROPOSAL] Reachability bitmap improvements

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Abhradeep,

Thanks for drafting your initial GSoC proposal. Some thoughts inline:

On 07-04-2022 01:39, Abhradeep Chakraborty wrote:

I am Abhradeep Chakraborty, currently pursuing B.Tech in Information
Technology (now in my 3rd year) at Jalpaiguri Government Engineering
College, India.

In the last two years, I mainly concentrated on learning and building
web development things. I mainly use Django, React for my projects. But
for the past 6 months, I have been learning about devops, cloud technologies
( e.g. docker, Kubernetes, AWS etc.), computer networking and core
technologies (such as git, linux, gcc etc.). I also have an interest in
some research based technologies like WebRTC, Web3 etc.


That's good to know. Do let us know if you have any experience with C
and shell programming outside of your contributions to Git.

My ultimate objective is to be a dedicated and active contributor to all
of these technologies. So, I want to start this contribution journey with
`git` and therefore I want to be a part of this GSoC program.

## Me & Git

I have started exploring the Git codebase since Sept 2021. At first, I
mainly focused on knowing the git internals. During this time I read
documentations (Under `Documentation` directory). `MyFirstContribution.txt`[1],
`MyFirstObjectWalk.txt`[2] and `Hacking Git`[3] helped me to learn about the git
contribution workflow, how `git log` ( in other words `object walk`) works
internally.

Though I had read many documentations, I was still not able to fully
understand the terminologies (like `refs`, `packfiles`, `blobs`, `trees`
etc.). (ProGit)[https://git-scm.com/book/en/v2] helped me tremendously here.
I read the full book and it cleared almost every doubt that came into my mind.

I have provided my git involvement history below. Activities are sorted
by timeline from top to bottom.

==============================
PRs, Commits & Discussion Participations
==============================
>
> [ ... snip ... ]

Good to see your various contributions to Git so far. Good work!

## Proposed Project

### Abstract

While sending large repositories, git has to decide what needs to be
sent, how to be sent. For large repositories, git creates pack files
where it packs all the related commits, trees, blobs( in the form of
deltas and bases ), tags etc. Now, finding the related objects among
all the objects might be very expensive.

As git only knows about the branch tips, it is very expensive to find
the related objects from all the objects if we try to traverse down
from the branch tips to all their respective related objects. It becomes
more expensive as the number of objects (i.e. commits, trees, blobs
etc.) increases. Ultimately resulting in slow fetching from the git
daemon.

To counter that, reachability bitmaps were introduced. It uses bitmap
data structure (an array of bits) to detect if an object is reachable
from a particular commit or not. The commit messages of this
(patch series)[https://lore.kernel.org/git/1372116193-32762-1-git-send-email-tanoku@xxxxxxxxx/]
are itself a documentation of how it is achieved. All the bitmap
indexes for selected commits (in a EWAH compressed form) are stored in
a `.bitmap` file which has the same name of its respective packfile.


s/indexes/indices/

[ ... snip ... ]
>> ## Estimated Timeline

> [ ... snip ... ]

Taylor has shared feedback on the proposal and timeline which I hope are helpful to you. Specifically, Taylor said:


So I think it makes sense to try and find a way you can dip your toes
into 2-3 of the sub-projects and get a sense for what feels most doable
and interesting, then focus on that before allocating time for more
projects in the future.
>

I agree with this. It's better to be practical than ambituous when planning the proposal.


## Availability

I intentionally freed my summer slot for GSoC. So, there wouldn’t
be much disturbance/inactive days while working on my project.
Generally, I can spend nearly 35-40 hours a week on this project.
If the college gets closed for some reason, then the amount of
available time will increase. Smart India Hackathon finale would
be held in July. So, I may be inactive for a few days. But overall,
I will be available most of the time.


Thanks for letting us know of possible things that would interfere with GSoC! Do make sure to keep us posted in case you come to know of anything else.

## Post GSoC

Once the GSoC finishes, I would look into other issues/tickets and
would send PRs solving those issues. As I said before, I want to
be a long term contributor to git - I would definitely maintain the
flow and continue to work on improving the overall git tool.


That's nice.

--
Sivaraam



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux