[RFC][GSoC] Proposal: Refactor git-bisect(1)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all, this is my proposal for the GSoC 2024.

----------------------------------------------------------------------------------------------------

Refactor git-bisect(1) to make its state self-contained

Contacts
-------------------

Full Name: Eugenio Gigante
Email: giganteeugenio2@xxxxxxxxx
Github: www.github.com/EuGig


Synopsis
------------

The git-bisect(1) tool is employed to pinpoint the exact commit within a
series of commits that introduced a particular bug. When initiating a
bisection session,
it generates a collection of state files within the Git repository,
documenting diverse
parameters such as ".git/BISECT_START". Instead of having different
files scattered
in the ‘.git’ directory, it has been suggested by Patrick to introduce
a new ‘.git/bisect-state’
directory which will contain the state files
(https://lore.kernel.org/git/Za-gF_Hp_lXViGWw@tanuki/).
The aim of the project is to implement this new layout for storing
git-bisect(1) related files.
The project will also handle backward compatibility issues that may
arise with such a change.
This is one of the project ideas suggested by the community of Git.
The difficulty for the project should be medium and it should take
somewhat between 175 to 350 hours.


Contributions
-------------------

add.c: use unsigned integral type for collection of bits
(https://lore.kernel.org/git/20240224112638.72257-2-giganteeugenio2@xxxxxxxxx/)


Description:

Since the MSB of signed integral type’ is special, ‘flags’ fields should be
declared of type ‘unsigned integral’. builtin/add.c:refresh() declares
'flags' as signed,
and uses it  to call refresh_index() that expects an unsigned value.
Fix this by modifying its type ‘unsigned int’.

Although the code was easy, this patch gave me the opportunity to get
familiar with
the process of submitting patches. The lesson learned was not only
about understanding
git commands for formatting and sending patches to the mailing list;
but, most importantly,
was about how to interact with the community; i. e., how to interact
with reviewers,
to ask them questions and give them answers. That is why I chose a
micro-project idea
that explicitly required interaction with the community, before
writing any code.

Status:

Merged to master on: Thu Mar 7 15:59:41 2024
commit f46a3f143eba661b9eb2b1e741447d6709eb6e90


Deliverables
------------------

As soon as the acceptance to the program is communicated, I will start writing a
sort of Backlog of required changes which has to be shared with mentors.
This has the goal of better tracking progress and possible issues.
Already individuated changes and possible strategies are considered in
what follows:

1. First, one needs to inspect the possible impacts such a change could have
regarding Backward compatibility. There will be two possible situations in which
backward compatibility breaks:

a. (Noted by Junio (https://lore.kernel.org/git/xmqqwms0ndvu.fsf@gitster.g/)).
People switch versions in the middle of a git-bisect session. Even
though I think this
case would not be that common, one should take this into consideration
nonetheless.

b. (Noted by Patrick (https://lore.kernel.org/git/ZbDPqOnyLw4yhu--@tanuki/)).
Different implementations of Git could suffer backward
incompatibilities like Libgit2,
JGit and Dulwich. In this case, I will investigate if (and how) these
tools would suffer the change.

In this phase it is important to find an agreement with the community
if the refactoring
would be worth it, and in case it is, to find the best solution to
guarantee backward compatibility.
The simplest way seems to be to teach Git how to recognise the old
style layout and move
the files by following the new one. This can be achieved by checking
the presence of state files
inside ‘.git’, just after the starting of a bisect session and by
moving them accordingly.


2. After a strategy for backward compatibility is decided, I will
refactor the directory
where all the file states are created. This is done by changing the
function factory
‘GIT_PATH_FUNC’ in ‘builtin/bisect.c‘’. The following is an example
for ‘BISECT_START’:

- #static GIT_PATH_FUNC(git_path_bisect_start, “BISECT_START”)
+ #static GIT_PATH_FUNC(git_path_bisect_start, “bisect-state/start”)

Similarly for ‘bisect-state/terms’, etc. This kind of changes should
also be done in ‘bisect.c’
(which contains the binary searching algorithm git-bisect(1) uses).
Then, as noted by
Junio (https://lore.kernel.org/git/xmqqwms0ndvu.fsf@gitster.g/) this
path should be
marked per-worktree, since each worktree can have its own bisect sessions.
Probably, also files related to git-worktree use the directory of
state files somehow,
so one should also check them.

3. Write some tests for the new layout. The plan is to use the new
unit testing framework. First tests that come to mind are:

a. Check that the state files are inside ‘.git/bisect-state’.
b. Check if the path pertains to GIT_DIR and not COMMON_DIR.
c. Check that after adding a worktree which was being bisected does
not somehow spoil the session and the state files.

This can be taken as an example of test file for git-bisect(1) using
the new framework:

#include test-lib.h
#include bisect.h

// Write functions like:
// t_bisect_dir_init()
// t_worktree_init()
// t_test_layout()
// and include them in cmd_main

int cmd_main(int argc, const char **argv)
{
if (!TEST(t_bisect_dir_init(), “bisect directory initialized”))
test_skip_all(“Bisect initialization failed”);
if (!TEST(t_worktree_init(), “worktree initialized”))
test_skip_all(“Worktree initialization failed”);
TEST(t_layout(), “new laypout”);

return test_done();
}

4. Implement strategy for backward compatibility and related tests such as:

a. Check if Git correctly recognizes the old layout and in case
correctly moves the files according to the new one.


Timeline
------------

May 1 - 26:
Community bounding Period.
Read Documentation.
Write Backlog.


May 27 - July 8:
Implement a new layout for state files.
Write tests.

July 12 - August 19:
Assess and implement backward compatibility.

August 19 - Deadline :
Write documentation for the project.

I can dedicate 3 hours a day during weekdays, and 5 hours during the weekends.


Benefits to the Community
-------------------------------------

How state files dedicated for git-bisect(1) are stored is not ideal.
Having a single directory dedicated to these files, instead of having different
files spread in the Git directory during the bisecting session, is
better and facilitates
even the removal of them after bisecting.  Moreover, as noted by
Phillip (https://lore.kernel.org/git/9c800cd5-8d20-4df7-8834-f74ab000695e@xxxxxxxxx/#t),
by aligning the organization of these files to that used for am,
rebase, cherry-pick, etc.,
the repo will have a more coherent and uniform layout.
This will enhance readability and maintainability in general.

Karthik [1] also expressed  the need of such refactoring as a prerequisite
to introduce new syntax checks for pseudoref in ‘is_pseudoref_syntax()’.


[1] See Karthik’s commit: 1eba2240f8ba9f05a47d488bb62041c42c5d4b9c


Biographical information
----------------------------------

I’m a former student of Logic and Philosophy who turned to coding
after graduating.
I have been working as a Developer for NTT Data Italia for a year.
I hold a full-time job, but I've seen that it doesn't conflict with
the rules of GSoC
since I'm an open source beginner. I am fully capable of managing my
workload independently,
including my working hours. I know it is not ideal, but I can
definitely work around
my schedule and dedicate time to the project.

Before, I have contributed small patches to two open source projects:
- Pydata/Sparse: https://github.com/pydata/sparse/pull/611
- Pennylane: https://github.com/PennyLaneAI/pennylane/pull/3959

-------------------------------------------------------------------------------------------------

Thanks,
Eugenio.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux