GSoC Git Proposal Draft - ZheNing Hu

Hello, Git,

I'm ZheNing Hu. Here is my GSoC 2021 proposal draft.
A web version is available here:
https://docs.google.com/document/d/119k-Xa4CKOt5rC1gg1cqPr6H3MvdgTUizndJGAo1Erk/edit

Any comments and corrections are welcome :)

----8<----
## Use ref-filter formats in git cat-file

### About Me
| Name | ZheNing Hu |
| ---------- | ------------------------------------------ |
| Major | Computer Science And Technology |
| Mobile no. | +86 15058356458 |
| Email | adlternative@xxxxxxxxx |
| IRC | adlternative (on #git-devel/#git@freenode) |
| Github | https://github.com/adlternative/ |
| Blogs | https://adlternative.github.io/ |
| Time Zone | CST (UTC +08:00) |

### Education & Background
* I am currently a second-year student majoring in Computer Science and
Technology at Xi'an University of Posts & Telecommunications (China).
* In my freshman year, I joined the university's XiYou Linux Group and
learned how to use Git to submit my own code to GitHub. Over the past
two years I have learned C, C++, Python and shell, I know how to debug
with gdb, and I am familiar with Linux system programming and Linux
network programming.
* I started studying the Git source code and contributing to Git in
December 2020.

### Me & Git
Around last November, I found a couple of
[build-your-own-git](https://github.com/danistefanovic/build-your-own-x#build-your-own-git)
projects on GitHub that teach how to write a simple Git. The mechanics
of Git are very interesting:

1. There are four types of objects in Git: blob, tree, commit and tag.
2. Loose objects are stored in
`.git/objects/<first 2 hex digits>/<remaining 38 hex digits>`, with the
SHA-1 value of the object as the storage address.
3. All branches are just references to commits.
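
A minimal sketch of point 2, written in Python rather than Git's own C: Git hashes a header of the form `<type> <size>\0` followed by the content, and the resulting hex digest determines where the loose object lives. The function names are hypothetical, and the sketch assumes a SHA-1 repository.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object name Git assigns to `content`."""
    # Git hashes "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(oid: str) -> str:
    """Return the loose-object path relative to the repository root."""
    # The first 2 hex digits name the directory, the other 38 the file.
    return f".git/objects/{oid[:2]}/{oid[2:]}"


empty_blob = object_id("blob", b"")
print(loose_object_path(empty_blob))
```

The file at that path holds the zlib-compressed header and content; the sketch only computes the name, it does not write anything.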

Then I read *Pro Git* and Jiang Xin's *Git Authoritative Guide*, and
learned how to use most Git subcommands.

Later, I started studying the Git source code. I found that Git has
at least 200,000 lines of C code and 200,000 lines of shell script
code, which left me a little confused about where to start.

But after I submitted my first patch, many people in the Git
community gave me very enthusiastic guidance, which gave me the
courage to keep studying the Git source code and to start making
my own contributions. You can find them here:
[gitgitgadget](https://github.com/gitgitgadget/git/pulls?q=is%3Apr+author%3Aadlternative+)
or
[git.kernel.org](https://git.kernel.org/pub/scm/git/git.git/log/?qt=grep&q=ZheNing+Hu)


These patches have been merged into the "master" branch:

#### [master]
* difftool.c: learn a new way start at specified file
[(mail list)](https://lore.kernel.org/git/pull.870.v6.git.1613739235241.gitgitgadget@xxxxxxxxx/)
* ls-files.c: add --deduplicate option
[(mail list)](https://lore.kernel.org/git/384f77a4c188456854bd86335e9bdc8018097a5f.1611485667.git.gitgitgadget@xxxxxxxxx/)
* ls_files.c: consolidate two for loops into one
[(mail list)](https://lore.kernel.org/git/f9d5e44d2c08b9e3d05a73b0a6e520ef7bb889c9.1611485667.git.gitgitgadget@xxxxxxxxx/)
* ls_files.c: bugfix for --deleted and --modified
[(mail list)](https://lore.kernel.org/git/8b02367a359e62d7721b9078ac8393a467d83724.1611485667.git.gitgitgadget@xxxxxxxxx/)
* builtin/*: update usage format
[(mail list)](https://lore.kernel.org/git/d3eb6dcff1468645560c16e1d8753002cbd7f143.1609944243.git.gitgitgadget@xxxxxxxxx/)

These patches are in the queue:

#### [next]

* format-patch: allow a non-integral version numbers
[(mail list)](https://lore.kernel.org/git/pull.885.v10.git.1616497946427.gitgitgadget@xxxxxxxxx/)
* [GSOC] commit: add --trailer option
[(mail list)](https://lore.kernel.org/git/pull.901.v14.git.1616507757999.gitgitgadget@xxxxxxxxx/)

#### [WIP]

* gitk: add right-click context menu for tags
[(mail list)](https://lore.kernel.org/git/pull.866.v5.git.1614227923637.gitgitgadget@xxxxxxxxx/)
* [GSOC] trailer: pass arg as positional parameter
[(mail list)](https://lore.kernel.org/git/5894d8c4b36466326b0427bfda0d6981e52a0907.1617185147.git.gitgitgadget@xxxxxxxxx/)

### Proposed Project

* Git used to have an old problem of duplicated implementations of
some logic. For example, Git had at least 4 different implementations
to format command output for different commands.

* `git cat-file` is a Git subcommand used to show information about a
Git object.

* `git cat-file --batch` prints the information and contents of each
object named on stdin. The only difference between `--batch-check` and
`--batch` is that `--batch-check` does not print the contents of the
object.
* `--batch-all-objects` shows all objects when combined with `--batch`
or `--batch-check`.
* `--batch-check` and `--batch` both accept a format string with the
following placeholders:
  * `%(objectname)`: the 40-character hexadecimal SHA-1 of the object
  * `%(objecttype)`: the object type (blob, tree, commit or tag)
  * `%(objectsize)`: the size of the object's content
  * `%(objectsize:disk)`: the size the object takes up on disk
  * `%(deltabase)`: if the object is stored as a delta, the SHA-1 of
    its delta base
  * `%(rest)`: the input line is split at the first space or TAB; the
    text before it is treated as the object name, and the text after
    it is printed in place of `%(rest)`
* In the existing design, `expand_format()` is used twice: first in
`batch_objects()` to parse the format string, and then in
`batch_object_write()` to format the object information into a string
buffer, whose contents are eventually printed to standard output.
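
The two uses of `expand_format()` can be illustrated with a small Python model. This is only a sketch under assumptions, not Git's C code: the `mark_query` flag mirrors the idea that the first pass only records which fields the format asks for, `lookup()` is a hypothetical stand-in that returns made-up values instead of reading a real object, and only a subset of the placeholders is supported.

```python
import re

ATOM = re.compile(r"%\(([^)]*)\)")
# Subset of the placeholders, enough for the illustration.
KNOWN_ATOMS = {"objectname", "objecttype", "objectsize", "rest"}


def expand_format(fmt, data, mark_query=False):
    """Expand `fmt`; in mark-query mode only record which atoms are used."""
    def replace(match):
        atom = match.group(1)
        if atom not in KNOWN_ATOMS:
            raise ValueError(f"unknown format element: {atom}")
        if mark_query:
            data["wanted"].add(atom)
            return ""
        return str(data["values"][atom])
    return ATOM.sub(replace, fmt)


def split_input_line(line):
    """Split an input line into (object name, rest) at the first space/TAB run."""
    parts = re.split(r"[ \t]+", line, maxsplit=1)
    return parts[0], (parts[1] if len(parts) > 1 else "")


def lookup(name, rest, wanted):
    """Hypothetical object lookup: returns made-up values for wanted atoms only."""
    available = {"objectname": name, "objecttype": "blob",
                 "objectsize": 0, "rest": rest}
    return {atom: available[atom] for atom in wanted}


fmt = "%(objectname) %(objecttype) %(rest)"
data = {"wanted": set(), "values": {}}

# Pass 1: learn which fields the format needs, produce no output.
expand_format(fmt, data, mark_query=True)

# Pass 2: fetch only those fields, then build the output line.
name, rest = split_input_line("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 some label")
data["values"] = lookup(name, rest, data["wanted"])
line = expand_format(fmt, data)
print(line)
```

Because `%(objectsize)` is not in the format, the first pass never marks it and the second pass never asks for it, which is the point of running the expansion twice.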


* [Olga](olyatelezhnaya@xxxxxxxxx) has worked on integrating
`ref-filter` logic into `cat-file`
[(link)](https://github.com/git/git/pull/568). The problems with her
patches at that time were:
  1. The patch series was too long, which made it difficult to adjust
  and merge.
  2. I don't think it is a good idea to use `struct ref_array_item`
  instead of `struct expand_data` in `cat-file` to fit the `ref-filter`
  logic, because the two structures are not closely related.
  [(link)](https://github.com/git/git/pull/568/commits/e0aafaa76476ba5528f84b794043531ebd4633c7#diff-d03110606a7ed8cb9832bbcc572f1093435cc6115c4e58d7a7750af3c33319a7R238)

* Because some features of `git for-each-ref` are very similar to
those of `git cat-file`, I think `git cat-file` can borrow some
feasible solutions from it.

#### My possible solutions:

1. Same [solution](https://github.com/git/git/pull/568/commits/cc40c464e813fc7a6bd93a01661646114d694d76)
as Olga: add a `struct ref_format format` member to `struct
batch_options`.
2. Use the function
[`verify_ref_format()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L904)
to replace the first `expand_format()` call, which parses the format
string.
3. Write a function like
[`format_ref_array_item()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L2392),
which gets information about objects, using `get_object()` to grab the
information we need (or just `grab_common_values()`).
4. Migrating `%(rest)` may require studying how `%(if)` and `%(else)`
are handled.
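
The general shape of steps 2 and 3 is "validate and parse the format once, then format many objects from the parsed result". A rough Python sketch of that shape follows. It is loosely inspired by the split between `verify_ref_format()` and `format_ref_array_item()` but does not reproduce their data structures; `parse_format()`, `format_object()` and the sample object values are all hypothetical.

```python
import re

VALID_ATOMS = {"objectname", "objecttype", "objectsize", "rest"}
TOKEN = re.compile(r"%\(([^)]*)\)")


def parse_format(fmt):
    """Parse `fmt` once into ("literal", text) and ("atom", name) segments."""
    segments = []
    pos = 0
    for match in TOKEN.finditer(fmt):
        if match.start() > pos:
            segments.append(("literal", fmt[pos:match.start()]))
        atom = match.group(1)
        if atom not in VALID_ATOMS:
            # Reject bad formats up front, before touching any object.
            raise ValueError(f"unknown field name: {atom}")
        segments.append(("atom", atom))
        pos = match.end()
    if pos < len(fmt):
        segments.append(("literal", fmt[pos:]))
    return segments


def format_object(segments, info):
    """Build one output line from pre-parsed segments and object info."""
    out = []
    for kind, value in segments:
        out.append(value if kind == "literal" else str(info[value]))
    return "".join(out)


segments = parse_format("%(objecttype) %(objectsize)")
# Made-up object information, standing in for real lookups.
objects = [
    {"objecttype": "blob", "objectsize": 0},
    {"objecttype": "tree", "objectsize": 36},
]
lines = [format_object(segments, obj) for obj in objects]
print("\n".join(lines))
```

Parsing once means the per-object work is a plain walk over the segment list, with no string matching on placeholder names inside the loop.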

### Are you applying for other Projects?

No, Git is the only one.

### Blogging about Git

While studying the Git source code, I often write
[blog posts](https://adlternative.github.io/tags/git/) to record what
I have learned, which helps me recall it later. Most of my posts so
far were written in Chinese, but during GSoC I promise to write all of
them in English.

### Timeline
* May 18 ~ June 8
  * Look for a scheme to make `git cat-file` and `ref-filter` more
    compatible, and start the integration attempt.
  * *Stretch Goal*: move `%(objectsize)`, `%(objecttype)`, `%(objectname)`.

* June 8 ~ July 8
  * Move the main part of `git cat-file`'s formatting over to the
    `ref-filter` logic and get the basic functionality working.
  * *Stretch Goal*: move `%(deltabase)`, `%(objectsize:disk)`, `%(rest)`.

* July 8 ~ August 17
  * Analyze the performance of `ref-filter` and try to reduce the
    cost of its heavy string matching. If I have some spare time, I
    could also work on other interesting patches.
  * *Stretch Goal*: optimize `ref-filter` performance.

### Availability
My exams are expected to end in June. The period without classes
before the final exams, as well as the summer vacation after them, is
basically self-study time for me. Although I am taking several other
courses, I have enough time and energy to complete the daily tasks. I
stay active on the Git mailing list, so you can reach me at any time as
long as I am not sleeping. :)


### Post GSoC
* I love the open source philosophy, and I am willing to spread the
spirit of openness and freedom and to explore technology with
like-minded people.
* During my contact with the Git community over the past few months,
many people gave me great encouragement. I hope I can keep my passion
for Git alive, contribute my own code, and pass this cool thing on.
* I am willing to keep contributing code to the Git community long
after the end of GSoC.
* I hope the Git community can give me a chance to participate in
GSoC. I sincerely thank GSoC and the Git community!



