On Thu, Mar 24, 2016 at 12:27 AM, Pranit Bauva <pranit.bauva@xxxxxxxxx> wrote:
> Hey!
>
> I have prepared a proposal for Google Summer of Code 2016. I know this
> is a bit late, but please try to give your comments and suggestions.
> My proposal could greatly improve from this. Some questions:
>
> 1. Should I include more ways in which it can help Windows?

I don't think it is necessary.

> 2. Should I include the function names I intend to convert?

I don't think it is necessary, but if you want, you can take a look at
some big ones (or perhaps just one big one) and explain how you plan to
convert it (using which C functions or APIs).

> 3. Is my timeline (a bit different) going to affect me in any way?

What is important with the timeline is just that it looks realistic.
So each task should have a realistic amount of time and the order in
which tasks are listed should be logical. I commented below about how
I think you could improve your timeline.

> Here is a Google doc for my proposal.
> https://docs.google.com/document/d/1stnDPA5Hs3u0a8sqoWZicTFpCz1wHP9bkifcKY13Ocw/edit?usp=sharing
>
> For the people who prefer the text only version :
>
> -----------------------------------------------------------------------------------
>
> Incremental rewrite of Git bisect
>
> About Me
>
> Basic Information
>
> Name        Pranit Bauva
> University  IIT Kharagpur
> Major       Mining Engineering
> Email       pranit.bauva@xxxxxxxxx
> IRC         pungi-man
> Blog        http://bauva.in
> Timezone    IST (UTC +5:30)
>
> Background
>
> I am a first year undergraduate in the department of Mining
> Engineering at Indian Institute of Technology, Kharagpur. I am an open
> source enthusiast. I am a part of Kharagpur Linux Users Group which is
> basically a group of open-source enthusiasts. I am quite familiar with
> C and I have been using shell for some time now and still find new
> things about it every day. I have used SVN when I was on Windows and
> then I switched to Git when I moved to Linux.
> Git seems like magic. I always wanted to get involved in the
> development process and Google Summer of Code is an awesome way to
> achieve it.
>
> Abstract
>
> Git bisect is a frequently used command which helps developers find
> the commit which introduced a bug. Some parts of it are written in
> shell script. I intend to convert them to low level C code, thus
> making them builtins. This will increase Git’s portability. The
> efficiency of git bisect will definitely increase, but it will not
> matter much, as most of the time in bisection mode is consumed by
> compiling or testing. It will, however, reduce the IO overhead, which
> can make the heavy process of compiling relatively lighter.
>
> Problems Shell creates
>
> System Dependencies
>
> Using shell code introduces various dependencies, even though it
> allows quick prototyping of the code. Shell scripts often use POSIX
> utilities like cat, grep, ls, mkdir, etc. which are not included in
> non-POSIX systems by default. These scripts do not have access to
> Git’s internal low level API, so even trivial tasks have to be
> performed by spawning a new process every time. So when Git is ported
> to Windows, it has to include all the utilities (namely a shell
> interpreter, perl bindings and much more).
>
> Scripts introduce extra overheads
>
> Shell scripts do not have access to Git’s internal API, which makes
> excellent use of caching and thus reduces unnecessary IO on user
> configuration files, the repository index and the filesystem. By using
> a builtin we could exploit the cache system, thus reducing the
> overhead. As compiling / testing already involves quite a number of
> resources, it would be good if we could do our best to make more
> resources available for that.
>
> Potential Problems
>
> Rewriting may introduce bugs
>
> Rewriting the shell script to C might introduce some bugs. This
> problem will be properly taken care of in my method of approach
> (described below).
> Still, this approach will definitely not guarantee that the
> functionality of the new code will be exactly the same as the old
> one, though it will greatly reduce the possibility of bugs. The
> reviews provided by the seniors in the Git community would help a lot
> in reducing bugs since they know the common bugs and how to work
> around them. Git's test suite is also quite nice and has awesome
> coverage.
>
> Rewritten code can be hard to understand
>
> Git does not like having many external dependencies, libraries or
> executables other than what is provided by Git itself and the
> rewritten code should follow this. C does not provide many of the
> facilities, like text processing, that shell does, and their C
> implementations often span multiple lines. C is also notorious for
> being a bit “cryptic”. This problem can be compensated by having well
> written documentation with well defined inputs, outputs and behavior.
>
> A peek into git bisect
>
> How does it help?
>
> Git bisect helps software developers find the commit that introduced
> a regression. Software developers are interested in knowing this
> because a commit changes a small set of code (most of the time). It
> is much easier to understand and fix a problem when you only need to
> check a very small set of changes than when you don’t know where to
> look. It is not that the problem will be exactly in that commit, but
> it will be related to the behavior introduced in the commit.
> Software bugs can be a nightmare when the code base is very large.
> There would be a lot of sleepless nights spent figuring out the part
> which causes the error. This is where git bisect helps. This is one
> of the most sought-after tools in software development.
>
> Git bisect algorithm (short description)
>
> Git’s commits form a directed acyclic graph (DAG) which makes it a bit
> difficult to find the bisection commit.
> The algorithm it uses is:
>
> Step 1: Keeps only the ancestors of “bad” commit (including “bad” one)
> and the commits that are not the ancestors of “good” commit (excluding
> “good” one). This helps in reducing a lot of “interesting” commits.
>
> Step 2: Starting from the good ends (or end), associate some kind of
> “weight” which depends on its distance from the good commit also
> considering some other factors.
>
> Step 3: Find the midpoint of the max weight which will be the
> bisection commit.
>
> Step 4: Repeat Step 1 but now with one more “good” commit which will
> cause rearranging of “weights” thus finishing the algorithm which uses
> binary search.
>
> Further details about the algorithm can be found in this article
> written by Christian Couder.

The link cannot be seen in text mode but it looks ok on the pdf
proposal.

> Current status of git bisect
>
> The bisect code is currently spread across multiple files namely
> git-bisect.sh, bisect.c, bisect.h, and bisect--helper.c. The
> algorithmic part is written in bisect.c. git-bisect.sh acts like a
> glue connecting parts of bisect.c with git-rev-list (which helps in
> listing the commits in various types and formats), git-rev-parse
> (which helps in knowing the SHA-1 checksum for the ref passed),
> git-symbolic-ref (to read, modify and delete symbolic refs),
> git-show-branch (which helps by showing the branches and their
> respective commits), git-update-ref (to safely update the SHA-1
> checksum stored in a ref), git-for-each-ref (to get information
> on each ref with given pattern, format, keys and much more) and
> git-check-ref-format (to check whether the ref has a proper format and
> doesn’t contain some “special” characters which aren’t allowed) though
> this is soon going to be replaced by using a sub command
> `--check-term-format` of bisect--helper whose patch is in its final
> stages.

The above sentence is not very clear.
And if you want to talk about all the git features used by
git-bisect.sh, you should perhaps separate the bisect related ones
from the other ones, and talk a bit more about the bisect related
ones. For example it looks like you are not talking much about
bisect--helper.c.

> git-bisect.sh is responsible for writing the respective data
> concerning the status of bisect to the various files namely
> BISECT_LOG, BISECT_NAMES, BISECT_START, BISECT_ANCESTORS_OK,
> refs/bisect and those which would be included in the directory once
> bisection moves further.
>
> How will I go about this project?
>
> Understand related areas
>
> In the start, I will focus on understanding the things that
> git-bisect.sh depends on. I will try to debug that part of code and
> understand it through documentation. I will also learn to use some
> more parts of the API which I am currently unfamiliar with concerning
> this project. I will also gain more familiarity with running gdb
> sessions for big projects.
>
> Discussion about the future look of bisect
>
> When converting from shell to C, quite a number of changes are needed
> rather than just plainly translating the code. This is the part where
> I will discuss with the Git community which parts they wish to change
> or which are better changed in C. Since C and shell are quite
> different, approaches which seem good in shell are not that good in C
> and vice versa. So having a thorough discussion can really help.
>
> Convert each function individually
>
> Then I will convert each function from shell script to a C builtin in
> bisect--helper.c individually. This raises a very valid question as
> to how to test whether the function performs the tasks as intended
> since test suites are written for git-bisect.sh and not
> bisect--helper.c.
> To take care of this problem, I will first convert
> each function to its C builtin and then provide a utility in
> bisect--helper.c so that git-bisect.sh can easily communicate with it.
> In this way, I can make sure that git-bisect.sh uses a corresponding C
> builtin rather than the original shell function to run the tests,
> which will help in recognising any errors which the patch can have.
> To help git-bisect.sh communicate, I will make the functions sub
> commands in bisect--helper.c so I can ensure that bisect--helper can
> react in a way which is independent of other sub commands.

This is a bit verbose. Can't you just say that:

- you will add an option to "git bisect--helper" to perform what the
  git-bisect.sh function did, and
- you will create a test script for "git bisect--helper" in which you
  will test each option?

> Unification of bisect.c and bisect--helper.c
>
> This will unify the algorithmic and non-algorithmic parts of bisect
> bringing them under one heading to make the code clean.

I am not sure this is needed and a good idea. Maybe you will rename
"builtin/bisect--helper.c" to "builtin/bisect.c" and remove
git-bisect.sh at the same time to complete the shell to C move. But
the actual bisect.{c,h} might be useful as they are for other
purposes.

> Thorough testing of bisect
>
> This will be dedicated to thorough deep testing for any problems
> which crept in after the shell to C conversion. I will also aim to
> include some more tests to improve coverage of bisect. If time
> permits, I will dig further in examining the failing tests and try
> and fix the bugs.

Right now there are no failing tests and we expect that all the tests
that currently succeed will still succeed after the shell to C
conversion. We want step by step changes so that it is possible to
check at each step that no tests fail. If you improve test coverage
and see some bugs, then you are indeed welcome to fix those bugs if
you have time.
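
To make the "one option per converted shell function" idea above a bit
more concrete, here is a rough standalone sketch in plain C. It is not
Git's actual code and it does not use Git's internal API: the function
names, the list of reserved words, the argument order and the exit
codes are all assumptions made for illustration, and the ref-format
check that the shell version delegates to git-check-ref-format is left
out on purpose.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative (assumed) list of subcommand names not allowed as terms. */
static const char *const reserved_terms[] = {
	"help", "start", "skip", "next", "reset",
	"visualize", "replay", "log", "run",
};

/* Return 1 if `s` equals one of the `n` strings in `list`, else 0. */
static int one_of(const char *s, const char *const *list, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(s, list[i]))
			return 1;
	return 0;
}

/*
 * Hypothetical C counterpart of a shell check_term_format():
 * returns 0 if `term` may be used in the role `orig_term`
 * ("bad" or "good"), -1 otherwise.
 */
static int check_term_format(const char *term, const char *orig_term)
{
	assert(term && orig_term);

	if (!*term)
		return -1;
	if (one_of(term, reserved_terms,
		   sizeof(reserved_terms) / sizeof(reserved_terms[0])))
		return -1;
	/* "bad"/"new" may only name the bad side ... */
	if ((!strcmp(term, "bad") || !strcmp(term, "new")) &&
	    strcmp(orig_term, "bad"))
		return -1;
	/* ... and "good"/"old" may only name the good side. */
	if ((!strcmp(term, "good") || !strcmp(term, "old")) &&
	    !strcmp(orig_term, "bad"))
		return -1;
	return 0;
}

/*
 * Hypothetical dispatcher: each converted shell function becomes one
 * option, so the shell script (and a dedicated test script) can call
 * it in isolation. Returns the exit code the helper would use.
 */
static int helper_main(int argc, const char **argv)
{
	if (argc == 4 && !strcmp(argv[1], "--check-term-format"))
		return check_term_format(argv[2], argv[3]) ? 1 : 0;
	fprintf(stderr,
		"usage: helper --check-term-format <term> <orig_term>\n");
	return 2; /* any non-zero code would do for a usage error */
}
```

With something along these lines, git-bisect.sh could call the helper
option in place of its own shell function, and a separate test script
could exercise the option directly with valid and invalid terms, so
each conversion step can be checked on its own.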
> Optimizing
>
> I will mainly try to find ways in which I can make the code run better
> or in a nicer way.
>
> Documenting
>
> Since C is notorious for being a bit “cryptic”, the documentation can
> really help newcomers ease into diving into the codebase.
>
> Goals
>
> Incremental rewrite of `git-bisect.sh`
>
> Get all the code written merged.

Yeah sure, and also it is very important to get code merged early and
often. It ensures that you are on the right path at each step.

When you have sent one patch series, even a small one, then your main
goal should be to have this patch series merged. If you can also work
on the next patch series that's nice, but you should focus on getting
what you have already sent merged first.

> Some important points
>
> My semester starts from 18th July so I will start working early just
> to get reviews of the patches thoroughly and
> code/test/optimize/document properly just to avoid end time hustle and
> so that I can have some buffer time.

Nice.

> Also, it would save me from the
> extra burden once my academic session starts. If time permits then I
> will work on Stephan Beyer’s patches which are related to the bisect
> algorithm and seem quite interesting to me.
>
> Timeline (Tentative)
>
> 1st April - 22nd April
>
> Spend time going through the codebase and get familiar with using gdb
> with it. I will mostly concentrate on reading up bisect related areas
> as mentioned above.

Also please work on getting the current work you already started
merged.

> 2nd May - 8th May (1st week)
>
> Start discussion about the design of bisect and note down the points
> which some developers want to be changed. Also see the test suite as
> to what is the expected behavior of a particular bisect related
> command.

I don't think the conversion is about changing some behavior. If it
doesn't change any behavior at all, which means it does not introduce
any new bug, then that would be already very good.
> 9th May - 26th June (2nd-8th week)
>
> I count 23 functions (-1 probably) currently in git-bisect.sh. Each
> function should take on average 2 days including code/test/review
> and some functions are quite small. Also by this time I will be quite
> familiar with the codebase, review process and conventions so I won’t
> make many mistakes and the reviews can go faster. Still I have kept
> some buffer time just in case some bug comes up (which is quite a
> reasonable possibility).

Don't count too much on not making many mistakes, we all make mistakes
and need many review rounds even after working on Git for many years.

Reviews, responding to reviews, splitting commits, rebasing, testing
and debugging take time.

> 27th June - 3rd July (9th week)
>
> Testing - I will mostly test the parts of code that I have written.

That should be done before sending the patches to the list.

> Also I will explore more on the use of git bisect and try to improve
> its coverage.
>
> 4th July - 18th July (10th - 11th week)
>
> Bug fixing - I will try and fix up some leftover bugs which were
> either introduced because of the conversion or which were already
> present.

Fixing the bugs introduced by the conversion should be done as soon as
you know about them. So before sending the patches to the list, if the
bug is found by running the tests.

> 18th July - 24th July (12th week)
>
> Optimizing the C code I have written. Document the code enough so that
> it is easier for a newcomer to dive in.

Documentation changes should be sent along with the corresponding code
patches if possible.

> 25th July - 23rd August (left over time)
>
> I will keep this as a buffer period if anything goes wrong with this
> timeline. Also I wish to work on Stephan Beyer’s patches concerning
> bisect which currently contain 1 leftover bug.
>
> Micro-project attempted
>
> I attempted a micro project which will add configurations for commonly
> used command-line options (git commit -v).
> The link for the patch is
> here. This is in its last stages and is soon expected to be complete.
>
> Demo function conversion
>
> As suggested by Johannes Schindelein <Johannes.Schindelein@xxxxxx>, I
> have tried converting one function from shell to C, the function being
> check_term_format(), to get a taste of how to go about this
> project. Here is a link to the patch. This patch is in its final
> stage. This exercise was quite helpful!

Nice.

> Plans for Summer
>
> I currently don’t have any plans for this summer. Though my new
> semester starts from 18th July. So I will start early.

If you have exams planned from now until the end of August, it would
be nice if you could also talk about them.

Thanks,
Christian.