Well, Google requires me to have a draft on a Google Doc anyway for the proposal, and I am unsure who exactly it will reach. Since it *is* part of the discussion regarding my proposal, I suppose it is worth posting here for anyone to comment: https://docs.google.com/document/d/1dvF2PNRQvvZ351jCdKzOLs7tzaDqhR7ci7TDgzYQg9I/edit?usp=sharing. -- Daniel. On Fri, Mar 31, 2017 at 2:07 AM, Daniel Ferreira (theiostream) <bnmvco@xxxxxxxxx> wrote: > Hi Stefan & Johannes, > > Thank you for the precious feedback on the proposal. I don't see much > sense in sending a full "v2" of it and have you read it all over > again, so I'll just answer to your comments directly. > > Also, although the GSoC website allows me to send a "proposal draft" > to you through the website, since I've already sent it here that > shouldn't be necessary, correct? I intend to use it just to send the > final thing. > > On Wed, Mar 29, 2017 at 9:01 PM, Johannes Schindelin > <Johannes.Schindelin@xxxxxx> wrote: >> On Tue, 28 Mar 2017, Stefan Beller wrote: >> >>> On Sat, Mar 25, 2017 at 8:15 PM, Daniel Ferreira (theiostream) >>> <bnmvco@xxxxxxxxx> wrote: >>> >>> > SYNOPSIS >>> > There are many advantages to converting parts of git that are still >>> > scripts to C builtins, among which execution speed, improved >>> > compatibility and code deduplication. >>> >>> agreed. >> >> I would even add portability. But yeah, speed is a big thing. I am an >> extensive user of `git add -p` (which is backed by >> git-add--interactive.perl) and it is slow as molasses on Windows, just >> because it is a Perl script (and the Perl interpreter needs to emulate >> POSIX functionality that is frequently not even needed, such as: copying >> all memory and reopening all file descriptors in a fork() call only to >> exec() git.exe right away, tossing all of the diligently work into the >> dustbin). > > Thanks for this example – it hadn't come to my mind since I don't use > Git on Windows. I'll be sure to complement the synopsis with it. :) > >> >>> > FEASIBILITY >>> > >>> > There was only one discussion regarding the feasibility of its porting >>> > (https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1HLx+Q@xxxxxxxxxxxxxx/). >>> > It resulted in a consensus that doing it would be a task too large – >>> > although interesting – for GSoC 2015 based on the amount of its lines >>> > of code. It is, however, only a few lines larger than >>> > git-rebase--interactive, which has been considered an appropriate >>> > idea. As such, it looks like a possible project for three months of >>> > full-time work. >>> >>> ok, it sounds a challenging project. (currently counting 1750 lines of >>> code). Scrolling over the source code, there are quite a couple of >>> functions, where the direct equivalent in C springs to mind. >>> >>> run_cmd_pipe -> see run-command.h >>> unquote_path -> unquote_c_style ? >>> refresh -> update_index_if_able() >>> list_modified -> iterate over "const struct cache_entry *ce = active_cache[i];" > > Thank you for these functions. I don't think I will be able to specify > them in detail as part of the projected timeline (e.g. "June 1: > convert calls to refresh() to use update_index_if_able()") already > because there is not enough time prior to the proposal deadline to > study their behavior in detail, and I like to avoid talking about > things I don't fully understand. Although I think I can cite them as > examples for a thesis I had put elsewhere in the proposal that "Git > APIs in Perl already have functional equivalents in C". > > Also, they will be great for my early investigation stage into > git-add--interactive. :) Once more, thanks for having listed them. > >> Yes, I think it would be more important to acquaint oneself with the >> idiosynchracies of Git's internal "API" than to get familiar with Perl: >> interpreting what obscure Perl code does is something I would gladly do as >> a mentor. > > That's really nice! I usually don't get stuck when trying to > understand code in languages I'm not too well acquainted with, but I > figured getting more familiar with Perl would speed development up. > But it does make sense that this "prior to May 4" might be better > invested learning about git's internals than Perl. > > Question: do you suggest any pending bugfix to git-add--interactive or > to something related that might give some useful knowledge in advance? > (for the pre-code period). My microproject involves playing with the > dir_iterator interface, which is a great exercise in code refactoring > but really does not teach me too much about Git's architecture. > > Even if you do not have an answer to this, I'm pretty sure I'll keep > this commitment to submitting some patch series somehow related to > git-add before GSoC begins, especially after this comment from > Johannes. > >> >>> > PROJECTED TIMELINE >>> > - Prior to May 4 >>> > -- Refine my basic knowledge of Perl >>> > -- Craft one or two small patches to some of Git's Perl components >>> > (preferentially to git-add--interactive itself) to improve my >>> > understanding of the language and of how Git's Perl scripts actually >>> > work > > So yeah, I think this could be rewritten as: > > - Prior to May 4 > -- Craft two or three small patch series to git-add--interactive or > related components to improve my understanding of Git's internal > architecture, especially that related to git-add. > >> >>> > - May 4 - May 30 >>> > -- Clarify implementation details with my mentor, and work on a more >>> > detailed roadmap for the project >>> > -- Investigate roughly how to replace command invocations from the >>> > script with actual builtin functions; which Git APIs in Perl already >>> > have functional equivalents in C; which parts will require a full >>> > rewrite. >>> There are different approaches for replacing functionality in another >>> language. Examples: >>> * Implement the functionality in C and then have a "flag-day" commit >>> 783d7e865e (builtin-am: remove redirection to git-am.sh, 2015-08-04) >>> This only works when the whole functionality was replaced in prior commits >>> * Implement partial functionality in C and call it via a helper function. >>> 3604242f08 (submodule: port init from shell to C, 2016-04-15) >>> This works well for only partial conversions (the larger the thing to >>> convert the more appealing this is, as it gets code shipped early.) >>> When choosing this strategy, this part of the Project would be to >>> identify parts that could be ported on its own without much >>> additional glue-code. >> >> To offer my perspective: I strongly prefer the latter approach. Not only >> does it yield earlier results, it also makes it substantially easier to >> handle the project even if it should turn out to be a little larger than >> just 3 months. > > I agree. I even think that the latter parts of this projected timeline > imply a choice for the second approach. > > For now, Stefan's implementation in 3604242f08 (create a helper > builtin command, call it from a script, do it to other functions until > the whole script is replaced) seems the most natural way to do it. But > sadly, I still do not have enough knowledge about the project to be > able to specify in this proposal how (e.g. in what order) exactly I > intend to "divide" the code to do this. But I'm sure the answer to > this question will be clearer through contact with my mentor and with > growing experience in the project. > >> >>> > - May 30 - June 30 (start of coding period) >>> > -- Define the architecture of the builtin within git (which >>> > functions/interfaces will it have? where will its code reside?). >>> > -- Implement a small subset of the builtin (to be defined with my >>> > mentor) and glue it into the existing Perl script. Present this as a >>> > first patch to get feedback early regarding the implementation and >>> > avoid piling up mistakes early. >>> > -- Do necessary changes based on this initial review. >>> > -- Have roughly 1/3 of the script's functionality ported to C. >>> > >>> > - June 30 - July 28 >>> > -- Port the remainder of the script to a builtin. >>> > -- Have a weekly roadmap, sending a part of the patch every 15 days to >>> > the mailing list for review and to avoid massive commits by the end of >>> > GSoC. >>> >>> yeah; send early, send often. ;) >> >> Even better: push multiple times a day to a public repository, say, on >> GitHub. That allows for "Work In Progress" commits that not only serve as >> a backup but also as transparent report what was done. > > Great idea. It can also be a good way to report my progress to my > mentor. (I'll do my best to avoid "[WIP]" commits with barely any > explanation). > > As for the "timeline", I think it can be improved by some better > specification regarding the gradual process of the implementation > opposed to switching from the script to a builtin all at once. > >>> > -- Apply suggestions from community reviews when possible; if not, >>> > save them for doing toward the end of GSoC (see below). >>> >>> Please do not underestimate the discussion by community, finding >>> consensus on list consumes a bit of time in some cases. >> I agree with this statement. Ideally, the first patch series would be >> ready to submit very soon into the project, something like 2 weeks (which >> is another point in favor of the helper approach outlined by Stefan). >> The smaller the patch series are, the more likely they will get in >> quickly. And the easier is it to address reviewers' comments, because you >> won't have to send out gazillion unchanged patches of a multi-dozen patch >> series just to address one typo in one patch. > > Seems reasonable. I'll amend the proposal to commit to a first patch > series in 2 weeks and to sending a new one every week or less (to > avoid series getting massive). I'll also refrain from postponing > responding to feedback and, when necessary, work in parallel between > amending patches and working on new code. After all, I know how my > schedule will be during the whole GSoC period, but I don't know how > available those in the list will be to review it, so it's probably > best to respond early. > >>> > (Note: due to a previous commitment, during a five-day period of July >>> > I will only be able to work part-time on GSoC. The actual week will be >>> > known over the next weeks.) >>> >>> Maybe you want to shift the schedule up to here by one week then? >>> (e.g. the first period would be April 27 - May 23) >>> > > I cannot shift the schedule one week earlier during the pre-code > period because Google only releases mentor information on May 4, and > the things I'm planning for the second period depend on the mentor. > However, my classes only start in September 22, so I would have > absolutely no problem continuing to work full-time on Git after the > GSoC period is officially over to compensate for these five days. > >>> > >>> > - July 28 - August 29 >>> > -- By the start of this period, send a patch with the builtin fully >>> > implemented to the mailing list. >>> >>> /a patch/a patch series consisting of many patches/ >>> Experience shows that smaller patches are easier to review as >>> it is more focused. > > I will reword it. > >>> Consider e.g. e86ab2c1cd (wt-status: convert to struct object_id, >>> 2017-02-21) and the parents leading up to this commit. They work on the >>> same big topic, but focus on very regional areas to ease review. >>> >>> > -- Fix bugs, test extensively, possibly extend test coverage for >>> > git-add--interactive. >>> >>> AFAICT ('$ git grep "git add -i"') there is only t3701 testing the >>> interactive add. Maybe we need to add tests first to document >>> current behavior, before attempting a conversion? >> >> That should indeed be the first step, before doing much else. >> >> However, it does look as if t3701's 40 test cases provide fairly thorough >> coverage. >> >>> This could go well into the period "May 4 - May 30", as writing >>> tests would count as "Clarify implementation details". >> >> I agree with that, too. > > Seems reasonable. As I said above, working on patches (if possible) > related to git-add--interactive will probably lead me to discovering > coverage issues, which could then be dealt with in this pre-code > period already with some guidance from my mentor. > >> Thanks for working on this! > > Thank *you*, and Stefan, for the careful review of it. It means a lot. > > -- Daniel.