Hi, On Wed, Mar 25, 2015 at 2:37 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Paul Tan <pyokagan@xxxxxxxxx> writes: > >> ..., I propose the following requirements for the rewritten code: >> >> 1. No spawning of external git processes. This is to support systems with high >> ``fork()`` or process creation overhead, and to reduce redundant IO by >> taking advantage of the internal object, index and configuration cache. > > I suspect this may probably be too strict in practice. > > True, we should never say "run_command_capture()" just to to read > from "git rev-parse"---we should just call get_sha1() instead. > > But for a complex command whose execution itself far outweighs the > cost of forking, I do not think it is fair to say your project > failed if you chose to run_command() it. For example, it may be > perfectly OK to invoke "git merge" via run_command(). Yes, which is why I proposed writing a baseline using only the run-command APIs first. Any other optimizations can then be done selectively after that. I think it's still good to have the ideal in mind though (and whoops I forgot to put in the word "ideal" in the text). > >> 3. The resulting builtin should not have wildly different behavior or bugs >> compared to the shell script. > > This on the other hand is way too loose. > > The original and the port must behave identically, unless the > difference is fixing bugs in the original. > I was considering that there may be slight behavioral changes when the rewritten code is modified to take greater advantage of the internal API, especially since some builtins due to historical issues, may have duplicated code from the internal API[1]. [1] I'm suspecting that the implementation of --merge-base in show-branch.c re-implements get_octopus_merge_bases(). >> Potential difficulties >> ======================= >> >> Rewriting code may introduce bugs >> ... > > Yes, but that is a reasonable risk you need to manage to gain the > benefit from this project. > >> Of course, the downside of following this too strictly is that if there were >> any logical bugs in the original code, or if the original code is unclear, the >> rewritten code would inherit these problems too. > > I'd repeat my comment on the 3. above. Identifying and fixing bugs > is great, but otherwise don't worry about this too much. > > Being bug-to-bug compatible with the original is way better than > introducing new bugs of an unknown nature. Well yes, but I was thinking that if there are any glaring errors in the original source then it would be better to fix these errors during the rewrite than wasting time writing code that replicates these errors. >> Rewritten code may become harder to understand >> ... > > And also it may become harder to modify. > > That is the largest problem with any rewrite, and we should spend > the most effort to avoid it. > > A new bugs introduced we can later fix as long as the result is > understandable and maintainable. > >> For the purpose of reducing git's dependencies, the rewritten C code should not >> depend on other libraries or executables other than what is already available >> to git builtins. > > Perhaps misphrased; see below. In this case I was thinking of "making git depend on another project". (e.g, using an external regular expression library). Of course a balance has to be made in this aspect (thus the use of "should not"), but git-pull and git-am are relatively simple so there should be no need for that, > >> We can see that the C version requires much more lines compared to the shell >> pipeline,... > > That is something you would solve by introducing reusable code in > run_command API, isn't it? That is how various rewrites in the past > did, and this project should do so too. You should aim to do this > project by not just using "what is already available", but adding > what you discover is a useful reusable pattern into a set of new > functions in the "already available" API set. Whoops, forgot to mention that here. (A brief mention was made on this kind of refactoring in the Development Approach). Thank you for your review. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html