Hi Johannes, The patch works very well on Windows. Tested it on a repository with 146823 commits and for every commit a note. Groeten, Sebastiaan Dammann On Tue, 12 Nov 2019 at 20:18, Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote: > > Hi, > > [Cc:ing Sebastian, as they indicated in > https://public-inbox.org/git/CAE7Eq9hEiVf1rMNdWx55_nQsz2gVv0N%2Bs1KckK1evtmruqcHyA@xxxxxxxxxxxxxx/t/#u > that they would be interested in testing this] > > Sebastian, could you test this patch series? Since you are on Windows, > you should be able to do so by > > - installing Git for Windows' SDK > (https://gitforwindows.org/#download-sdk) > - `sdk cd git` (possibly `sdk init git`, although that should be > implied) > - `sdk init build-extra` followed by > `/usr/src/build-extra/apply-from-public-inbox.sh > https://public-inbox.org/git/20191112103821.30265-1-szeder.dev@xxxxxxxxx/` > - `sdk build` > > The result should include a `git.exe` in `/usr/src/git/` that you can > copy to your server and test via `/path/to/git.exe name-rev ...`. > > Ciao, > Johannes > > On Tue, 12 Nov 2019, SZEDER Gábor wrote: > > > 'git name-rev' is implemented using a recursive algorithm, and, > > consequently, it can segfault in deep histories (e.g. WebKit), and > > thanks to a test case demonstrating this limitation every test run > > results in a dmesg entry logging the segfaulting git process. > > > > This patch series eliminates the recursion. > > > > Patches 1-5 are while-at-it cleanups I noticed on the way, and patch 6 > > improves test coverage. Patches 7-11 are preparatory refactorings > > that are supposed to make this series easier to follow, and make patch > > 12, the one finally eliminating the recursion, somewhat shorter, and > > even much shorter when viewed with '--ignore-all-space'. Patch 13 > > cleans up after those preparatory steps. > > > > Changes since v1: > > > > - Patch 12 now eliminates the recursion using a LIFO 'prio_queue' > > instead of a 'commit_list' to avoid any performance penalty. > > > > - Commit message updates, clarifications, typofixes, missing > > signoffs, etc., most notably in patches 6 and 12. > > > > - Updated ASCII art history graphs. > > > > - Replaced the strbuf_suffix() cleanup in patch 3 with René's > > suggestion; now that patch needs his signoff. > > > > - Dropped the last two patches plugging memory leaks; René's plan > > to clean up memory ownership looked more promising, and that > > would make these two dropped patches moot anyway. > > > > v1: https://public-inbox.org/git/20190919214712.7348-1-szeder.dev@xxxxxxxxx/T/#u > > > > René Scharfe (1): > > name-rev: use strbuf_strip_suffix() in get_rev_name() > > > > SZEDER Gábor (12): > > t6120-describe: correct test repo history graph in comment > > t6120-describe: modernize the 'check_describe' helper > > name-rev: avoid unnecessary cast in name_ref() > > name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation > > t6120: add a test to cover inner conditions in 'git name-rev's > > name_rev() > > name-rev: extract creating/updating a 'struct name_rev' into a helper > > name-rev: pull out deref handling from the recursion > > name-rev: restructure parsing commits and applying date cutoff > > name-rev: restructure creating/updating 'struct rev_name' instances > > name-rev: drop name_rev()'s 'generation' and 'distance' parameters > > name-rev: eliminate recursion in name_rev() > > name-rev: cleanup name_ref() > > > > builtin/name-rev.c | 147 +++++++++++++++++++++++++++++--------------- > > t/t6120-describe.sh | 72 +++++++++++++++++----- > > 2 files changed, 153 insertions(+), 66 deletions(-) > > > > Range-diff: > > 1: 673da20e3d ! 1: 8d70ed050d t6120-describe: correct test repo history graph in comment > > @@ t/t6120-describe.sh > > -test_description='test describe > > +test_description='test describe' > > + > > -+# ,---o----o----o-----. > > -+# / D,R e \ > > -+# o--o-----o-------------o---o----x > > -+# \ B / > > -+# `---o----o----o-' > > -+# A c > > ++# o---o-----o----o----o-------o----x > > ++# \ D,R e / > > ++# \---o-------------o-' > > ++# \ B / > > ++# `-o----o----o-' > > ++# A c > > ++# > > ++# First parent of a merge commit is on the same line, second parent below. > > > > - B > > - .--------------o----o----o----x > > 2: 05df899693 = 2: 3720b6859d t6120-describe: modernize the 'check_describe' helper > > 3: 7b0227cfea ! 3: ad2f2eee68 name-rev: use strip_suffix() in get_rev_name() > > @@ > > ## Metadata ## > > -Author: SZEDER Gábor <szeder.dev@xxxxxxxxx> > > +Author: René Scharfe <l.s.r@xxxxxx> > > > > ## Commit message ## > > - name-rev: use strip_suffix() in get_rev_name() > > + name-rev: use strbuf_strip_suffix() in get_rev_name() > > > > - Use strip_suffix() instead of open-coding it, making the code more > > - idiomatic. > > + get_name_rev() basically open-codes strip_suffix() before adding a > > + string to a strbuf. > > > > + Let's use the strbuf right from the beginning, i.e. add the whole > > + string to the strbuf and then use strbuf_strip_suffix(), making the > > + code more idiomatic. > > + > > + [TODO: René's signoff!] > > Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> > > > > ## builtin/name-rev.c ## > > @@ builtin/name-rev.c: static const char *get_rev_name(const struct object *o, stru > > - int len = strlen(n->tip_name); > > - if (len > 2 && !strcmp(n->tip_name + len - 2, "^0")) > > - len -= 2; > > -+ size_t len; > > -+ strip_suffix(n->tip_name, "^0", &len); > > strbuf_reset(buf); > > - strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation); > > -+ strbuf_addf(buf, "%.*s~%d", (int) len, n->tip_name, > > -+ n->generation); > > ++ strbuf_addstr(buf, n->tip_name); > > ++ strbuf_strip_suffix(buf, "^0"); > > ++ strbuf_addf(buf, "~%d", n->generation); > > return buf->buf; > > } > > } > > 4: 40faecdc2a = 4: c86a2ae2d0 name-rev: avoid unnecessary cast in name_ref() > > 5: c71df3dadf = 5: 4fc960cc05 name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation > > 6: 1dcb76072f ! 6: 1493cb4484 t6120: add a test to cover inner conditions in 'git name-rev's name_rev() > > @@ Commit message > > looks like this: > > > > if (parent_number > 1) { > > - if (generation > 0) > > - // do stuff #1 > > - else > > - // do stuff #2 > > + if (generation > 0) > > + // branch #1 > > + new_name = ... > > + else > > + // branch #2 > > + new_name = ... > > + name_rev(parent, new_name, ...); > > } else { > > - // do stuff #3 > > + // branch #3 > > + name_rev(...); > > } > > > > These conditions are not covered properly in the test suite. As far > > @@ Commit message > > command's output, because the repository used in that test script > > contains several branches and tags pointing somewhere into the middle > > of the commit DAG, and thus result in a better name for the > > - to-be-named commit. In an early version of this patch series I > > - managed to mess up those conditions (every single one of them at > > - once!), but the whole test suite still passed successfully. > > + to-be-named commit. This can hide bugs: e.g. by replacing the > > + 'new_name' parameter of the first recursive name_rev() call with > > + 'tip_name' (effectively making both branch #1 and #2 a noop) 'git > > + name-rev --all' shows thousands of bogus names in the Git repository, > > + but the whole test suite still passes successfully. In an early > > + version of a later patch in this series I managed to mess up all three > > + branches (at once!), but the test suite still passed. > > > > So add a new test case that operates on the following history: > > > > - -----------master > > - / / > > - A----------M2 > > - \ / > > - \---M1-C > > - \ / > > - B > > + A--------------master > > + \ / > > + \----------M2 > > + \ / > > + \---M1-C > > + \ / > > + B > > > > - and names the commit 'B', where: > > + and names the commit 'B' to make sure that all three branches are > > + crucial to determine 'B's name: > > > > - - The merge commit at master makes sure that the 'do stuff #3' > > - affects the final name. > > + - There is only a single ref, so all names are based on 'master', > > + without any undesired interference from other refs. > > > > - - The merge commit M2 make sure that the 'do stuff #1' part > > - affects the final name. > > + - Each time name_rev() follows the second parent of a merge commit, > > + it appends "^2" to the name. Following 'master's second parent > > + right at the start ensures that all commits on the ancestry path > > + from 'master' to 'B' have a different base name from the original > > + 'tip_name' of the very first name_rev() invocation. Currently, > > + while name_rev() is recursive, it doesn't matter, but it will be > > + necessary to properly cover all three branches after the recursion > > + is eliminated later in this series. > > > > - - And M1 makes sure that the 'do stuff #2' part affects the final > > - name. > > + - Following 'M2's second parent makes sure that branch #2 (i.e. when > > + 'generation = 0') affects 'B's name. > > + > > + - Following the only parent of the non-merge commit 'C' ensures that > > + branch #3 affects 'B's name, and that it increments 'generation'. > > + > > + - Coming from 'C' 'generation' is 1, thus following 'M1's second > > + parent makes sure that branch #1 affects 'B's name. > > > > Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> > > > > ## t/t6120-describe.sh ## > > -@@ t/t6120-describe.sh: test_expect_success 'describe complains about missing object' ' > > - test_must_fail git describe $ZERO_OID > > +@@ t/t6120-describe.sh: test_expect_success 'name-rev a rev shortly after epoch' ' > > + test_cmp expect actual > > ' > > > > -+# -----------master > > -+# / / > > -+# A----------M2 > > -+# \ / > > -+# \---M1-C > > -+# \ / > > -+# B > > -+test_expect_success 'test' ' > > ++# A--------------master > > ++# \ / > > ++# \----------M2 > > ++# \ / > > ++# \---M1-C > > ++# \ / > > ++# B > > ++test_expect_success 'name-rev covers all conditions while looking at parents' ' > > + git init repo && > > + ( > > + cd repo && > > @@ t/t6120-describe.sh: test_expect_success 'describe complains about missing objec > > + git checkout master && > > + git merge --no-ff HEAD@{1} && > > + > > -+ git log --graph --oneline && > > -+ > > + echo "$B master^2^2~1^2" >expect && > > + git name-rev $B >actual && > > + > > 7: bdd8378b06 = 7: fc842e578b name-rev: extract creating/updating a 'struct name_rev' into a helper > > 8: ce21c351f9 ! 8: 7f182503e2 name-rev: pull out deref handling from the recursion > > @@ Commit message > > Extract this condition from the recursion into name_rev()'s caller and > > drop the function's 'deref' parameter. This makes eliminating the > > recursion a bit easier to follow, and it will be moved back into > > - name_rev() after the recursion is elminated. > > + name_rev() after the recursion is eliminated. > > > > Furthermore, drop the condition that die()s when both 'deref' and > > 'generation' are non-null (which should have been a BUG() to begin > > @@ Commit message > > > > Note that this change reintroduces the memory leak that was plugged in > > in commit 5308224633 (name-rev: avoid leaking memory in the `deref` > > - case, 2017-05-04), but a later patch in this series will plug it in > > - again. > > + case, 2017-05-04), but a later patch (name-rev: restructure > > + creating/updating 'struct rev_name' instances) in this series will > > + plug it in again. > > > > Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> > > > > 9: c8acc6b597 ! 9: 0cdd40b75b name-rev: restructure parsing commits and applying date cutoff > > @@ Commit message > > name_rev() and name_rev() itself as it iterates over the parent > > commits. > > > > - This makes eliminating the recursion a bit easier to follow, and it > > - will be moved back to name_rev() after the recursion is eliminated. > > + This makes eliminating the recursion a bit easier to follow, and the > > + condition moved to name_ref() will be moved back to name_rev() after > > + the recursion is eliminated. > > > > Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> > > > > 10: c731f27158 ! 10: e1733e3c56 name-rev: restructure creating/updating 'struct rev_name' instances > > @@ Commit message > > At the beginning of the recursive name_rev() function it creates a new > > 'struct rev_name' instance for each previously unvisited commit or, if > > this visit results in better name for an already visited commit, then > > - updates the 'struct rev_name' instance attached to to the commit, or > > + updates the 'struct rev_name' instance attached to the commit, or > > returns early. > > > > Restructure this so it's caller creates or updates the 'struct > > @@ Commit message > > parameter, i.e. both name_ref() before calling name_rev() and > > name_rev() itself as it iterates over the parent commits. > > > > - This makes eliminating the recursion a bit easier to follow, and it > > - will be moved back to name_rev() after the recursion is eliminated. > > + This makes eliminating the recursion a bit easier to follow, and the > > + condition moved to name_ref() will be moved back to name_rev() after > > + the recursion is eliminated. > > > > This change also plugs the memory leak that was temporarily unplugged > > in the earlier "name-rev: pull out deref handling from the recursion" > > 11: ba14bde230 ! 11: bd6e2e6d87 name-rev: drop name_rev()'s 'generation' and 'distance' parameters > > @@ Commit message > > 'taggerdate' and 'from_tag' parameters as well, but those parameters > > will be necessary later, after the recursion is eliminated. > > > > - Drop name_rev()'s 'generation' and 'distance' parameters. > > + Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> > > > > ## builtin/name-rev.c ## > > @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit, > > 12: 2d03ac11f3 ! 12: 0cf63c6d64 name-rev: eliminate recursion in name_rev() > > @@ Commit message > > segfault when processing a deep history if it exhausts the available > > stack space. E.g. running 'git name-rev --all' and 'git name-rev > > HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories > > - results in segfaults on my machine. > > + results in segfaults on my machine ('ulimit -s' reports 8192kB of > > + stack size limit), and nowadays the former segfaults in the Linux repo > > + as well (it reached the necessasry depth sometime between v5.3-rc4 and > > + -rc5). > > > > Eliminate the recursion by inserting the interesting parents into a > > - 'commit_list' and iteratating until the list becomes empty. > > + LIFO 'prio_queue' [1] and iterating until the queue becomes empty. > > > > - Note that the order in which the parent commits are added to that list > > - is important: they must be inserted at the beginning of the list, and > > - their relative order must be kept as well, because otherwise > > - performance suffers. > > + Note that the parent commits must be added in reverse order to the > > + LIFO 'prio_queue', so their relative order is preserved during > > + processing, i.e. the first parent should come out first from the > > + queue, because otherwise performance greatly suffers on mergy > > + histories [2]. > > > > The stacksize-limited test 'name-rev works in a deep repo' in > > 't6120-describe.sh' demonstrated this issue and expected failure. Now > > - the recursion is gone, so flip it to expect success. > > - > > - Also gone are the dmesg entries logging the segfault of the git > > - process on every execution of the test suite. > > - > > - Unfortunately, eliminating the recursion comes with a performance > > - penaly: 'git name-rev --all' tends to be between 15-20% slower than > > - before. > > + the recursion is gone, so flip it to expect success. Also gone are > > + the dmesg entries logging the segfault of that segfaulting 'git > > + name-rev' process on every execution of the test suite. > > > > Note that this slightly changes the order of lines in the output of > > 'git name-rev --all', usually swapping two lines every 35 lines in > > @@ Commit message > > > > This patch is best viewed with '--ignore-all-space'. > > > > + [1] Early versions of this patch used a 'commit_list', resulting in > > + ~15% performance penalty for 'git name-rev --all' in 'linux.git', > > + presumably because of the memory allocation and release for each > > + insertion and removal. Using a LIFO 'prio_queue' has basically no > > + effect on performance. > > + > > + [2] We prefer shorter names, i.e. 'v0.1~234' is preferred over > > + 'v0.1^2~5', meaning that usually following the first parent of a > > + merge results in the best name for its ancestors. So when later > > + we follow the remaining parent(s) of a merge, and reach an already > > + named commit, then we usually find that we can't give that commit > > + a better name, and thus we don't have to visit any of its > > + ancestors again. > > + > > + OTOH, if we were to follow the Nth parent of the merge first, then > > + the name of all its ancestors would include a corresponding '^N'. > > + Those are not the best names for those commits, so when later we > > + reach an already named commit following the first parent of that > > + merge, then we would have to update the name of that commit and > > + the names of all of its ancestors as well. Consequently, we would > > + have to visit many commits several times, resulting in a > > + significant slowdown. > > + > > Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> > > > > ## builtin/name-rev.c ## > > +@@ > > + #include "tag.h" > > + #include "refs.h" > > + #include "parse-options.h" > > ++#include "prio-queue.h" > > + #include "sha1-lookup.h" > > + #include "commit-slab.h" > > + > > @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit, > > return NULL; > > } > > @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit > > - parse_commit(parent); > > - if (parent->date < cutoff) > > - continue; > > -+ struct commit_list *list = NULL; > > ++ struct prio_queue queue; > > ++ struct commit *commit; > > ++ struct commit **parents_to_queue = NULL; > > ++ size_t parents_to_queue_nr, parents_to_queue_alloc = 0; > > + > > -+ commit_list_insert(start_commit, &list); > > ++ memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */ > > ++ prio_queue_put(&queue, start_commit); > > + > > -+ while (list) { > > -+ struct commit *commit = pop_commit(&list); > > ++ while ((commit = prio_queue_get(&queue))) { > > + struct rev_name *name = get_commit_rev_name(commit); > > -+ struct commit_list *parents, *new_parents = NULL; > > -+ struct commit_list **last_new_parent = &new_parents; > > ++ struct commit_list *parents; > > + int parent_number = 1; > > + > > ++ parents_to_queue_nr = 0; > > ++ > > + for (parents = commit->parents; > > + parents; > > + parents = parents->next, parent_number++) { > > @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit > > - distance = name->distance + 1; > > + if (create_or_update_name(parent, new_name, taggerdate, > > + generation, distance, > > -+ from_tag)) > > -+ last_new_parent = commit_list_append(parent, > > -+ last_new_parent); > > ++ from_tag)) { > > ++ ALLOC_GROW(parents_to_queue, > > ++ parents_to_queue_nr + 1, > > ++ parents_to_queue_alloc); > > ++ parents_to_queue[parents_to_queue_nr] = parent; > > ++ parents_to_queue_nr++; > > ++ } > > } > > > > - if (create_or_update_name(parent, new_name, taggerdate, > > - generation, distance, > > - from_tag)) > > - name_rev(parent, new_name, taggerdate, from_tag); > > -+ *last_new_parent = list; > > -+ list = new_parents; > > ++ /* The first parent must come out first from the prio_queue */ > > ++ while (parents_to_queue_nr) > > ++ prio_queue_put(&queue, > > ++ parents_to_queue[--parents_to_queue_nr]); > > } > > ++ > > ++ clear_prio_queue(&queue); > > ++ free(parents_to_queue); > > } > > > > + static int subpath_matches(const char *path, const char *filter) > > > > ## t/t6120-describe.sh ## > > @@ t/t6120-describe.sh: test_expect_success 'describe tag object' ' > > 13: 1ef69550ca ! 13: 316f7af43c name-rev: cleanup name_ref() > > @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit > > - int from_tag) > > + int from_tag, int deref) > > { > > - struct commit_list *list = NULL; > > + struct prio_queue queue; > > + struct commit *commit; > > + struct commit **parents_to_queue = NULL; > > + size_t parents_to_queue_nr, parents_to_queue_alloc = 0; > > + char *to_free = NULL; > > + > > + parse_commit(start_commit); > > @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit > > + return; > > + } > > > > - commit_list_insert(start_commit, &list); > > - > > + memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */ > > + prio_queue_put(&queue, start_commit); > > @@ builtin/name-rev.c: static int name_ref(const char *path, const struct object_id *oid, int flags, vo > > if (taggerdate == TIME_MAX) > > taggerdate = commit->date; > > 14: 9d513b3092 < -: ---------- name-rev: plug a memory leak in name_rev() > > 15: 8489abb62e < -: ---------- name-rev: plug a memory leak in name_rev() in the deref case > > -- > > 2.24.0.388.gde53c094ea > > > >