[Bug] In `git-rev-list(1)`, using the `--objects` flag doesn't work well with the `--not` flag, as non-commit objects are not excluded

Hello!

What did you do before the bug happened? (Steps to reproduce your issue)

Assume a repository structure as follows:

- commit1 9f2aa2eb987c2281bb4901dbccd1398ad2c39722
  - tree 205f6b799e7d5c2524468ca006a0131aa57ecce7
    - 100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99    foo
      - content: foo
- commit2 9e02481f4df3a8997335b0a68882580e3b9b588f (parent: 9f2aa2eb987c2281bb4901dbccd1398ad2c39722)
  - tree 672d0aa883d095369c56416587bc397eee4ac37e
    - 100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99    foo
      - content: foo
    - 100644 blob eec8c88a93f6ee1515fb8348f2c122cfda4302cd    moo
      - content: moo
- commit3 91fa9611a355db77a07f963c746d57f75af380da (parent: 9e02481f4df3a8997335b0a68882580e3b9b588f)
  - tree 0c16a6cc9eef3fdd3034c1ffe2fc5e6d0bba2192
    - tree 086885f71429e3599c8c903b0e9ed491f6522879    bar
      - 100644 blob 7a67abed5f99fdd3ee203dd137b9818d88b1bafd    goo
        - content: goo
    - 100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99    foo
      - content: foo
    - 100644 blob eec8c88a93f6ee1515fb8348f2c122cfda4302cd    moo
      - content: moo
    - 100644 blob 8baef1b4abc478178b004d62031cf7fe6db6f903    abc
      - content: abc
- commit4 6b52ed5b176604a0740689b5bb9be7bd79f4bced (parent: 9f2aa2eb987c2281bb4901dbccd1398ad2c39722)
  - tree ff05824d2f76436c61d2c971e11a27514aba6948
    - tree 086885f71429e3599c8c903b0e9ed491f6522879    bar
      - 100644 blob 7a67abed5f99fdd3ee203dd137b9818d88b1bafd    goo
        - content: goo
    - 100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99    foo
      - content: foo
    - 100644 blob 8baef1b4abc478178b004d62031cf7fe6db6f903    abc
      - content: abc

What did you expect to happen? (Expected behavior)

In such a repository, the command below should produce the following
output:

❯ git rev-list --objects 6b52ed5b176604a0740689b5bb9be7bd79f4bced --not 91fa9611a355db77a07f963c746d57f75af380da
6b52ed5b176604a0740689b5bb9be7bd79f4bced
ff05824d2f76436c61d2c971e11a27514aba6948

What happened instead? (Actual behavior)

Instead, the output is as follows:

❯ git rev-list --objects 6b52ed5b176604a0740689b5bb9be7bd79f4bced --not 91fa9611a355db77a07f963c746d57f75af380da
6b52ed5b176604a0740689b5bb9be7bd79f4bced
ff05824d2f76436c61d2c971e11a27514aba6948
8baef1b4abc478178b004d62031cf7fe6db6f903 abc
086885f71429e3599c8c903b0e9ed491f6522879 bar
7a67abed5f99fdd3ee203dd137b9818d88b1bafd bar/goo

What's different between what you expected and what actually happened?

Notice that the objects `8baef1b4abc478178b004d62031cf7fe6db6f903`
(`abc`), `086885f71429e3599c8c903b0e9ed491f6522879` (`bar`) and
`7a67abed5f99fdd3ee203dd137b9818d88b1bafd` (`bar/goo`) are included in
the output. All three are reachable from
`91fa9611a355db77a07f963c746d57f75af380da`, which was passed after the
`--not` flag, so they should have been excluded.

Anything else you want to add:

I did a preliminary walkthrough of the code to understand why this
happens, and my understanding is as follows:
1. In rev-list.c: we first set up the revisions provided via the
`setup_revisions()` function. Here, any revision provided under the
`--not` flag is marked as `UNINTERESTING`.
2. In rev-list.c: we then call `prepare_revision_walk()`. This
function internally goes through the commits and calls
`handle_commit()` on each of them; in our case, these are
6b52ed5b176604a0740689b5bb9be7bd79f4bced and
91fa9611a355db77a07f963c746d57f75af380da.
3. In revision.c: In `handle_commit()` we set `revs->limited = 1`
since one of our commits is marked as `UNINTERESTING`.
4. In revision.c: Back in `prepare_revision_walk()`, since
`revs->limited` is set, we call `limit_list()`.
5. In revision.c: I'm not sure what the exact purpose of
`limit_list()` is, but it seems to optimize the revision walk by
reducing the traversal needed later on. In our case, it marks all
parents of the `UNINTERESTING` commit as uninteresting too and removes
that commit from the revision list entirely.
6. In rev-list.c: Finally, when we call
`traverse_commit_list_filtered()` for the traversal, each commit and
its objects are shown recursively unless they are marked
`UNINTERESTING`. Since only the commits were marked `UNINTERESTING`,
any shared trees/blobs are still printed to the output.

The diff below fixes the issue:

@@ -3790,11 +3833,12 @@ int prepare_revision_walk(struct rev_info *revs)
         commit_list_sort_by_date(&revs->commits);
     if (revs->no_walk)
         return 0;
-    if (revs->limited) {
+    if (revs->limited && !revs->tree_objects) {
         if (limit_list(revs) < 0)
             return -1;
         if (revs->topo_order)

But this is definitely a very _naive_ fix. Before diving into a proper
fix, I'd like to hear some thoughts on this.

[System Info]
git version:
git version 2.41.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 6.4.9-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Aug  8 21:21:11 UTC 2023 x86_64
compiler info: gnuc: 13.1
libc info: glibc: 2.37
$SHELL (typically, interactive shell): /bin/fish


[Enabled Hooks]
not run from a git repository - no hooks to show

- Karthik