Hi Sangeeta
On 09/10/2020 08:41, Sangeeta NB wrote:
Thanks for the explanation, Phillip. I think there's a long road ahead
to understand how everything is implemented and put together.
Coming to the microproject, it was said that there is an inconsistency
in --dirty behavior shown by `git diff` and `git describe --dirty` for
submodule state when the files are untracked.
I struggled to find the microprojects page - I must have missed the link
on the Outreachy site. In case anyone else is struggling to find it, here
is the project:
Unify the meaning of -dirty between diff and describe
git diff reports a submodule directory, whose tracked
contents match the commit at the HEAD in the submodule, as
-dirty when there is an untracked file in the submodule
directory. This is inconsistent with what git describe
--dirty says when run in the submodule directory in that
state. [1]
[1] https://lore.kernel.org/git/xmqqo8m1k542.fsf@xxxxxxxxxxxxxxxxxxxxxx/
From what I understood by looking at the code, the diff files state
that we should ignore untracked submodule states. So is it that I have
to make changes in the way git describe is implemented by ignoring the
changes in the untracked submodule?
As I understand it if a submodule contains any untracked files (i.e. a
file that has not been added with `git add` and is not ignored by any
.gitignore or .git/info/exclude entries) then running `git diff` in the
superproject will report that the submodule is dirty - there will be a
line something like "+Subproject commit abcdef-dirty". However if we run
`git describe --dirty` in the submodule directory then it will not
append "-dirty" to its output unless there are changes to tracked files.
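The mismatch described above can be reproduced in a throwaway directory with something like the following. This is only a sketch: the directory and file names are made up, the nested repository is added as a gitlink in the same way as the setup in t4027, and `--ignore-submodules=none` is passed explicitly because the default treatment of untracked files in submodules may differ between Git versions.

```shell
#!/bin/sh
# Sketch: compare `git diff` in the superproject with `git describe --dirty`
# in the submodule when the submodule only has an untracked file.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway identity so that commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Superproject "super" with a nested repository "sub" (names are arbitrary).
git init -q super
cd super
git init -q sub
echo hello >sub/world
git -C sub add world
git -C sub commit -q -m submodule
git add sub 2>/dev/null		# hides the "embedded repository" hint
git commit -q -m superproject

# Leave an untracked file in the submodule; tracked contents still match HEAD.
echo junk >sub/untracked

# Ask diff to consider untracked files in submodules explicitly.
diff_out=$(git diff --ignore-submodules=none HEAD)
# --always makes describe print an abbreviated hash as there are no tags.
describe_out=$(git -C sub describe --dirty --always)

printf '%s\n' "$diff_out"
printf 'describe: %s\n' "$describe_out"
```

The diff output should end with a "+Subproject commit ...-dirty" line, while the describe output should be a bare abbreviated hash.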
Also, I wasn't able to reproduce this inconsistency on my local
machine. Any pointers on how to reproduce this might be helpful.
I'd start by trying to build git and running t4027-diff-submodule.sh.
If you look at the start of the test 'git diff HEAD with dirty submodule
(untracked)' in t/t4027-diff-submodule.sh it sets up a submodule with an
untracked file. If you add "test_pause &&" after the diff command in
that test it will start a shell in the test directory and you can run
`git diff HEAD` yourself to see the output and also `git -C sub diff
HEAD` which will run diff in the submodule directory. The latter command
should show that there are no changes in the tracked files of the
submodule. Just exit the shell to get the test to continue. (You can see
in builtin/describe.c that when it is run with `--dirty` it runs `git
diff-index HEAD` to determine if a repository is dirty). To change the
output of diff I would look for the string "Subproject commit" in diff.c
to find the code that adds '-dirty' and try working backwards from
there. Let me know if you get stuck - it took me a while to work
backwards to find where we check if the submodule is dirty.
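To see why describe does not treat untracked files as dirt, it can help to run diff-index by hand. The following is a rough sketch and not a copy of what builtin/describe.c does: the `--quiet` flag, the trailing `--` and the `is_clean` helper are choices made for this example, and the file names are arbitrary.

```shell
#!/bin/sh
# Sketch: `git diff-index HEAD` only looks at tracked paths, so an untracked
# file alone does not make the repository look dirty.
set -e
repo=$(mktemp -d)
cd "$repo"

# Throwaway identity so that commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q .
echo hello >world
git add world
git commit -q -m initial

# Hypothetical helper: succeeds when diff-index sees no change against HEAD.
is_clean () {
	git diff-index --quiet HEAD --
}

# Case 1: only an untracked file is present.
echo junk >untracked
if is_clean; then untracked_state=clean; else untracked_state=dirty; fi

# Case 2: a tracked file has been edited as well.
echo more >>world
if is_clean; then modified_state=clean; else modified_state=dirty; fi

echo "untracked file only: $untracked_state"
echo "tracked file edited: $modified_state"
```

The first case should come out as clean and the second as dirty, which matches the describe behaviour discussed above.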
Best Wishes
Phillip
Thanks and regards,
Sangeeta
On Thu, Oct 8, 2020 at 2:37 PM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:
Hi Sangeeta
On 07/10/2020 21:10, Sangeeta NB wrote:
Hello everyone,
Welcome to the list
My name is Sangeeta and I’m one of the Outreachy applicants. I would
like to work on the microproject "Unify the meaning of dirty between
diff and describe".
While looking at the files for the `describe` and `diff` commands I found
that `describe.c` is present in the builtin[1] folder whereas diff.c
is found in the root[2] folder as well as the builtin[3] folder. I could
not find any implementation of --dirty in the diff.c present in the
builtin[3] folder. So is it that I have to compare the implementation
of describe.c[1] and diff.c (of the root folder)?
Also, I was curious to know why there is a builtin folder when many of
the commands defined there are defined again in the root folder?
The files in the root directory are (mostly) library code that ends up
in libgit.a. The builtin directory contains the individual git commands
that form the git binary that is linked with libgit.a. builtin/diff.c
contains cmd_diff() which will be called when the user runs `git diff`.
That function parses the command line options and sets up the necessary
data to pass to the diff implementation in /diff.c. The diff and log
family of commands are a bit different to most of the other commands in
that the option parsing is mostly done by calling setup_revisions() in
/revision.c rather than using the option parsing library routines in
/parse-options.c directly. I think the "dirty" handling for diff (the
argument to `--ignore-submodules`) ends up being handled by
handle_ignore_submodules_arg() in submodule.c. I'll leave it to you to
see where that is called from (you can use `git grep`).
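For the `git grep` step, a small wrapper like the one below is one way to go about it. `find_symbol` is a made-up helper name and the checkout path in the usage comment is an assumption about where git.git has been cloned.

```shell
# Hypothetical helper: list every tracked line that mentions a symbol.
#   $1 = path to a Git checkout, $2 = symbol (or any fixed string) to look for
find_symbol () {
	# -n prints line numbers, -F treats the pattern as a fixed string,
	# -e marks what follows as the pattern.
	git -C "$1" grep -n -F -e "$2"
}

# Usage (the path is only an example):
#   find_symbol ~/src/git handle_ignore_submodules_arg
```

Each output line has the form path:line:text, so the callers and the definition can be told apart by looking at the file names.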
I'm going to be offline for the rest of today, hopefully someone else
will be able to help if you get stuck or I'll try and answer any other
questions tomorrow.
Best Wishes
Phillip
Looking forward to working with you all.
Sangeeta
[1] https://github.com/git/git/blob/master/builtin/describe.c
[2] https://github.com/git/git/blob/master/diff.c
[3] https://github.com/git/git/blob/master/builtin/diff.c