Re: 'git status' is not read-only fs friendly

Junio C Hamano <junkio@xxxxxxx> · Sat, 10 Feb 2007 22:33:46 -0800

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Sat, 10 Feb 2007, Nicolas Pitre wrote:
>> > >
>> > > Because git-status itself is conceptually a read-only operation, and 
>> > > having it barf on a read-only file system is justifiably a bug.
>> > 
>> > I do not 100% agree that it is conceptually a read-only operation.
>> 
>> It is.
>
> It really isn't. 
>
> It's not even a "technical issue". It's a fundamental optimization. Sure, 
> you can call optimizations just "technical issues", but the fact is, it's
> one of the things that makes git so _usable_ on large archives. At some 
> point, an "optimization" is no longer just about making things slightly 
> faster, it's about something much bigger, and has real semantic meaning.
> ...
> THIS IS NOT "JUST A TECHNICAL ISSUE". 
> ...
> And the index is what makes it so. 
>
> And that's why it's important to keep the index up-to-date.

I think a one paragraph summary of your argument is:

 - index is a good thing -- it is what makes the difference
   between usable and unusable.

 - git-status needs to refresh the index in order to do its
   thing efficiently and usably _anyway_, so once it spends
   cycles to do so, it is senseless not to write the refreshed
   index out when it can.

I do not think anybody disputes that in a repository with 20k+
paths, it is not sensible to leave the index stat-dirty for all
paths.  But I think your example

	read-tree HEAD

misses the point by stressing the importance of the index too
much.  The index is important for usability and I do not think
anybody is disputing it.

The thing is, nobody switches the index that way without running
"update-index --refresh" afterwards.  Normal people would use
git-reset to switch to a different tree object, and the command
does that for you.  If you are hardcore, you would know to use
"read-tree -m HEAD" at least to avoid making paths unnecessarily
stat-dirty.  Your example, while it is valid and demonstrates
why the index is a good thing very well, is simply not part of
a normal workflow and not very relevant when discussing the
performance ramifications of what state "git-status" should
leave the index in.
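
To make the difference between the three commands concrete, here
is a minimal sketch in a throwaway repository.  The file names,
contents and the demo identity are made up for illustration, and
"git diff-files --name-only" is used only as a convenient way to
list the paths the index considers stat-dirty.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m initial

# Plain read-tree rebuilds the index without cached stat data,
# so every path looks possibly modified (stat-dirty).
git read-tree HEAD
after_read_tree=$(git diff-files --name-only)

# Refreshing re-stats the work tree files and records the result.
git update-index -q --refresh
after_refresh=$(git diff-files --name-only)

# With -m, stat data is reused from the old index for paths whose
# contents match the tree, so nothing becomes stat-dirty.
git read-tree -m HEAD
after_read_tree_m=$(git diff-files --name-only)

echo "stat-dirty after read-tree HEAD:    ${after_read_tree:-<none>}"
echo "stat-dirty after --refresh:         ${after_refresh:-<none>}"
echo "stat-dirty after read-tree -m HEAD: ${after_read_tree_m:-<none>}"
```

Only the first step leaves paths stat-dirty, which is why that
example says little about what "git-status" should do in a
normal workflow.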

When I said "calling 'update-index --refresh' in git-status
loses stat-dirtiness information", I was certainly _NOT_ talking
about losing the information that 20k+ paths used to be
stat-dirty because the user did "read-tree HEAD" earlier.

At least for me, it is very normal to do something like this.

 * start from a clean index.

 * edit cache.h, diff.h, and diff-lib.c.

 * stop, think, and realize that my earlier edit to change one
   function prototype in diff.h was not needed, and revert the
   change to that line still in the editor.

 * fix things up further by editing other files.

And then, I would run "git diff" to see where I am.  I still
remember that I touched diff.h and I also remember that I once
changed a function prototype but then decided the change was not
necessary after all, but I do not remember if I changed anything
else in the file.  It is _very_ reassuring to see the emptiness
that follows the "diff --git" header for diff.h in such a case.
Seeing the path to be stat-dirty is a very good thing for me,
because otherwise I might lose a few seconds thinking that what
I thought I touched might have been cache.h and not diff.h.

To me, running "git status" is a "wrapping things up" step.  I do
not need that stat-dirty assurance "git diff" gave me at that
point.  Not seeing diff.h in the "changed but not updated" list is
a good thing.  And in my workflow, after that "wrapping things up"
step, I do not need that stat-dirty assurance _anymore_.
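
A rough sketch of that workflow, under some assumptions: the
file contents are placeholders, "touch" stands in for "edit and
then undo the edit" (same contents, different timestamp), and
"git diff-files --name-only" is used to list stat-dirty paths
because how "git diff" itself presents them can depend on the
git version and configuration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the header line is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'int diff_setup(void);' > diff.h
git add diff.h

# Simulate an edit that was later undone: contents are unchanged
# but the timestamp no longer matches what the index recorded.
# The script runs within one second, so force a different mtime.
touch -t 200101010000 diff.h

# The index still holds the old stat data, so diff.h is listed.
stat_dirty=$(git diff-files --name-only)

# "Wrapping up": status refreshes the index and saves it when it
# can, which throws the stat-dirtiness away.
git status >/dev/null
after_status=$(git diff-files --name-only)

echo "stat-dirty before status: ${stat_dirty:-<none>}"
echo "stat-dirty after status:  ${after_status:-<none>}"
```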

I think Nico is correct to point out that "not _anymore_" part
of the above reasoning of mine assumes _my_ workflow and
preference, and I think that is a valid point.  Not saving the
refreshed index would make the stat-dirtiness for diff.h come
back, which would be inconvenient and annoying to me.

But the user might want to keep it stat-dirty after running
"git-status".  People in the "not _anymore_" camp like me can throw
the stat-dirtiness away by "update-index --refresh".  I do not
think he (or anybody) is advocating to keep 20k+ paths in
stat-dirty state (arguably, "artificially" due to use of
"read-tree HEAD"), so your example using "read-tree HEAD" only
confuses the discussion.
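
The two camps can be sketched as follows.  This assumes a git
recent enough to have the "--no-optional-locks" option, which
postdates this thread and is used here only to show a status run
that does not save the refreshed index; the file name and
contents are again placeholders.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the header line is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'int diff_setup(void);' > diff.h
git add diff.h

# Same contents, different timestamp: diff.h becomes stat-dirty.
touch -t 200101010000 diff.h

# Camp 1: look at the status but keep the stat-dirtiness.
# Without the optional index lock, nothing is written back.
git --no-optional-locks status >/dev/null
kept=$(git diff-files --name-only)

# Camp 2 ("not anymore"): throw the stat-dirtiness away.
git update-index -q --refresh
dropped=$(git diff-files --name-only)

echo "after read-only status: ${kept:-<none>}"
echo "after explicit refresh: ${dropped:-<none>}"
```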

Having said all that, I do agree with you that git-status should
throw that stat-dirtiness information away by saving the
refreshed index.  Doing otherwise is annoying to me as I already
said, and I cannot think of a valid reason for the user to want
to keep stat-dirtiness information after running "git-status",
because to me the whole point of running "git-status" is to
start wrapping things up.
