Re: [PATCH v3] ls-files: introduce "--format" option

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 23 Jun 2022 08:57:58 -0700

Phillip Wood <phillip.wood123@xxxxxxxxx> writes:

> Thanks for re-rolling, having taken a look a closer look at the tests
> I'm concerned about the output format for some of the specifiers, see
> below.

Thanks for raising these issues.  I agree with you on many of them.
In addition to what you covered ....

>> +path::
>> +	The pathname of the file which is in the index.
> I think that for all these it might be clearer to say "recorded in the
> index" rather than "of the file which is in the index"

I think we would call this "name".  The name of the existing option
that controls how they are shown is "--full-name", not "--full-path",
for example.

>> +ctime::
>> +	The create time of file which is in the index.
>
> This is printed with a prefix 'ctime:' (the same applies to the format
> specifiers below) I think we should omit that and just print the data
> so the user can choose the format they want.
>
>> +mtime::
>> +	The modified time of file which is in the index.

These are only the low-bits of the full timestamp, not ctime/mtime
themselves.

But stepping back a bit, why do we need to include them in the
output?  What workflow and use case are we trying to help?  Dump
output from "stat <path>" equivalent from ls-files and compare with
"stat ." output to see which ones are stale?  Or is there any value
to see the value of, say, ctime as an individual data item?

>> +dev::
>> +	The ID of device containing file which is in the index.
>> +ino::
>> +	The inode number of file which is in the index.
>> +uid::
>> +	The user id of file owner which is in the index.
>> +gid::
>> +	The group id of file owner which is in the index.

Again, why do we need to include these in the output?

Wouldn't it be sufficient, as well as a lot more useful, to show a
single bit "the cached stat info matches what is in the working tree
(yes/no)"?

>> +size::
>> +	The size of the file which is in the index.

This needs to explain what kind of size this is.  Is it the size of
the blob object?  Is it the size of the file in the working tree
(i.e. not cleaned)?  Is it _always_ the size, or can it become a
number that is very different from size in certain circumstances?

IOW, I do not think giving this to unsuspecting users and call it
"size of the file" hurts them more than it helps them, especially
because it is not always the size of the file.

I'd suggest getting rid of everything from ctime down to size and if
we really care about the freshness of the cached stat info, replace
them with a single bit "up-to-date".

>> +flags::
>> +	The flags of the file in the index which include
>> +	in-memory only flags and some extended on-disk flags.
>
> If %(flags) is going to be useful then I think we need to think about
> how they are printed and document that. At the moment they are printed 
> as a hexadecimal number which is fine for debugging but probably not
> going to be useful for something like --format. I think printing 
> documented symbolic names with some kind of separator (a comma maybe)
> between them is probably more useful

I am guessing that most of the above are only useful for curious
geeks and those who are debugging their new tweak to the code that
touches the index, i.e. a debugging feature.  But these folks can
run "git" under a debugger, and they probably have to do so when
they are seeing an unexpected value in the flags member of a cache
entry anyway.  So I am not sure whom this field is intended to help.

>> [...]
>> +test_expect_success 'git ls-files --format eol' '
>> +	printf "i/lf    w/lf    attr/                 \t\n" >expect &&
>> +	printf "i/lf    w/lf    attr/                 \t\n" >>expect &&
>> +	git ls-files --format="%(eol)" --eol >actual &&
>
> I'm not sure why this is passing --eol as well as --format='%(eol)' -
> shouldn't that combination of flags be an error?

Good eyes.

Thanks.