Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi... Added Jeff as he got involved later and comments below are
relevant to his questions.

On 2020-09-18 11:10, Thomas Guyot-Sionnest wrote:
> On Fri, 18 Sep 2020 at 10:46, Taylor Blau <me@xxxxxxxxxxxx> wrote:
>>
>>   - Why do we have to do this at all all the way up in
>>     'builtin_diffstat'? I would expect these to contain the right
>>     OIDs by the time they are given back to us from the call to
>>     'diff_fill_oid_info' in 'run_diffstat'.
>>
>> So, my last point is the most important of the three. I'd expect
>> something more along the lines of:
>>
>>   1. diff_fill_oid_info resolve the link to the pipe, and
>>   2. index_path handles the resolved fd.
>>
>> ...but it looks like that is already what it's doing? I'm confused why
>> this doesn't work as-is.
> 
> So the idea is to checksum the data and write a valid oid. I'll see if
> I can figure that out. Thanks for the hint though else I would likely
> have gone with a buffer and memcmp. Your solution seems cleaner, and
> there is a few other uses of oideq's that look dubious at best with
> the case of null oids / buffered data so it's definitely a better
> approach.
> 

After looking further at the code I understand your point, although
pipes can only ever be read once, so even if we do that we would have to
buffer on first read. It appears the files are first read by
diff_populate_filespec() - builtin_diffstat isn't even called if the
files match (even for two pipes).

Jeff, on your suggestion to compare size, the size is set even if data
is null. Files in-tree appears to be mmapped on demand for reads.

diff_fill_oid_info explicitly resets oids for is_stdin and return, and
if the file's been read already and it's a pipe, we would *have* to have
buffered the data already so I don't really see what else we can do
besides memcmp() (technically we should be able to tell if the files
have been modified at this point but apparently that information isn't
transmitted to builtin_diffstat - it's assumed and I won't make complex
change for that odd case of diffing two pipes. That's what I have now:

    /* What is_stdin really means is that the file's content is only
     * in the filespec's buffer and its oid is zero. We can't compare
     * oid's if both are null and we can just diff the buffers */
    if (one->is_stdin && two->is_stdin)
        same_contents = (one->size == two->size ?
            !memcmp(one->data, two->data, one->size) : 0);
    else
        same_contents = oideq(&one->oid, &two->oid);


Even when we implement the --literally switch, considering we can't
guarantee a single read per file, for now I'd keep using the is_stdin
flag as an indication of in-memory data, and we'll have to read in all
pipes we diff (like earlier patch). It could be a concern if we
--literally diff a whole subtree of large pipes. In that case the only
fix I can see is to reorder the operations to generate the stats on each
file diff (or at least keep the diffs around for the stats pass).


Regards,

Thomas





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux