Re: [PATCH] gitweb: Support comparing blobs (files) with different names

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 28, 2007 at 23:03 +0200, Martin Koegler wrote:
> On Wed, Mar 28, 2007 at 12:58:54AM +0100, Jakub Narebski wrote:
>> On Thu, Mar 27, 2007, Martin Koegler wrote:
>>> On Tue, Mar 27, 2007 at 01:56:24AM +0100, Jakub Narebski wrote:

>>> If you think, its safe, I can simplify git_blobdiff. I propose
>>> doing the following way (pseudo-code):
>> 
>>> $file_parent ||= $file_name;
>> [...]
>>> $hash=git_get_hash_by_path($hash_base,$file_name);
>> [...]
>>> $hash_parent=git_get_hash_by_path($hash_parent_base,$file_parent);
>> [...]
>>>  open $fd, "-|", git_cmd(), "diff", '-p', @diff_opts,
>>>     $hash_parent, $hash
>>>     or die_error(undef, "Open git-diff failed");
>> [...]
>>> Else I will keep a reworked version of my patch.
>> 
>> The trouble with this is that we may lose mode change (symlink to
>> ordinary file etc.) because we hand-generate %diffinfo.
> 
> If we need the correct mode in %diffinfo, this is not difficult:
> 
> $from_mode="blob";
> if (defined $hash_base && defined $file_name)
> ($to_mode,$hash)=git_get_hash_by_path($hash_base,$file_name);
[...]
> Then we set to_mode and from_mode in %diffinfo to $to_mode and $from_mode.
> 
> git_get_hash_by_path_mode is like git_get_hash_by_path, only that it
> returns an array with the mode and the hash.

I'd rather either have git_get_ls_tree_line_by_path, which we would then
parse using parse_ls_tree_line, or have git_get_info_by_path which would
return already parsed information (as hash reference, or as a list).

>>>> By the way, if you call git_get_hash_by_path (which is expensive, as it
>>>> calls git command), you can use resulting hash in place of
>>>> hash_base:filename as an argument to git-diff.
>>>  
>>> I must check, if we need to resolve $hash ($hash_parent) by
>>> git_get_hash_by_path, if we construct it out of $hash_base and
>>> $file_name. Maybe we can avoid this call.
>> 
>> We can use "$hash_base:$file_name" as second parameter to git-diff etc.,
>> but I don't think we want to create links with "$hash_base:$file_name"
>> instead of SHA-1 id (hash) of a blob as 'h' parameter.
>> 
>> It can be first implementation, thought, and later we can try to use
>> "index <hash>..<hash> <mode>" lines from extended header to get $hash
>> and $hash_parent (with exception of pure rename, but then we need only
>> one invocation of git_get_hash_by_path subroutine).
> 
> The <mode> must be discarded, as it is wrong for anything which as not
> a mode of 100644, as we specify a two blob as parameter to git-diff.

The <mode> information might be discarded, or might be saved. "blob"
can have 100644 mode, can have 100755 mode (executable file), or can
have 120000 mode (symbolic link).
 
>> But I think it is better left for later patch.
> 
> As git_patchset_body requires an information about the compared file
> as parameter, a new formating function will be needed. I'm not sure,
> if the overhead of git_get_hash_by_path is this worth. Additionally,
> if we have the hash passed by parameter in most cases, there is no
> need to call it in these cases.

git_patchset_body currently buffers (caches) diff header (up to 
from-file / to-file header) to catch situation where one "raw" format
line (one difftree line) correspond to two patches (like e.g. type change
situation below, from ordinary file to symlink or vice versa, and which
_should_ be handled by git_patchset_body). So I think it would not be
hard to parse extended diff header and fill values which are not present
in %diffinfo from extended diff header. This includes filling $from_hash
and $to_hash ($hash_parent and $hash) information from the 
"index <hash>..<hash> <mode>" or "index <hash>..<hash>" extended diff
header line.

The (only?) rare exception is when files (blobs) does _not_ differ, as
patch in this situation is empty, and we would have to get hash of blob
(which would be the same for from and to, for $hash_parent and $hash)
using git_get_hash_by_name or new git_get_info_by_name.

> Blobdiff (html output) in its current version can not handle symlinks:

I'd investigate that. Is "commitdiff" view correct in this situation?

>> diff --git a/x b/x
>> deleted file mode 100644 (file)
>> index 190a180..873fb8d
>> --- a/x
>> +++ /dev/null
>> @@ -1 +0,0 @@
>> -123
>> diff --git a/ b/
>> new file mode 120000 (symlink)
>> index 190a180..873fb8d
>> --- /dev/null
>> +++ b/
>> @@ -0,0 +1 @@
>> +file3
>> \ No newline at end of file
> 
> This was generated by "diff to current" in the history view of a file,
> which was changed between symlink and normal file.


> Additionally, to_mode and from_mode of %diffinfo seem to be ignored by
> git_patchset_body.

As it should, I think.

> mfg Martin Kögler
> PS:
> I created a blob with a "strange" filename: &()=!"§$%[<>]*#+_;.
> In the result of the blob view, the " is not escaped in the filename in the header
> and a strange content type is returned:
> 
> $ telnet localhost 80
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> GET /gitweb/gitweb.cgi?p=t/.git;a=blob;f=%26()%3D%21%22%A7%24%25%5B%3C%3E%5D%2A%23%2B_%3B.;hb=7bfed2588bee66b33db544830606fa6606478fd9 HTTP/1.0
> 
> HTTP/1.1 200 OK
> Date: Wed, 28 Mar 2007 19:55:36 GMT
> Server: Apache
> Content-disposition: inline; filename="&()=!"§$%[<>]*#+_;."
> Expires: Thu, 29 Mar 2007 19:55:39 GMT
> Connection: close
> Content-Type: application/vnd.mif
> 
> xx

I'd try to check that

-- 
Jakub Narebski
Poland
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]