Re: [PATCH] Quote LF in urls git fetch saves in FETCH_HEAD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Daniel Barkalow <barkalow@xxxxxxxxxxxx> writes:

> On Wed, 13 May 2009, Alex Riesen wrote:
>
>> 2009/5/13 Junio C Hamano <gitster@xxxxxxxxx>:
>> > Alex Riesen <raa.lkml@xxxxxxxxx> writes:
>> >
>> >> +             for (i = 0; i < url_len; ++i)
>> >> +                     if ('\n' == url[i])
>> >> +                             fputs("\\n", fp);
>> >> +                     else
>> >> +                             fputc(url[i], fp);
>> >> +             fputc('\n', fp);
>> >
>> > This ad-hoc quoting feels _very_ wrong.  Who is on the reading side and
>> > how does it unquote?
>> 
>> git fmt-merge-msg. It does not unquote. The url is purely informational here.
>> OTOH, the \n shouldn't be in url text at all, so treat it as slightly
>> less annoying
>> warning.
>> 
>> > If it is just informational use only, then it might make more sense to
>> > drop this ugly "quoted \n" silently.  I dunno.
>> 
>> That'd mean to loose the information completely. Which is just as bad
>> as putting the LF in the url in the first place.
>
> Looking back at the original message, it looks like the user included a 
> newline in an argument to clone, and the fetch must have stripped it out 
> (or ignored it in some other fashion), because data was retrieved from a 
> repository that doesn't have a newline in its name. Most likely, the 
> newline should just be prohibited in the URL in the config file in the 
> first place, and we shouldn't be able to get to the point of writing a 
> FETCH_HEAD with that value.

Let me attempt to summarize the situation.

 . FETCH_HEAD is a LF terminated sequence of records; each record
   describes the commit object name, if it is meant to be merged, what URL
   and what ref it came from;

 . "git pull" reads from FETCH_HEAD to decide which commit objects to pass
   to underlying "git merge", and "git fmt-merge-msg" reads from
   FETCH_HEAD to decide what message to produce; and

 . "git fetch" allows a URL with an LF in it and fetches from such a URL
   just fine, but "git pull" barfed because "git fmt-patch" noticed.

Because we copy the LF in the URL straight to FETCH_HEAD, it breaks the
file format.  Alex's proposal is to quote it as "\\n" to avoid it (I
suggested stripping it).  Either change will _fix_ the situation in the
sense that the file format will now be correct, "git pull" will extract
the right set of commits (as it does not look at the URL at all when
deciding which records are used for merge), and "git fmt-merge-msg" will
generate correct set of branches and list of incoming commits without
parse errors, even though the user won't be able to cut and paste the URL
it records in its output either way (if stripped, it will be "information
loss", but even if it were quoted, it won't be usable because nobody
unquotes it).

Given all of the above, I think:

 (1) Alex's patch is a good minimum fix to the issue [*1*].  If unfixed,
     people will be able to fetch from but won't be able to pull from a
     URL with a LF in it.

 (2) Even though it seems that we do _support_ fetching from such a URL,
     it is not a good thing.  People may do all sorts of crazy things, and
     this may be one of these crazy things that we _could_ break the
     backward compatibility to avoid innocent mistakes, by forbidding LF
     in URLs.  Nobody could have been using such a URL in real settings
     because of the issue Alex's patch addresses anyway.

 (3) But forbidding or warning will be done by new code.  Is the cost to
     do such a check (implementation and maintenance of the new code)
     justifiable?  Where do we check and when?

So for now, I'd say I take Alex's patch as-is, and do nothing else.

If somebody has a compelling example that this kind of mistake is common
and is hard to figure out why, we can explicitly forbid certain byte
values in the repository URL as a separate step.


[Footnote]

*1* The output is purely informational, and especially its URL field is
not meant to be used as cut-and-paste source by general public anyway
(e.g. you may pull using git over SSH but general public may not have an
SSH access to the repository).  I actually was tempted, when I did the
initial version of fmt-merge-msg, to shorten an overlong URL with
ellipses, even though I didn't.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]