Re: gitweb: using quotemeta

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Luben Tuikov wrote:
> --- Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> Luben Tuikov wrote:
>> 
>>> --- Junio C Hamano <junkio@xxxxxxx> wrote:
>>>> 
>>>> Ouch, that was a sloppy planning and coding, and sloppier
>>>> reviewing.  Sorry.
>>>> 
>>>> What is the right quoting there?  Just quoting double-quotes?
>>> 
>>> I'm not sure.  What undesired character could we have in $filename
>>> of a snapshot?  The commit ab41dfbfd4f message gives this
>>> justification: "Just in case filename contains end of line character."
>>> 
>>> It looks like $filename is constructed by well defined strings:
>>> basename($project), $hash and $suffix all of which should be ok.
>>> 
>>> I'd say we don't need quotemeta for $filename of snapshot.
>> 
>> But we do need quoting for blob_plain and perhaps blobdiff_plain
>> views, although not quotemeta, but perhaps the reverse of unescape,
>> i.e. quote '"', EOLN (end of line) and perhaps also TAB.
> 
> Escaping for the purposes of HTML _view_ and URL generation is ok,
> but it is not ok when _saving_ the file with a file name.
> 
> A file name is just a string of chars, and I want to _save_ the file
> name as its name is. No changes or interpretations please.  I don't
> care what the string is, what chars it is made of, etc.

We need to _escape_ HTML output (using esc_html subroutine) because
HTML treats some characters specially, and we need to escape them
to turn off this interpretation. Examples include SPC (for example
file name which has two consecutive spaces) and ampersand '&' which
might be treated as entity reference.

Sometimes we want to quote filename for view, for example to fit it
in one line, and to distinguish between tab and spaces.


But you forget that in HTTP headers, to be more exact in
	Content-Disposition: inline; filename="<filename>"
header, the quote '"' and end-of-line '\n' characters in <filename>
are treated specially. So you need to quote somehow at least those
two characters.


<checks the RFC>

RFC 2616 "Hypertext Transfer Protocol -- HTTP/1.1":
   RFC 1806 [35], from which the often implemented Content-Disposition
   (see section 19.5.1) header in HTTP is derived, has a number of very
   serious security considerations. Content-Disposition is not part of
   the HTTP standard, but since it is widely implemented, we are
   documenting its use and risks for implementors. See RFC 2183 [49]
   (which updates RFC 1806) for details.

RFC 2183 "Communicating Presentation Information in Internet Messages:
          The Content-Disposition Header Field":
2.3  The Filename Parameter

   The sender may want to suggest a filename to be used if the entity is
   detached and stored in a separate file. If the receiving MUA writes
   the entity to a file, the suggested filename should be used as a
   basis for the actual filename, where possible.

   It is important that the receiving MUA not blindly use the suggested
   filename.  The suggested filename SHOULD be checked (and possibly
   changed) to see that it conforms to local filesystem conventions,
   does not overwrite an existing file, and does not present a security
   problem (see Security Considerations below).

   The receiving MUA SHOULD NOT respect any directory path information
   that may seem to be present in the filename parameter.  The filename
   should be treated as a terminal component only.  Portable
   specification of directory paths might possibly be done in the future
   via a separate Content-Disposition parameter, but no provision is
   made for it in this draft.

   Current [RFC 2045] grammar restricts parameter values (and hence
   Content-Disposition filenames) to US-ASCII.  We recognize the great
   desirability of allowing arbitrary character sets in filenames, but
   it is beyond the scope of this document to define the necessary
   mechanisms.  We expect that the basic [RFC 1521] `value'
   specification will someday be amended to allow use of non-US-ASCII
   characters, at which time the same mechanism should be used in the
   Content-Disposition filename parameter.

> Please don't interpret file names and their characters when the files
> are _saved_ by the user's browser.
>
> The file name in my filesystem should be the exact same file name
> as it appears on any other filesystem hosting the same git repo.

As you can see from the above RFC, there is no standard way to pass
suggested filename to be exactly the same as in git repository, if
it contains non US-ASCII characters. At least not for _HTTP_ 
Content-Disposition: header; for email we can use quoted-printable
from RFC 2047 to encode non US-ASCII characters (including " and EOLN)
with good probability of this working correctly.
 
> I don't want this translation:
> Server FS: linux-2.6.git-5c2d97cb31fb77981797fec46230ca005b865799.tar.gz
> Quotemeta: linux\-2\.6\.git\-5c2d97cb31fb77981797fec46230ca005b865799\.tar\.gz
> User FS: linux\-2\.6\.git\-5c2d97cb31fb77981797fec46230ca005b865799\.tar\.gz
> 
> When you comitted ab41dfbfd4f3f9fedac71550027e9813b11abe3d, it extended
> quotemeta to where it shouldn't have been applied.

I only followed (I admit blindly, without checking the details, and
I agree wrongly) the commit a2f3db2f5de2a3667b0e038aa65e3e097e642e7d
   "gitweb: Consolidate escaping/validation of query string"
by Petr "Pasky" Baudis, which among others included the following patch
(which part of commit is not documented enough in commit message):

@@ -3126,7 +3116,7 @@ sub git_blobdiff {
 			-type => 'text/plain',
 			-charset => 'utf-8',
 			-expires => $expires,
-			-content_disposition => qq(inline; filename="${file_name}.patch"));
+			-content_disposition => qq(inline; filename=") . quotemeta($file_name) . qq(.patch"));
 
 		print "X-Git-Url: " . $cgi->self_url() . "\n\n";
 

And I agree that your
   "gitweb: Don't use quotemeta on internally generated strings"
is (partially) correct.

>     Luben
> P.S. When replying please don't redact the CC field.

When I reply by email, I have no problem (I have to add git
mailing list if I forget to use reply to all). When I reply
via GMane NNTP interface, KNode by default replies only to
newsgroup (coupled with git mailing list).
-- 
Jakub Narebski
Poland
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]