Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 24 April 2009, Paul Gortmaker wrote:
> [Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox]
> On 24/04/2009 (Fri 19:53) Jakub Narebski wrote:  
>> On Mon, 20 April 2009, Paul Gortmaker wrote:
>>> Jakub Narebski wrote:
>>>> Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx> writes:
>>>>
>>>>   
>>>>> If there is a commit that ends in ^X and is longer in length than
>>>>> what will fit in title_short, then it doesn't get fed through
>>>>> esc_html() and so the ^X will appear as-is in the page source.
>>>>>
>>>>> When Firefox comes across this, it will fail to display the page,
>>>>> and only display a couple lines of error messages that read like:
>>>>>
>>>>>    XML Parsing Error: not well-formed
>>>>>    Location: http://git ....
>>>>>
>>>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx>

>>>> And I think that issue might be a bug elsewhere in gitweb if we have
>>>> text output which is not passed through esc_html... or bug in CGI.pm
>>>> if the error is in not escaping of -title _attribute_ (attribute
>>>> escaping has slightly different rules than escaping HTML, and should
>>>> be done automatically by CGI.pm).

>> What more important is: can you find out in more detail _where_
>> an error (unescaped control character) occurs: is it tag contents or
>> 'title' attribute for some tag, what tag is it (name and class), in
>> what view or views this bug is present, and in which part this occur?
>> Without those details it would b much harder to diagnose this bug...
> 
> No problem -- It appears to be in the title attribute, and it appears
> straight away when I go to the toplevel view of the repo, assuming that
> the commit is within the top 10 recent commits that are shown on the
> summary page.  I've put more details below on how I can reproduce it
> and the page source deltas -- hopefully this will help.  If there is
> something else I can provide that would help, don't hesitate to ask.

Ahh... that is what I thought.

The problem that we have to solve to fix this bug is twofold:

 * CGI.pm does by default slight escaping (simple_escape from CGI::Util)
   of _attribute_ values, but for obvious reasons it cannot do
   unconditional escaping of tag _contents_ (because it can be HTML
   itself).

   This escaping, at least in CGI.pm version 3.10 (most current version
   at CPAN is 3.43), is minimal: only '"', '&', '<' and '>' are escaped
   using named HTML entity references (&quot;, &amp;, &lt; and &gt;
   respectively).  simple_escape does not do escaping of control
   characters such as ^X which are invalid in XHTML (in strict mode).
   Note that IIRC escaping '<' and '>' in attributes is not strictly
   necessary.

   Gitweb relies on the fact that CGI.pm does escaping of attribute
   values.  We cannot escape attributes (e.g. "title" attribute with
   (almost) full commit subject) as it is now, because it would lead
   to double escaping.  Fortunately it is possible to turn off
   autoescaping by using $cgi->autoEscape(undef); note however that
   we would have to do attribute escaping by ourself in the scope of
   this declaration.

 * Rules for escaping attribute values are slightly different for rules
   for escaping HTML.  For attribute values we have to escape '"'
   because it is attribute delimiter, and '&' because it is escape
   character; escaping '<' and '>' is not strictly necessary.  For
   escaping HTML we need to escape '<' and '>' because they introduce
   tags, and '&' because it is escape character; escaping '"' is not
   strictly necessary.  It does not make sense to replace spaces by
   &nbsp; in attribute values, although it shouldn't harm.  OTOH we
   should perhaps escape newlines in attribute values.

   For esc_html and esc_path we replace (currently) control characters
   by character escape codes (e.g. "\f" for form-feed, "\0" for NUL,
   hexadecimal escapes for 'other' control characters).  But it is not
   the only possible solution.  We can use Unicode printable
   representation of control characters instead (0x2400 sheet).  Or we
   can use control key sequence / caret notation e.g. ^X for \0x18,
   or ^L for "\f" there.  We probably should discus this in more detail.

So it is not that simple...


P.S. The subject (one line summary of this change) should be also
changed to for example "gitweb: escape control characters in attributes"
and in commit message itself you should explain that control characters
break rendering in Firefox in strict XML compliance mode... or something
like that.
-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]