On Fri, 24 April 2009, Paul Gortmaker wrote: > [Re: [PATCH] gitweb: filter escapes from longer commit titles that break firefox] > On 24/04/2009 (Fri 19:53) Jakub Narebski wrote: >> On Mon, 20 April 2009, Paul Gortmaker wrote: >>> Jakub Narebski wrote: >>>> Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx> writes: >>>> >>>> >>>>> If there is a commit that ends in ^X and is longer in length than >>>>> what will fit in title_short, then it doesn't get fed through >>>>> esc_html() and so the ^X will appear as-is in the page source. >>>>> >>>>> When Firefox comes across this, it will fail to display the page, >>>>> and only display a couple lines of error messages that read like: >>>>> >>>>> XML Parsing Error: not well-formed >>>>> Location: http://git .... >>>>> >>>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx> >>>> And I think that issue might be a bug elsewhere in gitweb if we have >>>> text output which is not passed through esc_html... or bug in CGI.pm >>>> if the error is in not escaping of -title _attribute_ (attribute >>>> escaping has slightly different rules than escaping HTML, and should >>>> be done automatically by CGI.pm). >> What more important is: can you find out in more detail _where_ >> an error (unescaped control character) occurs: is it tag contents or >> 'title' attribute for some tag, what tag is it (name and class), in >> what view or views this bug is present, and in which part this occur? >> Without those details it would b much harder to diagnose this bug... > > No problem -- It appears to be in the title attribute, and it appears > straight away when I go to the toplevel view of the repo, assuming that > the commit is within the top 10 recent commits that are shown on the > summary page. I've put more details below on how I can reproduce it > and the page source deltas -- hopefully this will help. If there is > something else I can provide that would help, don't hesitate to ask. Ahh... that is what I thought. The problem that we have to solve to fix this bug is twofold: * CGI.pm does by default slight escaping (simple_escape from CGI::Util) of _attribute_ values, but for obvious reasons it cannot do unconditional escaping of tag _contents_ (because it can be HTML itself). This escaping, at least in CGI.pm version 3.10 (most current version at CPAN is 3.43), is minimal: only '"', '&', '<' and '>' are escaped using named HTML entity references (", &, < and > respectively). simple_escape does not do escaping of control characters such as ^X which are invalid in XHTML (in strict mode). Note that IIRC escaping '<' and '>' in attributes is not strictly necessary. Gitweb relies on the fact that CGI.pm does escaping of attribute values. We cannot escape attributes (e.g. "title" attribute with (almost) full commit subject) as it is now, because it would lead to double escaping. Fortunately it is possible to turn off autoescaping by using $cgi->autoEscape(undef); note however that we would have to do attribute escaping by ourself in the scope of this declaration. * Rules for escaping attribute values are slightly different for rules for escaping HTML. For attribute values we have to escape '"' because it is attribute delimiter, and '&' because it is escape character; escaping '<' and '>' is not strictly necessary. For escaping HTML we need to escape '<' and '>' because they introduce tags, and '&' because it is escape character; escaping '"' is not strictly necessary. It does not make sense to replace spaces by in attribute values, although it shouldn't harm. OTOH we should perhaps escape newlines in attribute values. For esc_html and esc_path we replace (currently) control characters by character escape codes (e.g. "\f" for form-feed, "\0" for NUL, hexadecimal escapes for 'other' control characters). But it is not the only possible solution. We can use Unicode printable representation of control characters instead (0x2400 sheet). Or we can use control key sequence / caret notation e.g. ^X for \0x18, or ^L for "\f" there. We probably should discus this in more detail. So it is not that simple... P.S. The subject (one line summary of this change) should be also changed to for example "gitweb: escape control characters in attributes" and in commit message itself you should explain that control characters break rendering in Firefox in strict XML compliance mode... or something like that. -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html