Jakub Narębski <jnareb@xxxxxxxxx> writes:

> On Thu, May 15, 2014 at 9:38 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Jakub Narębski <jnareb@xxxxxxxxx> writes:
>>
>>> Writing test for this would not be easy, and require some HTML
>>> parser (WWW::Mechanize, Web::Scraper, HTML::Query, pQuery,
>>> ... or low level HTML::TreeBuilder, or other low level parser).
>>
>> Hmph. Is it more than just looking for a specific run of %xx we
>> would expect to see in the output of the tree view for a repository
>> in which there is one tree with non-ASCII name?
>
> There is if we want to check (in non-fragile way) that said
> specific run is in 'href' *attribute* of 'a' element (link target).

Correct, but is "where does it appear" the question we are primarily
interested in, wrt this breakage and its fix?

If gitweb output has some volatile parts that do not depend on the
contents of the Git test repository (e.g. showing contents of
/etc/motd, date/time of when the test was run, or the full pathname
leading to the trash directory), then preparing a tree whose name is
äéìõû and making sure that the properly encoded version of äéìõû
appears anywhere in the output may not be sufficient to validate that
we got the encoding right, as that string may appear in the parts
that are totally unrelated to the contents being shown and not under
our control. But is that really the case?

Also we may introduce a bug and misspell the attr name and produce an
anchor element with hpef attribute with the properly encoded URL in
it, and your "parse HTML properly" approach would catch it, but is
that the kind of breakage under discussion?

You hinted at new tests for UTF-8 encoding in the other message in the
thread earlier, and I've been assuming that we were talking about the
encoding test, not a test to catch s/href/hpef/ kind of breakage.
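For concreteness, here is a rough sketch of the two kinds of check being contrasted. It is written in Python rather than in the test suite's own language. It assumes the gitweb page has already been captured into a string. The tree name, the helper names (appears_anywhere, appears_in_href, HrefCollector) and the sample URLs are all made up for illustration and are not taken from gitweb.

```python
from html.parser import HTMLParser
from urllib.parse import quote

# Hypothetical non-ASCII tree name used in the test repository.
TREE_NAME = "äéìõû"

# Expected percent-encoding of the UTF-8 bytes of the name,
# e.g. "ä" (U+00E4) is the bytes C3 A4, i.e. "%C3%A4".
EXPECTED = quote(TREE_NAME.encode("utf-8"))


def appears_anywhere(html):
    """Naive check: the encoded run occurs somewhere in the output."""
    return EXPECTED in html


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []  # href values, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def appears_in_href(html):
    """Stricter check: the encoded run occurs inside an <a href="...">."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()  # flush anything still buffered by the parser
    return any(EXPECTED in href for href in collector.hrefs)


if __name__ == "__main__":
    # Hypothetical gitweb-like link, once correct and once with s/href/hpef/.
    good = f'<a href="?p=repo.git;a=tree;f={EXPECTED}">{TREE_NAME}</a>'
    typo = f'<a hpef="?p=repo.git;a=tree;f={EXPECTED}">{TREE_NAME}</a>'
    print(appears_anywhere(good), appears_in_href(good))  # True True
    print(appears_anywhere(typo), appears_in_href(typo))  # True False
```

The second pair is the s/href/hpef/ case. The plain substring check still passes, and only the parsing check notices. The substring check would also be fooled if the same encoded run happened to show up in an unrelated, volatile part of the page.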