Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 15 May 2014 18:26:35 -0700

Jakub Narębski <jnareb@xxxxxxxxx> writes:

> On Thu, May 15, 2014 at 9:38 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Jakub Narębski <jnareb@xxxxxxxxx> writes:
>>
>>> Writing a test for this would not be easy, and would require some
>>> HTML parser (WWW::Mechanize, Web::Scraper, HTML::Query, pQuery,
>>> ... or low-level HTML::TreeBuilder, or another low-level parser).
>>
>> Hmph.  Is it more than just looking for a specific run of %xx we
>> would expect to see in the output of the tree view for a repository
>> in which there is one tree with a non-ASCII name?
>
> There is, if we want to check (in a non-fragile way) that said
> specific run is in 'href' *attribute* of 'a' element (link target).

Correct, but is "where does it appear" the question we are
primarily interested in, wrt this breakage and its fix?

If gitweb output has some volatile parts that do not depend on the
contents of the Git test repository (e.g. showing contents of
/etc/motd, date/time of when the test was run, or the full pathname
leading to the trash directory), then preparing a tree whose name is
äéìõû and making sure that the properly encoded version of äéìõû
appears anywhere in the output may not be sufficient to validate
that we got the encoding right, as that string may appear in the
parts that are totally unrelated to the contents being shown and not
under our control.  But is that really the case?
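
To illustrate the simple check I have in mind, here is a rough sketch
in Python (not gitweb's Perl, and not actual test code).  The function
names and the sample HTML line are made up for illustration; the only
thing taken from the discussion is the idea of looking for the expected
run of %xx anywhere in the output:

```python
from urllib.parse import quote

def expected_escape(name: str) -> str:
    # Percent-encode every byte of the UTF-8 form of the name.
    return quote(name.encode("utf-8"), safe="")

def output_mentions(html: str, name: str) -> bool:
    # Compare case-insensitively, since the hex digits could be
    # emitted in either case.
    return expected_escape(name).lower() in html.lower()

# Hypothetical line of output, not copied from a real gitweb page.
sample = '<a href="/?p=test.git;a=tree;f=%C3%A4%C3%A9%C3%AC%C3%B5%C3%BB">dir</a>'

print(expected_escape("äéìõû"))          # %C3%A4%C3%A9%C3%AC%C3%B5%C3%BB
print(output_mentions(sample, "äéìõû"))  # True
```

Such a check says nothing about where the string appears, which is fine
as long as nothing outside our control can produce the same run.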

Also, we may introduce a bug that misspells the attribute name and
produces an anchor element with an hpef attribute holding the properly
encoded URL, and your "parse HTML properly" approach would catch it,
but is that the kind of breakage under discussion?  You hinted at new
tests for UTF-8 encoding in the other message in the thread earlier,
and I've been assuming that we were talking about the encoding test,
not a test to catch s/href/hpef/ kind of breakage.
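
For contrast, the stricter "is it in the href of an a element" check
could look roughly like the sketch below.  It uses Python's standard
html.parser module only as a stand-in for the Perl HTML parsers named
above; the class, the helper and both sample lines are hypothetical:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect the href values of all <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)

def hrefs_containing(html, needle):
    # Return only those link targets that contain the expected run.
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return [h for h in parser.hrefs if needle in h]

needle = "%C3%A4%C3%A9%C3%AC%C3%B5%C3%BB"
good = '<a href="/?a=tree;f=%C3%A4%C3%A9%C3%AC%C3%B5%C3%BB">x</a>'
bad = '<a hpef="/?a=tree;f=%C3%A4%C3%A9%C3%AC%C3%B5%C3%BB">x</a>'

print(len(hrefs_containing(good, needle)))  # 1
print(len(hrefs_containing(bad, needle)))   # 0
print(needle in bad)                        # True
```

The last line shows the difference: a plain substring search still
passes on the hpef output, and only the parsing version notices.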
