Re: [PATCH] gitweb: parse_commit_text encoding fix

On Thu, 6 Aug 2009, Zoltán Füzesi wrote:
> 2009/8/4 Junio C Hamano <gitster@xxxxxxxxx>:
> >
> > Thanks, Zoltán.
> >
> > We should be able to set up a script that scrapes the output to test this
> > kind of thing.  We may not want to have a test pattern that matches too
> > strictly for the current structure and appearance of the output
> > (e.g. counting nested <div>s, presentation styles and such), but if we can
> > robustly scrape off HTML tags (e.g. "elinks -dump") and check the
> > remaining payload, it might be enough.
> >
> > Jakub what do you think?  I suspect that scraping approach may turn out to
> > be too fragile for tests to be worth doing, but I am just throwing out a
> > thought.
> >
> 
> This issue comes up when the chop_and_escape_str function is called with
> a non-ASCII string (like my name :)) without first calling to_utf8 on
> it. "author_name" and "committer_name" are two examples, and
> "author_name" shows up with bad encoding in HTML.
> 
> Example from one of my repos (little piece from shortlog output):
> <td class="author"><span title="Füzesi Zoltán">Füzesi Zoltán</span></td>
> After applying the patch:
> <td class="author">Füzesi Zoltán</td>
> 
> This is an "old" (seen in 1.5.6 version too) and (I think) minor issue.
> I haven't spent time on thinking how a test script could show this yet.
> Waiting for Jakub's reaction.

Oh, so the problem is not only to have correct output (for example
"Füzesi Zoltán" somewhere on the HTML page produced by gitweb), but also
to not have incorrect output (for example "FÃ¼zesi ZoltÃ¡n").

I think it would be better to leave t9500-gitweb-standalone-no-errors.sh
to be only about no Perl errors and no Perl warnings.  So I'd rather
have the check that gitweb handles non-US-ASCII output correctly
in a separate test, e.g. t9501-gitweb-standalone-i18n.sh.  That would
mean extracting gitweb_init() and gitweb_run() (and perhaps also
gitweb_check_prereq() or something) into a common file, t/lib-gitweb.sh.

We would check e.g. if "startáąend" is present in output (correct output),
and whether extracting "start[^ ]*end" produces only "startáąend" (no
incorrect output).
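
The check described above could look roughly like the sketch below.  It is
written in Python only to illustrate the logic; the real test would be a
shell script in t/, and marker_ok() is a made-up name, not an existing
helper.  The mangled string is produced by misreading UTF-8 bytes as
Latin-1, which is one assumed way such bad output can arise.

```python
import re

MARKER = "startáąend"                   # marker string proposed above
PATTERN = re.compile(r"start[^ ]*end")  # pattern proposed above


def marker_ok(text: str, marker: str = MARKER) -> bool:
    """Return True if the marker occurs and every start...end run equals it."""
    found = PATTERN.findall(text)
    # Non-empty: the correct output is present.
    # All equal to the marker: no incorrectly encoded copy is present.
    return bool(found) and all(m == marker for m in found)


good = "author startáąend 2009-08-06"
# Simulate double encoding: UTF-8 bytes misread as Latin-1.
mangled = MARKER.encode("utf-8").decode("latin-1")
bad = "title startáąend text " + mangled

print(marker_ok(good))
print(marker_ok(bad))
```

The second condition is what catches a page that contains both a correct
and a mangled copy of the same string.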


As for gitweb, we should make sure that everything is stored in Perl
variables and Perl structures only _after_ it has been passed through
to_utf8().  This would require some cleanup of the code, and having such
a test would help to check that we haven't introduced any regressions.
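
To illustrate the "decode once, at the point where data enters" idea, here
is a minimal sketch in Python.  It is an analogy, not gitweb's Perl code:
to_text() and parse_author() are hypothetical names, the Latin-1 fallback
is an assumption, and the e-mail address is a placeholder.

```python
def to_text(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode raw bytes as UTF-8, using a fallback encoding if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def parse_author(line: bytes) -> dict:
    """Parse a hypothetical 'author Name <email>' line into decoded fields."""
    name = line[len(b"author "):].split(b" <", 1)[0]
    # Decode here, so every later consumer sees text and never raw bytes.
    return {"author_name": to_text(name)}


commit = parse_author("author Füzesi Zoltán <z@example.com>".encode("utf-8"))
print(commit["author_name"])
```

With this arrangement, code that later chops or escapes the name never
needs to decide whether it has already been decoded.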

-- 
Jakub Narebski
Poland
