On Thu, 6 Aug 2009, Zoltán Füzesi wrote:
> 2009/8/4 Junio C Hamano <gitster@xxxxxxxxx>:
> >
> > Thanks, Zoltán.
> >
> > We should be able to set up a script that scrapes the output to test this
> > kind of thing.  We may not want to have a test pattern that matches too
> > strictly for the current structure and appearance of the output
> > (e.g. counting nested <div>s, presentation styles and such), but if we can
> > robustly scrape off HTML tags (e.g. "elinks -dump") and check the
> > remaining payload, it might be enough.
> >
> > Jakub, what do you think?  I suspect that the scraping approach may turn
> > out to be too fragile for tests to be worth doing, but I am just throwing
> > out a thought.
> >
>
> This issue shows up when the chop_and_escape_str function is called with
> a non-ASCII string (like my name :)) without first calling to_utf8 on
> it. "author_name" and "committer_name" are two examples, and
> "author_name" shows up with bad encoding in HTML.
>
> Example from one of my repos (little piece from shortlog output):
> <td class="author"><span title="FÃ¼zesi ZoltÃ¡n">Füzesi Zoltán</span></td>
> After applying the patch:
> <td class="author">Füzesi Zoltán</td>
>
> This is an "old" (seen in 1.5.6 version too) and (I think) minor issue.
> I haven't spent time on thinking how a test script could show this yet.
> Waiting for Jakub's reaction.

Oh, so the problem is not only to have correct output (for example
"Füzesi Zoltán" somewhere on the HTML page produced by gitweb), but also
to not have incorrect output (for example "FÃ¼zesi ZoltÃ¡n").

I think it would be better to leave t9500-gitweb-standalone-no-errors.sh
to be only about no Perl errors and no Perl warnings.  So I'd rather have
the test checking whether gitweb handles non US-ASCII in output correctly
in a separate test, e.g. t9501-gitweb-standalone-i18n.sh.  That would mean
extracting gitweb_init() and gitweb_run() (and perhaps also
gitweb_check_prereq() or something) into a common file, t/lib-gitweb.sh.

We would check e.g.
if "startáąend" is present in output (correct output), and whether
extracting "start[^ ]*end" produces only "startáąend" (no incorrect
output).

As for gitweb, we should make sure that everything is stored in Perl
variables and Perl structures only _after_ being passed through to_utf8().
This would require some cleanup of the code, and having such a test would
help to check that we didn't introduce any regressions.

--
Jakub Narebski
Poland
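
The check described above (the marker is present, and extracting
"start[^ ]*end" yields nothing but the marker) can be sketched roughly as
below. This is not the proposed t9501 test, only a minimal Python
illustration. The names double_encode and marker_ok are made up for this
sketch, the input is assumed to be page text with HTML tags already
stripped (e.g. by "elinks -dump"), and the bug is modelled, as an
assumption, as UTF-8 bytes being misread as Latin-1.

```python
import re

# Marker string from the message; everything else here is hypothetical.
MARKER = "startáąend"


def double_encode(text: str) -> str:
    """Simulate the bad output: UTF-8 bytes misread as Latin-1 characters.

    Assumption: this models what happens when a byte string skips to_utf8()
    and is later written out as if it were already decoded.
    """
    return text.encode("utf-8").decode("latin-1")


def marker_ok(page: str, marker: str = MARKER) -> bool:
    """Return True if the marker appears and no mangled variant does.

    The pattern is the one given in the message. It relies on the marker
    being delimited by spaces in the scraped text.
    """
    found = re.findall(r"start[^ ]*end", page)
    return marker in page and all(m == marker for m in found)


good_page = "author " + MARKER + " 2009-08-06"
bad_page = "author " + double_encode(MARKER) + " 2009-08-06"
mixed_page = good_page + " " + double_encode(MARKER)
```

Here marker_ok(good_page) is true, while bad_page (correct output missing)
and mixed_page (correct output present, but a mangled copy as well) both
fail, which are the two conditions the test is meant to catch.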