On Tue, 4 Aug 2009, Junio C Hamano wrote:
> Zoltán Füzesi <zfuzesi@xxxxxxxxx> writes:
>
> > Call to_utf8 when parsing author and committer names, otherwise they
> > will appear with bad encoding if they are written using
> > chop_and_escape_str.
> >
> > Signed-off-by: Zoltán Füzesi <zfuzesi@xxxxxxxxx>
> > ---
>
> Thanks, Zoltán.
>
> We should be able to set up a script that scrapes the output to test
> this kind of thing.  We may not want to have a test pattern that
> matches too strictly against the current structure and appearance of
> the output (e.g. counting nested <div>s, presentation styles and
> such), but if we can robustly scrape off HTML tags (e.g. with
> "elinks -dump") and check the remaining payload, it might be enough.
>
> Jakub, what do you think?  I suspect that the scraping approach may
> turn out to be too fragile for the tests to be worth doing, but I am
> just throwing out a thought.

First, I'd like the existing t9500-gitweb-standalone-no-errors.sh to
stay about Perl errors and warnings only, as it is now.  Anything beyond
that should IMVHO go into a separate test.

Second, to check whether gitweb handles non US-ASCII input correctly we
don't need HTML scraping or parsing.  We can simply check that the
correct string appears in the output... and (after Zoltán Füzesi's
example) that the incorrect one does not.  For example, if we have
'xxxóxxx' in the input, then 'xxxóxxx' should appear in the output, and
a match against 'xxx.xxx' should match 'xxxóxxx', i.e. the 'ó' should
come through as a single character rather than as mangled bytes.

--
Jakub Narebski
Poland
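
P.S. To make the second point concrete, here is a rough sketch of what
such a separate test could look like.  It is only an illustration, not a
finished test: the script number (t9501), the author name, and the
commit setup are invented, and it assumes the gitweb setup from t9500
(in particular a gitweb_run helper that runs gitweb.cgi and saves the
generated page in gitweb.output) has been factored out into a shared
gitweb-lib.sh.

#!/bin/sh

test_description='gitweb handling of non US-ASCII author/committer names'

. ./test-lib.sh
. ./gitweb-lib.sh	# assumed shared setup providing gitweb_run

test_expect_success 'setup commit with UTF-8 author name' '
	echo content >file &&
	git add file &&
	GIT_AUTHOR_NAME="Zoltán Füzesi" git commit -m "UTF-8 author"
'

test_expect_success 'author name is not mangled on summary page' '
	gitweb_run "p=.git;a=summary" &&
	# the correctly encoded string must be in the output...
	grep "Zoltán" gitweb.output &&
	# ...and the mangled form (UTF-8 bytes reinterpreted as
	# Latin-1 and re-encoded) must not be
	! grep "ZoltÃ¡n" gitweb.output
'

test_done

Checking both for the presence of the correct string and for the
absence of the double-encoded one catches the case Zoltán's patch
fixes, without depending on the HTML structure of the page at all.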