Re: proxy_html / xml2enc won't handle certain HTML entities

> On 8 May 2020, at 07:28, Antonio Suárez Pozuelo <a.suarez@xxxxxxxxxxxxxxx> wrote:
> 
> Hi Nick,
> 
> Your glass of wine was inspiring: just removed
> 
>>       ProxyHTMLCharsetOut     *   # Backend (Tomcat) charset is ISO-8859-1
> 
> and the problem's gone!

OK, thanks for confirming it.  I'm pretty sure now what's happening.

Libxml2 uses Unicode (UTF-8) internally, so for i18n to work, your ISO-8859-1
gets converted to UTF-8 before it is fed to the parser.  But HTML entities are
not preserved: the parser converts them to their Unicode representations.
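
To illustrate the input side, here is a rough Python sketch, not the actual
libxml2 or xml2enc code.  The function name parse_like_libxml2 and the sample
string are made up for the example; Python's html.unescape stands in for the
parser's entity handling.

```python
import html

def parse_like_libxml2(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Hypothetical stand-in for the input side: decode the backend's
    bytes to Unicode, then resolve named entities to real characters."""
    text = raw.decode(charset)      # ISO-8859-1 bytes -> Unicode
    return html.unescape(text)      # "&rarr;" -> U+2192, entity name is lost

# Made-up backend payload in ISO-8859-1 containing a named entity.
backend = "caf\xe9 &rarr; men\xfa".encode("iso-8859-1")
parsed = parse_like_libxml2(backend)

# After parsing, the arrow is an ordinary Unicode character.
print(parsed.encode("utf-8"))
```

The point is only that once the entity has been resolved, nothing downstream
knows it was ever written as "&rarr;".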

ProxyHTMLCharsetOut is kind of an afterthought: it converts Unicode to
your choice of encoding.  But it doesn't deal with HTML entities.  So when
it encounters the Unicode sequences for your "&rarr;" et al, it just tries to
convert Unicode to Latin-1, and fails when there is no Latin-1 representation.
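
The failure on the output side can be reproduced outside Apache.  This is a
minimal Python sketch, assuming nothing about how mod_proxy_html itself does
the conversion; the helper names are hypothetical.  The second helper uses
Python's own "xmlcharrefreplace" error handler to show what a fallback to
numeric character references could look like.  That is a Python feature, not
something ProxyHTMLCharsetOut does.

```python
from typing import Optional

arrow = "\u2192"  # the character "&rarr;" resolves to

def to_latin1(text: str) -> Optional[bytes]:
    """Strict conversion: returns None when a character has no
    ISO-8859-1 representation (hypothetical helper)."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None

def to_latin1_with_refs(text: str) -> bytes:
    """Fallback conversion: unrepresentable characters become numeric
    character references (Python's handler, an assumption for this sketch)."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(to_latin1("\u00e9"))               # e-acute exists in Latin-1
print(to_latin1(arrow))                  # the arrow does not
print(to_latin1_with_refs("a \u2192 b")) # arrow rewritten as a reference
```

Characters such as e-acute convert cleanly, which is why only pages with
entities outside Latin-1 trip over this.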

As far as I know this doesn't really matter: Unicode support is pretty near
universal, so just leaving the output as UTF-8 (that is, without
ProxyHTMLCharsetOut) has no real downside.  I'll think about whether there's
an easy fix to ProxyHTMLCharsetOut for cases like this, but will more likely
just add a note to the docs about the limitation.

> FYI, by increasing LogLevel to INFO, error log shows:

That basically just shows the problem isn't your backend.  My first reply was
leading up to "if the debug info doesn't tell us what's wrong, I'll ask for a
test case to try and replicate the problem".  No need for that now!

Thanks for the report!

-- 
Nick Kew