Hi, Nick.

I'm afraid we're still having an issue with this. Currently our conf is:

    ProxyPreserveHost on
    ProxyHTMLEnable on
    ProxyHTMLExtended on

And our pages are showing fine, but non-English characters fed into <input type="text"> form fields are posted incorrectly (badly encoded) to our backend server.

This won't happen with ProxyHTMLCharsetOut set to "*" or explicitly to "ISO-8859-1"; but that configuration, as you know, takes us back to the starting point.

Without ProxyHTMLCharsetOut, proxy_html is translating our backend ISO-8859-1 response into UTF-8, which is fine. When submitting a form, I guess the browser will also encode its contents in UTF-8, but maybe proxy_html won't reverse-translate that into ISO-8859-1 before relaying it to the backend server.

This can be enforced by adding an accept-charset="ISO-8859-1" attribute to the <form> tag (tested on Firefox 77.0b5), so: should proxy_html add that attribute to <form> tags automagically when parsing and translating HTML content?

Just speculating, I really don't know the internals of it. But I guess you do :)

Thanks in advance.

Best regards,
Antonio

----- Original Message -----
From: "Nick Kew" <niq@xxxxxxxxxx>
To: "users" <users@xxxxxxxxxxxxxxxx>
Sent: Friday, 8 May 2020 9:22:40
Subject: Re: proxy_html / xml2enc won't handle certain HTML entities

> On 8 May 2020, at 07:28, Antonio Suárez Pozuelo <a.suarez@xxxxxxxxxxxxxxx> wrote:
>
> Hi Nick,
>
> Your glass of wine was inspiring: just removed
>
>> ProxyHTMLCharsetOut * # Backend (Tomcat) charset is ISO-8859-1
>
> and the problem's gone!

OK, thanks for confirming it. I'm pretty sure now what's happening.

Libxml2 uses unicode (utf-8) internally, so for i18n to work, your iso-8859-1 gets converted before feeding to the parser. But HTML entities are not preserved: they get converted to their unicode representations.

ProxyHTMLCharsetOut is kind-of an afterthought: it converts unicode to your choice of encoding. But it doesn't deal with HTML entities.
So when it encounters unicode sequences for your "→" et al, it just tries to convert unicode to latin-1, and fails when there is no latin-1 representation.

As far as I know this doesn't really matter: unicode support is pretty-near universal, so just leaving it in place has no real downside. I'll think about whether there's an easy fix to ProxyHTMLCharsetOut for cases like this, but will more likely just add a note to the docs about the limitation.

> FYI, by increasing LogLevel to INFO, error log shows:

Basically just shows the problem isn't your backend. My first reply was leading to "if the debug info doesn't tell us what's wrong, I'll ask for a test case to try and replicate the problem". No need for that now!

Thanks for the report!

-- 
Nick Kew

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
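
The two charset effects discussed in this thread can be reproduced in isolation with a small Python sketch. It does not involve mod_proxy_html, mod_xml2enc or libxml2 at all; it only uses Python's built-in codecs, and the function names are made up for illustration. The first helper shows that U+2192 (the character behind "→") has no ISO-8859-1 representation, which is the conversion that fails on output. The second shows the kind of garbling that would result if UTF-8 form data were read as ISO-8859-1; whether the Tomcat backend in question decodes request bodies this way is an assumption.

```python
def can_encode_latin1(text: str) -> bool:
    """Return True if every character in text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # Raised when a character has no latin-1 code point, e.g. U+2192.
        return False


def misread_utf8_as_latin1(text: str) -> str:
    """Simulate a receiver decoding UTF-8 bytes as if they were ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    # U+00F1 (n with tilde) is part of latin-1, U+2192 (rightwards arrow) is not.
    print(can_encode_latin1("\u00f1"))
    print(can_encode_latin1("\u2192"))
    # The two UTF-8 bytes of U+00F1 come back as two separate latin-1 characters.
    print(misread_utf8_as_latin1("\u00f1"))
```

If the second effect is what happens here, a form submitted with accept-charset="ISO-8859-1" would avoid it, because the browser would then send bytes in the encoding the backend expects.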