Re: xPath in a database with LATIN1 encoding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Am 24.04.21 um 20:20 schrieb Jorge Silva:
Thanks for the quick reply, Tom. I am trying something simpler. I am trying to find a way to run the xPath function with a xml file type, which has latin characters, such as:
SELECT xpath(‘//xml_test/text()’, convert_from(convert_to(‘<xml_test>çã</xml_test>','utf-8’),'utf-8')::xml)

This line returns the same error, as follows:
[Code: 0, SQL State: 2200M]  ERROR: could not parse XML document
  Detail: line 1: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE7 0xE3 0x3C 0x2F
<xml_test>çã</xml_test>
          ^
This happens because the “text”, which is the output from convert_from() function is encoded with LATIN1, the database encoding set, and not UTF-8.  Is there a way that a text variable is not encoded as the database encoding set, but some other encoding set, such as UTF-8? From what I’ve searched for, it seems that it would be something similar to nvarchar that only exists in Microsoft SQL Server.



On 24 Apr 2021, at 13:49, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

Jorge Silva <jorge.silva93@xxxxxxxxx> writes:
The characters that it is not recognizing are both “ç” and “ã” because I think they are encoded differently in latin1 and utf-8. Is it possible to somehow use the xPath function with special characters in the XML and in a database which is not encoded with utf-8?

I don't have a lot of expertise in this area, but I think you need
an explicit encoding indicator in the xml header, a la

   <?xml encoding="latin1"?> ...

On the whole, the xml type is definitely easier to use with database
encoding set to utf8.  I think you'll be paying for encoding conversion
every time we interact with libxml, for instance.

regards, tom lane

The point is that the XML standard dictates that every document without an explicit encoding, must be encodied in UTF-8. So, whenever you want to use a different encoding, this has to be stated in the XML header, as indicated in Tom Lane's answer.

So the encoding of XML columns isn't actually the one of the database (although UTF-8 is best in any case), but simply is the XML standard. Your client encoding may vary, though, but this would limit the characters which can be transferred.

Regards,

Holger


-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux