On 10 April 2015 at 03:27, John R Pierce <pierce@xxxxxxxxxxxx> wrote:
> one possible rationale for using BYTEA is that the data could be in various encodings, which the application wishes to preserve, and keeps track of somewhere else (perhaps in a field within the XML?).
Thanks for bringing this up, as it's a good reason to use bytea for XML.
XML actually has an encoding field in the XML declaration (the prolog at the top of the document, not the DTD), e.g.
<?xml version="1.0" encoding="UTF-8"?>
It is common - and of dubious correctness - for applications to store XML in a 'text' or 'xml' field without changing the 'encoding' field in the XML declaration to reflect the encoding at rest.
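
To make the failure mode concrete, here is a minimal Python sketch (not PostgreSQL code; the sample document and variable names are made up for illustration). It takes a document declared and encoded as ISO-8859-1, re-encodes the bytes as UTF-8 without touching the declaration, and shows that a parser which trusts the declaration then misreads the text:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: declared as ISO-8859-1 and really encoded that way.
original = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>café</name>'
).encode("iso-8859-1")

# Simulate storage "at rest" in a UTF-8 database: the bytes are
# transcoded, but the declaration still claims ISO-8859-1.
at_rest = original.decode("iso-8859-1").encode("utf-8")

# The parser honours the declaration when it is handed raw bytes.
good = ET.fromstring(original).text
bad = ET.fromstring(at_rest).text

print(repr(good))  # 'café'
print(repr(bad))   # 'cafÃ©' (UTF-8 bytes misread as ISO-8859-1)
```

Handing the parser an already-decoded string instead of bytes sidesteps the problem, which is probably why the stale declaration so often goes unnoticed.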
Personally I wish the 'xml' type in Pg knew how to change the encoding declaration dynamically, but I know it's a hairy problem. For example, if the client_encoding is iso-8859-1 and the server rewrote the declaration to match, a client that then converts the XML document to utf-8 internally would end up with a wrong declaration unless it changed it back.
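
For what it's worth, the client-side version of "change the declaration when you change the encoding" could look roughly like the sketch below. The helper name reencode_xml is hypothetical, it is not how Pg's 'xml' type behaves, and it assumes the source encoding is ASCII-compatible (so not UTF-16) and that an encoding declaration is present:

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# Group 2 is the declared encoding name.
_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding=(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)

def reencode_xml(data: bytes, target: str) -> bytes:
    """Transcode an XML document and rewrite its declaration to match.

    Hypothetical helper: assumes an ASCII-compatible source encoding
    and refuses documents without an encoding declaration.
    """
    m = _DECL.match(data)
    if m is None:
        raise ValueError("no encoding declaration found")
    source = m.group(2).decode("ascii")
    text = data.decode(source)
    # Everything up to the encoding name is ASCII, so byte offsets
    # from the match are also valid character offsets in `text`.
    text = text[:m.start(2)] + target + text[m.end(2):]
    return text.encode(target)

src = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>café</name>'
).encode("iso-8859-1")
out = reencode_xml(src, "UTF-8")
print(out[:38])                  # declaration now says UTF-8
print(ET.fromstring(out).text)   # café
```

Even this toy version has to guess what to do when the declaration is missing or the document is UTF-16, which gives a flavour of why doing it transparently in the server is hairy.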
I've also run into XML documents that shove data in different encodings into CDATA sections. This is wrong, of course, but apps sometimes do it anyway.