Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 07 Sep 2023 16:21:43 -0400

Erik Wienhold <ewie@xxxxxxxxx> writes:
> On 07/09/2023 21:09 CEST Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> There is no such string anywhere in the Postgres source code;
>> furthermore, if someone tried to add such an error, it'd get rejected
>> (I hope) as not conforming to our style guidelines.  I thought maybe
>> it's coming from libxml or the xpath code, but I couldn't get a match
>> for it anywhere in Debian Code Search either.  Is that the *exact*
>> spelling of the message?

> Looks like "Huge input lookup" as reported in [1] (also from Sai) and that
> error is from libxml.

Ah, thanks for the pointer.  It looks like for the DOCUMENT case,
we could maybe relax this restriction by passing the XML_PARSE_HUGE
option to xmlCtxtReadDoc().  However, there are things to worry about:

* Some of the other libxml functions we use don't seem to have an
options argument, so it's not clear how to remove the limit in all
code paths.

* One of the first hits I got while googling for XML_PARSE_HUGE was
CVE-2022-40303 [1] (libxml2: integer overflows with XML_PARSE_HUGE).
It seems highly likely that not everybody's libxml is patched for
that yet, meaning we'd be opening a lot of systems to security issues.

* XML_PARSE_HUGE apparently also removes restrictions on nesting
depth of XML documents.  I wonder whether that creates a risk of
stack-overflow crashes.

On the whole, I'm not sure I want to mess with this.  libxml2 is
rickety enough already without taking off its training wheels.
And, as noted by David J., we'd very possibly only be moving
the bottleneck somewhere else.  "Put many megabytes of data into
one field" is an antipattern for successful SQL use, and probably
always will be.

			regards, tom lane

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2136266