Re: Huge input lookup exception when trying to create the index for XML data type column in postgreSQL

Sai Teja <saitejasaichintalapudi@xxxxxxxxx> · Fri, 8 Sep 2023 11:21:13 +0530

Thank you so much for all your responses.
I just tried Hash, GIN, and other index types.

But it didn't work. I think that is because of the XPath expression I used in the CREATE INDEX command.

Is there an alternative to this XPath expression? I have to parse the XML, since there is no other option, so I need another way to create the index.

Maybe there are parameters we could change, such as xmloption, that would help resolve the issue.

Thanks,
Sai

On Fri, 8 Sep, 2023, 1:51 am Tom Lane, <tgl@xxxxxxxxxxxxx> wrote:
Erik Wienhold <ewie@xxxxxxxxx> writes:
> On 07/09/2023 21:09 CEST Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> There is no such string anywhere in the Postgres source code;
>> furthermore, if someone tried to add such an error, it'd get rejected
>> (I hope) as not conforming to our style guidelines.  I thought maybe
>> it's coming from libxml or the xpath code, but I couldn't get a match
>> for it anywhere in Debian Code Search either.  Is that the *exact*
>> spelling of the message?
>
> Looks like "Huge input lookup" as reported in [1] (also from Sai) and that
> error is from libxml.

Ah, thanks for the pointer.  It looks like for the DOCUMENT case,
we could maybe relax this restriction by passing the XML_PARSE_HUGE
option to xmlCtxtReadDoc().  However, there are things to worry about:

* Some of the other libxml functions we use don't seem to have an
options argument, so it's not clear how to remove the limit in all
code paths.

* One of the first hits I got while googling for XML_PARSE_HUGE was
CVE-2022-40303 [1] (libxml2: integer overflows with XML_PARSE_HUGE).
It seems highly likely that not everybody's libxml is patched for
that yet, meaning we'd be opening a lot of systems to security issues.

* XML_PARSE_HUGE apparently also removes restrictions on nesting
depth of XML documents.  I wonder whether that creates a risk of
stack-overflow crashes.

On the whole, I'm not sure I want to mess with this.  libxml2 is
rickety enough already without taking off its training wheels.
And, as noted by David J., we'd very possibly only be moving
the bottleneck somewhere else.  "Put many megabytes of data into
one field" is an antipattern for successful SQL use, and probably
always will be.
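
One way to act on that, as a rough sketch only: pull the lookup value out of
the XML on the client side before the row is stored, and keep it in an
ordinary column that can take a plain index, so no xpath() expression is
needed in the index definition.  The example below uses Python's standard
xml.etree.ElementTree module, which is built on the expat parser rather than
libxml2; whether it copes with a particular set of documents is something to
test, not something this thread establishes.  The function name
extract_first_text and the element names order, id and payload are made up
for illustration.

```python
import io
import xml.etree.ElementTree as ET


def extract_first_text(xml_stream, tag):
    """Return the text of the first <tag> element, or None if there is none.

    Hypothetical helper.  iterparse() reads the stream incrementally, so the
    full document tree is never built in memory.  For namespaced documents
    the tag has to be given in "{namespace-uri}localname" form.
    """
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == tag:
            return elem.text
        # Discard the content of elements we have already passed.
        elem.clear()
    return None


# Made-up sample document standing in for a large XML value.
doc = io.BytesIO(b"<order><id>A-17</id><payload>...</payload></order>")
order_id = extract_first_text(doc, "id")
print(order_id)
```

The extracted value would then be written to its own text column alongside
the XML, and that column indexed in the usual way.  This assumes the value
being searched on is a single element's text; a more involved XPath
expression would need correspondingly more extraction logic.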

                        regards, tom lane

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2136266