Using XML_PARSE_HUGE in operations on xml fields?

Hello,

I ran into trouble with an xpath expression on a large XML file:

SELECT id, xpath('//tei:div/descendant::tei:head/text()', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=1;

returns:

ERROR:  could not parse XML document
DETAIL:  line 491482: internal error: Huge input lookup
						स्थापितः।। </p>
						                               ^
line 491482: Extra content at the end of the document
						स्थापितः।। </p>
						                               ^

I think this error comes from the libxml2 library, since pretty much
the same thing happens with xmllint if you pass the `--memory`
option.

I think this is the relevant part of the libxml2 documentation:

"""
#define XML_MAX_DICTIONARY_LIMIT

Maximum size allowed by the parser for a dictionary by default This is
not a limitation of the parser but a safety boundary feature, use
XML_PARSE_HUGE option to override it. Introduced in 2.9.0
"""

(see http://xmlsoft.org/html/libxml-parserInternals.html#XML_MAX_LOOKUP_LIMIT)

So I was wondering whether, and how, I could set the XML_PARSE_HUGE
option in PostgreSQL. I couldn't find anything in the docs or in this
list's archives.

To reproduce the problem quickly, using one XML file of approx. 37 MB
and one smaller one:

createdb xmlpost
psql xmlpost

CREATE TABLE test (id integer, x xml);
-- id=1: the large file (approx. 37 MB)
\set content `curl http://sarit.indology.info/downloads/mahabharata-devanagari.xml`
INSERT INTO test VALUES (1, XMLPARSE(DOCUMENT :'content'));
-- id=2: the smaller file
\set content `curl http://sarit.indology.info/downloads/ratnakIrti-nibandhAvali.xml`
INSERT INTO test VALUES (2, XMLPARSE(DOCUMENT :'content'));

SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=1; -- fails

SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=2; -- works: 13
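
To check that the large file itself is fine and that only the libxml2
limit is in the way, the same count can be computed outside the
database with a parser that is not based on libxml2. The following is
a minimal sketch using Python's standard library (expat-based); the
function name count_heads_in_divs is made up for illustration, and I
am assuming that counting tei:head elements that have at least one
tei:div ancestor is equivalent to the count() query above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xpath() calls above.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_heads_in_divs(source):
    """Count tei:head elements that have at least one tei:div ancestor.

    `source` is a file name or a binary file object. Each head is counted
    once, as in count(//tei:div/descendant::tei:head).
    (Hypothetical helper, not part of PostgreSQL or libxml2.)
    """
    div_tag = f"{{{TEI_NS}}}div"
    head_tag = f"{{{TEI_NS}}}head"
    count = 0
    div_depth = 0  # number of tei:div elements currently open

    # Stream the document so the whole tree is never held in memory.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == head_tag and div_depth > 0:
                count += 1
            elif elem.tag == div_tag:
                div_depth += 1
        else:  # "end" event
            if elem.tag == div_tag:
                div_depth -= 1
            elem.clear()  # drop text and children that are no longer needed
    return count
```

With the two files saved locally (for example via curl -O), calling
count_heads_in_divs("ratnakIrti-nibandhAvali.xml") should agree with
the result for id=2 if the assumption above holds, and the same call on
mahabharata-devanagari.xml gives a number to compare against once the
query for id=1 can be made to run.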

Thanks for any hints,

-- 
patrick

