I have an xml document from which I would like to extract the contents of several elements.
I would like to use xpath to extract the contents of "name" from the xml document shown below.
WITH x AS
(
SELECT
'<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/support/docs/uniprot.xsd">
<entry dataset="Swiss-Prot" created="2009-12-15" modified="2016-05-11" version="56">
<accession>A0JM59</accession>
<name>UBP20_XENTR</name>
</entry>
</uniprot>
'::xml AS d
)
SELECT (xpath('/uniprot/entry/name/text()',a.d))[1]::text AS uniprot_name
FROM
x AS a
;
The documentation for xpath() ("https://www.postgresql.org/docs/9.5/static/functions-xml.html") describes "xpath(xpath, xml [, nsarray]").
For the above xml document, what would be the two dimensional array "nsarray" for the xpath() function?
Is there a specific part of the description and two examples that doesn't make sense to you?
Or more specifically, do you understand what namespaces are?
ARRAY[
ARRAY['defaultns','http://uniprot.org/uniprot'],
ARRAY['xsi','http://www.w3.org/2001/XMLSchema-instance']
]
In effect when the xpath function parses the XML document it tosses away all of the document-local namespace aliases and instead associated the full namespace URI with each element (in the DOM). Since, in the xpath _expression_, usually you'd want to refer to nodes in the DOM via their namespace alias you need to tell the xpath function which aliases you intend to use in the xpath and which full URI they correspond to. Furthermore, there is not concept of a default namespace in the xpath _expression_. So while you can simply copy-paste the aliases and URIs from all of the non-default namespace aliases you must also choose a unique alias for the default namespace in the original document.
In the above I've copied the alias and namespace URI for the named "xsi" alias and gave the name "defaultns" to the original document's default namespace URI.
David J.