Re: XML parsing in shell script

Paul Heinlein <heinlein@xxxxxxxxxx> · Thu, 18 Mar 2021 13:30:48 -0700 (PDT)

On Thu, 18 Mar 2021, H wrote:

> I have a challenge I am interested in getting feedback on.
>
> I will on a regular basis download a series of data files from the
> web where the data is in XML-format. The format is known in advance
> but is different between the various data files. I then plan to
> extract the various data items ("elements?") from each data file, do
> some light formatting and then save desired parts of each original
> data file as a formatted CSV-file for later importing into a
> database.
>
> As the plan is to use a bash shell script using curl to get the
> files, I have begun looking at external XML parsers that I can call
> from my script, perhaps specify which elements I want, get the data
> back in some kind of bash data structure and finally format and save
> as CSV-files.
>
> There seem to be a number of XML parsers available but perhaps
> someone on the list has a recommendation for which one might suit my
> needs best? I should add that I am running CentOS 7.

Will you be using an XSLT stylesheet to do the work? There's a 
somewhat steep learning curve, but in my experience it's the most 
reliable method for parsing XML except in the very simplest of cases.

In that case, the libxslt stuff may be what you want:

  http://xmlsoft.org/libxslt/

The command-line tool is xsltproc.

Again, it's not easy to use, but once you've built a toolchain, it 
will be reliable and fairly easy to modify if the source XML schema 
changes.
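
As a rough illustration of what such a toolchain could look like, here 
is a minimal sketch that turns a small XML file into CSV with xsltproc. 
The element names (items, item, name, price), the file names and the 
sample data are all made up for the example; a real script would fetch 
the input with curl and use a stylesheet written for the actual schema.

```shell
#!/usr/bin/env bash
# Sketch only: element names, file names and data below are hypothetical.
set -u

workdir=$(mktemp -d)

# Stand-in for a file that would normally be downloaded with curl.
cat > "$workdir/items.xml" <<'EOF'
<?xml version="1.0"?>
<items>
  <item><name>widget</name><price>9.50</price></item>
  <item><name>gadget</name><price>12.00</price></item>
</items>
EOF

# XSLT 1.0 stylesheet: a header line, then one "name,price" line per <item>.
# method="text" suppresses XML markup in the output; &#10; is a newline.
cat > "$workdir/items-to-csv.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:text>name,price&#10;</xsl:text>
    <xsl:for-each select="items/item">
      <xsl:value-of select="name"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="price"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

# Usage: xsltproc [options] stylesheet input; -o names the output file.
xsltproc -o "$workdir/items.csv" \
    "$workdir/items-to-csv.xsl" "$workdir/items.xml"

cat "$workdir/items.csv"
```

Note that this sketch does no CSV quoting, so a value containing a comma 
or a double quote would need extra handling in the stylesheet. When the 
source schema changes, only the select expressions should need 
adjusting.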

--
Paul Heinlein
heinlein@xxxxxxxxxx
45.38° N, 122.59° W
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos