Search Postgresql Archives

Re: Importing Many XML Records

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> I'm sure that this has been asked before but I can't find any 
> reference to it in google, and the search facility on 
> postgresql.org is currently down.

http://groups.google.com/groups?q=group%3Apgsql.*

provides the same with a slight delay but arguably a better user
interface.

> I have a large number of entries (possibly 10,000+) in an XML 
> file that I need to import into the database (7.4 on Debian) 
> on a daily basis. Does anyone have any recommendations 
> concerning the best way to do this? Is there some tool I 
> should use or should I create the code in java to parse and 
> import the data?
> 
> If anyone has done this before, I would appreciate hearing 
> how they did this.

This is generally outside the scope of this list. I am guessing (since I
don't know much about your data format or goals), but you probably want
to first transform the XML into a format suitable for importation into
the database using COPY, or (much less desirable) a bunch of insert
statements. In either case you should become familiar with XSLT
processing and write yourself an XSLT template to do the job.

I deal with a similar task using Saxon and TagSoup (which I highly
recommend for XML that is not well-formatted) and create a CSV file out
of a multitude of XML files (or a single XML file), which can then be
COPY-ed into a PG table. Instead of a CSV file one could create a SQL
script file of INSERT statements. I recommend Jeni Tennison's "Beginning
XSLT" book as an excellent reference on the subject of XSLT. 

Depending on what your XML looks like you may get away without XSLT at
all, but just preprocess it with awk, sed, perl (Template::Extract is a
useful module) or whatever strikes your fancy.

Other questions to answer are do you want the "records" to stay as XML
in the database or do you want to import them into a regular table
format? If the former you may want to get familiar with the pgxml (aka
xml2 module) so you can query the XML data once inside your database.

George





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux