Insertion of large xml files into PostgreSQL 10beta1

Hello,

I am building a PostgreSQL server into which I intend to load the
entire PubMed database (23GB bzip2-compressed, 216GB unpacked). The
data is available in XML form; here is an example:

https://www.ncbi.nlm.nih.gov/pubmed/21833294?report=xml&format=text

I looked at the documentation as well as several code examples for
loading the data, and the one that nearly succeeded is this
procedure:

/usr/bin/psql medline

\set largexmlfile `cat /srv/pgsql/pubmed/medline17n0001.xml`

INSERT INTO samples (xmldata) VALUES (:'largexmlfile');

(taken from this list post:
https://www.postgresql.org/message-id/20160624083757.GA5459%40msg.df7cb.de)
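The essential point of that procedure is that the file contents must reach the server as one quoted string literal, not as raw SQL text. Below is a minimal Python 3 sketch of that quoting, assuming standard_conforming_strings = on (the server default since 9.1). The table and column names come from the procedure above; the script name in the comment is hypothetical. psql's own :'name' interpolation uses libpq's PQescapeLiteral, which additionally switches to E'...' syntax when backslashes are present, so this is a simplification rather than a copy of psql's behaviour.

```python
import sys


def sql_literal(text):
    """Quote text as a PostgreSQL string literal.

    Assumes standard_conforming_strings = on, so backslashes are ordinary
    characters and only the single quote must be doubled.
    """
    return "'" + text.replace("'", "''") + "'"


def build_insert(document, table="samples", column="xmldata"):
    """Return one INSERT statement carrying the whole document as a literal.

    table and column are treated as trusted identifiers and are not quoted.
    """
    return "INSERT INTO {} ({}) VALUES ({});\n".format(
        table, column, sql_literal(document)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 make_insert.py medline17n0001.xml | psql medline
    with open(sys.argv[1], encoding="utf-8") as f:
        # Write UTF-8 bytes so the output does not depend on the locale.
        sys.stdout.buffer.write(build_insert(f.read()).encode("utf-8"))
```

Reading a ~350MB file this way keeps several copies of the text in memory at once (the file contents, the escaped copy and the final statement), which matters on a 4GB machine.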

With this, about 334MB of data from medline17n0001.xml floods the
terminal. At some point during the load I notice a flood of error
codes, and then it ends up throwing an error because it misinterprets
some accented text in the PubMed files (most likely abstract data in a
language other than English).
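If the failure comes from the byte encoding of the accented text rather than from the XML structure (an assumption; the exact error message is not quoted here), it may help to check each file for invalid UTF-8 before loading it. This is a minimal Python 3 sketch, assuming the files are meant to be UTF-8. It streams the file in chunks so a 350MB file is never held in memory, and reports the byte offset of the first bad sequence.

```python
import codecs
import sys


def first_invalid_utf8_offset(path, chunk_size=1 << 20):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    consumed = 0  # bytes read from the file so far
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            final = not chunk  # an empty read means end of file
            # Bytes of an incomplete multi-byte sequence carried over from
            # the previous chunk; the decoder prepends them to the new input.
            pending = decoder.getstate()[0]
            try:
                decoder.decode(chunk, final)  # decoded text is discarded
            except UnicodeDecodeError as exc:
                # exc.start is relative to pending + chunk.
                return consumed - len(pending) + exc.start
            consumed += len(chunk)
            if final:
                return None


if __name__ == "__main__":
    for name in sys.argv[1:]:
        offset = first_invalid_utf8_offset(name)
        if offset is None:
            print(name, "OK")
        else:
            print(name, "invalid UTF-8 at byte", offset)
```

Passing `final=True` on the last call also catches a multi-byte sequence truncated at the very end of a file.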

When I get back home (I'm at work at the moment, returning home at 5pm
EST), I will work out a way to put these commands into a bash script[1]
that writes two log files (stdout and stderr). In the meantime, I would
like to know whether it is possible to turn off validation of the
content between the XML tags of the files.

[1] There are close to 1200 MEDLINE files, averaging 350MB each.
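For the roughly 1200 files, the per-file procedure could be driven by a loop that keeps separate stdout and stderr logs, as planned above. The post plans a bash script; this is a Python 3 sketch of the same loop under stated assumptions: the samples table and xmldata column from the procedure, the database name medline, a medline17n*.xml naming pattern extrapolated from medline17n0001.xml, and hypothetical log file names. Each file gets its own psql run fed through stdin, reusing the \set-with-backquotes and :'name' quoting approach. ON_ERROR_STOP makes psql exit non-zero on an error, so failing files can be collected.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def psql_script(xml_path):
    """psql input that loads one file into samples.xmldata (assumed table)."""
    quoted = shlex.quote(str(xml_path))  # protect the path inside backquotes
    return (
        "\\set ON_ERROR_STOP on\n"
        "\\set largexmlfile `cat " + quoted + "`\n"
        "INSERT INTO samples (xmldata) VALUES (:'largexmlfile');\n"
    )


def medline_files(directory):
    """All medline17n*.xml files in the directory, in name order."""
    return sorted(Path(directory).glob("medline17n*.xml"))


def load_all(directory, dbname="medline",
             out_log="load.stdout.log", err_log="load.stderr.log"):
    """Run psql once per file, appending its output to two log files."""
    failures = []
    with open(out_log, "a") as out, open(err_log, "a") as err:
        for path in medline_files(directory):
            err.write("== {}\n".format(path))  # mark which file follows
            err.flush()  # flush before psql appends to the same file
            result = subprocess.run(
                ["psql", "-X", "-d", dbname],  # -X: ignore ~/.psqlrc
                input=psql_script(path),
                stdout=out,
                stderr=err,
                universal_newlines=True,
            )
            if result.returncode != 0:
                failures.append(path)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 load_medline.py /srv/pgsql/pubmed
    failed = load_all(sys.argv[1])
    print(len(failed), "file(s) failed")
    for path in failed:
        print(" ", path)
```

Running one psql process per file keeps a single bad file from aborting the whole batch, and the "==" markers in the stderr log show which file each error belongs to.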

The platform is a Subversion-tracked Linux From Scratch release running
from a 1.5GB RAM drive, plus dhcpcd, Python 2.7.13, libxml2-2.9.4,
libxslt-1.1.29 and PostgreSQL 10 beta 1, with the data files in
/srv/pgsql/data on a single-partition 931.5GB Western Digital drive
dedicated to PostgreSQL for the moment.

The goal is to build a server, but for now PostgreSQL runs from the
RAM drive for partition sizing and calculation purposes.

The server is an old Core 2-based machine with 4GB of RAM and the
aforementioned 931.5GB hard disk. It also has an NVIDIA card intended
for text-mining applications.

Thank you very much.

Alain


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


