
Re: Using COPY to import large xml file


 





2018-06-25 17:30 GMT+02:00 Anto Aravinth <anto.aravinth.cse@xxxxxxxxx>:


On Mon, Jun 25, 2018 at 8:54 PM, Anto Aravinth <anto.aravinth.cse@xxxxxxxxx> wrote:


On Mon, Jun 25, 2018 at 8:20 PM, Nicolas Paris <niparisco@xxxxxxxxx> wrote:

2018-06-25 16:25 GMT+02:00 Anto Aravinth <anto.aravinth.cse@xxxxxxxxx>:
Thanks a lot. But I ran into a lot of challenges! It looks like the SO data contains a lot of tabs within itself, so a tab delimiter didn't work for me. I thought I could use a special delimiter, but it looks like PostgreSQL's COPY allows only a single character as the delimiter :(

Sad. I guess the only way is to use INSERT, or to do a thorough serialization of my data into something that COPY can understand.

The easiest way would be:
xml -> csv -> \copy

By csv, I mean regular quoted CSV (simply wrap each CSV field in double quotes, and escape
any double quotes it contains with another double quote).
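
As a rough illustration of that quoting rule, PostgreSQL itself can print what a quoted CSV row looks like. The literal values below are made up for the example, and the tab delimiter is an assumption carried over from the earlier attempt:

```sql
-- Illustrative only: the SELECT list is invented sample data.
-- In CSV format, COPY wraps any field that contains the delimiter,
-- a double quote, or a line break in double quotes, and doubles any
-- embedded double quotes.
COPY (
    SELECT 1 AS id,
           E'A title with a "quote"\tand a tab' AS title,
           E'first line\nsecond line' AS posts
) TO STDOUT WITH (FORMAT csv, DELIMITER E'\t');
```

A converter that emits fields in this shape should produce a file that COPY can read back, as long as the same format options are given on import.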

I tried, but no luck. Here is the sample CSV that I wrote from my XML converter:

1       "Are questions about animations or comics inspired by Japanese culture or styles considered on-topic?"  "pExamples include a href="" href="http://www.imdb.com/title/tt0417299/" target="_blank">http://www.imdb.com/title/tt0417299/"" rel=""nofollow""Avatar/a, a href="" href="http://www.imdb.com/title/tt1695360/" target="_blank">http://www.imdb.com/title/tt1695360/"" rel=""nofollow""Korra/a and, to some extent, a href="" href="http://www.imdb.com/title/tt0278238/" target="_blank">http://www.imdb.com/title/tt0278238/"" rel=""nofollow""Samurai Jack/a. They're all widely popular American cartoons, sometimes even referred to as ema href="" href="https://en.wikipedia.org/wiki/Anime-influenced_animation" target="_blank">https://en.wikipedia.org/wiki/Anime-influenced_animation"" rel=""nofollow""Amerime/a/em./p


pAre questions about these series on-topic?/p

"       "pExamples include a href="" href="http://www.imdb.com/title/tt0417299/" target="_blank">http://www.imdb.com/title/tt0417299/"" rel=""nofollow""Avatar/a, a href="" href="http://www.imdb.com/title/tt1695360/" target="_blank">http://www.imdb.com/title/tt1695360/"" rel=""nofollow""Korra/a and, to some extent, a href="" href="http://www.imdb.com/title/tt0278238/" target="_blank">http://www.imdb.com/title/tt0278238/"" rel=""nofollow""Samurai Jack/a. They're all widely popular American cartoons, sometimes even referred to as ema href="" href="https://en.wikipedia.org/wiki/Anime-influenced_animation" target="_blank">https://en.wikipedia.org/wiki/Anime-influenced_animation"" rel=""nofollow""Amerime/a/em./p


pAre questions about these series on-topic?/p

"       "null"


the schema of my table is:

  CREATE TABLE so2 (
    id  INTEGER NOT NULL PRIMARY KEY,
    title varchar(1000) NULL,
    posts text,
    body TSVECTOR,
    parent_id INTEGER NULL,
    FOREIGN KEY (parent_id) REFERENCES so1(id)
);

and when I run:

COPY so2 from '/Users/user/programs/js/node-mbox/file.csv';


I get:


ERROR:  missing data for column "body"

CONTEXT:  COPY so2, line 1: "1 "Are questions about animations or comics inspired by Japanese culture or styles considered on-top..."

 



I'm not sure what I'm missing. I suspect the above CSV is breaking because I have newlines within my content, but the error message makes it very hard to debug.




What you are missing is the configuration of the COPY statement (please refer to https://www.postgresql.org/docs/9.2/static/sql-copy.html),
such as format, delimiter, quote and escape.
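
A minimal sketch of such a statement, not a verified fix. It assumes the fields in the file are really separated by tabs, that embedded double quotes are doubled, and that missing values are written as an unquoted null. The column list is taken from the so2 schema quoted above.

```sql
-- Without FORMAT csv, COPY uses the text format, in which every line break
-- ends a row; that is the likely cause of 'missing data for column "body"'.
COPY so2 (id, title, posts, body, parent_id)
FROM '/Users/user/programs/js/node-mbox/file.csv'
WITH (
    FORMAT csv,        -- quoted fields may contain tabs and newlines
    DELIMITER E'\t',   -- assumption: tab-separated fields
    QUOTE '"',
    ESCAPE '"',        -- a doubled quote stands for a literal quote
    NULL 'null'        -- assumption: only an unquoted null is read as NULL
);
```

In CSV format a quoted "null" is read as the literal string, not as NULL, so the converter would need to write that marker without quotes for the parent_id column to load. From psql, \copy takes the same options and reads the file on the client side.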
