Search Postgresql Archives

Re: questions on (parallel) COPY and when to REINDEX

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Andy.  Thanks for the responses and information.

Just to let you know... what we are storing in the db are the image
attributes - about 40 of them - not the images.  So the COPY
is reading an ascii file of the image attributes.  It turns out to be
useful to have the image attributes handy - much better than reading
the image headers. The images are available on spinning disk, and
the image locations are in the db.

Thanks,
Janet


On 02/08/2009 05:59 p.m., Andy Colson wrote:
On 1 Aug 2009, at 23:24, Janet Jacobsen wrote:
My questions are:
(2) Should I REINDEX these two tables daily after the pipeline
completes?  Is this what other people do in practice?

No need, but as people are querying as soon as data is arriving, an analyse of the table you just copied to should help performance - the query plans will be derived from more accurate table statistics that way.

The files that are being copied into the table(s) are between 200 to 1,000
each, but there are hundreds of these small files every day.  Would you
recommend running analyze after every COPY?

Just once, at the end. Assuming you are only deleting 100k records and re-adding/updating another 100k in a batch. That's not so many records it'll through the stats out of whack. (If you were dropping/updating 15M in a batch you might wanna analyze in the middle)


You wrote that your pipeline runs for a period of 4 hours and the table is about 15M rows now. What is taking up all that time? I understand why you'd want to parallelise that process, but where do you expect the gains?


We're processing images, and the data processing and analysis takes
up most of the time, but the images can be processed/analyzed in parallel.
We've been doing all of the data loading at the end - one COPY at a
time.  Originally that made sense because the researchers wanted to
check the images before loading the data/analysis results into the db.

Ah! Images! When you are doing the COPY are you escaping the data? You cant "just" copy a binary file.

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux