Hi, Andy. Thanks for the responses and information.
Just to let you know... what we are storing in the db are the image
attributes - about 40 of them - not the images. So the COPY
is reading an ascii file of the image attributes. It turns out to be
useful to have the image attributes handy - much better than reading
the image headers. The images are available on spinning disk, and
the image locations are in the db.
Thanks,
Janet
On 02/08/2009 05:59 p.m., Andy Colson wrote:
On 1 Aug 2009, at 23:24, Janet Jacobsen wrote:
My questions are:
(2) Should I REINDEX these two tables daily after the pipeline
completes? Is this what other people do in practice?
No need, but as people are querying as soon as data is arriving, an
analyse of the table you just copied to should help performance -
the query plans will be derived from more accurate table statistics
that way.
The files that are being copied into the table(s) are between 200 to
1,000
each, but there are hundreds of these small files every day. Would you
recommend running analyze after every COPY?
Just once, at the end. Assuming you are only deleting 100k records
and re-adding/updating another 100k in a batch. That's not so many
records it'll through the stats out of whack. (If you were
dropping/updating 15M in a batch you might wanna analyze in the middle)
You wrote that your pipeline runs for a period of 4 hours and the
table is about 15M rows now. What is taking up all that time? I
understand why you'd want to parallelise that process, but where do
you expect the gains?
We're processing images, and the data processing and analysis takes
up most of the time, but the images can be processed/analyzed in
parallel.
We've been doing all of the data loading at the end - one COPY at a
time. Originally that made sense because the researchers wanted to
check the images before loading the data/analysis results into the db.
Ah! Images! When you are doing the COPY are you escaping the data?
You cant "just" copy a binary file.
-Andy
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general