Re: Inserts in 'big' table slowing down the database

Jeff Janes <jeff.janes@xxxxxxxxx> · Wed, 3 Oct 2012 12:35:42 -0700

On Mon, Oct 1, 2012 at 5:15 PM, Stefan Keller <sfkeller@xxxxxxxxx> wrote:
> Sorry for the delay. I had to sort out the problem (among other things).
>
> It's mainly about swapping.

Do you mean ordinary file IO?  Or swapping of an actual process's
virtual memory?  The latter shouldn't happen much unless you have
something mis-configured.

>
> The table nodes contains about 2^31 entries and occupies about 80GB on
> disk space plus index.

How big is each index?

If you reset the stats just before the bulk load, what do select *
from pg_statio_user_tables and select * from pg_statio_user_indexes
show after the bulk load?  What does vmstat show during the load?

> If one would store the geom values in a big array (where id is the
> array index) it would only make up about 16GB, which means that the
> ids are dense (with few deletes).
> Then updates come in every hour as bulk insert statements with entries
> having ids in sorted manner.

Is the problem that these operations themselves are too slow, or that
they slow down other operations when they are active?  If the main
problem is that it slows down other operations, what are they?

If the problem is the length of the bulk operations themselves, what
happens if you split them up into chunks and run them in parallel?

Do you have a test/dev/QA server?  How long does a bulk insert take
under the four conditions of both indexes (PK and geometry), neither
index, just one, or just the other?

> Now PG becomes slower and slower!
> CLUSTER could help - but obviously this operation needs a table lock.
> And if this operation takes longer than an hour, it delays the next
> update.

I don't see why a CLUSTER would help.  Your table is probably already
clustered well on the serial column.  Clustering it instead on the
geometry probably wouldn't accomplish much.  One thing that might help
would be to stuff the data to be inserted into a scratch table, index
that on the geometry, cluster that scratch table, and then do the
insert to the main table from the scratch table.   That might result
in the geom being inserted in a more cache-friendly order.

> Any ideas? Partitioning?

Do most selects against this table specify user_name as well as a
geometry query?  If so, that might be a good partitioning key.
Otherwise, I don't see what you could partition on in a way that make
things better.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance