On 07/17/2012 01:56 AM, Jon Nelson wrote:
> What is the greater lesson to take away, here? If you are working with
> data that is larger (substantially larger) than available memory, is
> the architecture and design of postgresql such that the only real
> approach is some type of data partitioning? It is not my intent to
> insult or even disparage my favorite software, but it took less time
> to *build* the indices for 550GB of data than it would have taken to
> insert 1/20th as much. That doesn't seem right.
To perform reasonably well, Pg would need to be able to defer index
updates when bulk-loading data in a single statement (or even a single
transaction), then apply them when the statement finishes or the
transaction commits. Doing this at the transaction level would mean you'd need a way
to mark indexes as 'lazily updated' and have Pg avoid using them once
they'd been dirtied within a transaction. No such support currently
exists, and it'd be non-trivial to implement, especially since people
loading huge amounts of data often want to do it with multiple
concurrent sessions. You'd need some kind of 'DISABLE INDEX' and 'ENABLE
INDEX' commands plus a transactional backing table of pending index updates.
Not simple.
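
The usual workaround is to drop (or simply not create) the indexes
before the bulk load and rebuild them once the data is in; since DDL is
transactional in Pg you can even wrap the whole thing in one
transaction. A rough sketch, with made-up table, index, and file names:

    BEGIN;
    DROP INDEX IF EXISTS bigtable_val_idx;

    -- Bulk load; COPY avoids the per-row overhead of individual INSERTs.
    COPY bigtable (id, val) FROM '/path/to/data.csv' WITH (FORMAT csv);

    -- Rebuild the index in a single pass over the loaded data.
    CREATE INDEX bigtable_val_idx ON bigtable (val);
    COMMIT;

That only really helps a single loading session, of course; multiple
concurrent loaders are back to square one.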
Right now, Pg is trying to keep the index consistent the whole time.
That involves moving a heck of a lot of data around - repeatedly.
Setting a lower FILLFACTOR on your indexes can give Pg some breathing
room here, but only a limited amount, and at the cost of reduced scan
efficiency.
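
For reference, fillfactor is a per-index storage parameter; something
along these lines (index name made up):

    -- Leave ~30% of each leaf page free so later inserts cause fewer page splits.
    CREATE INDEX bigtable_val_idx ON bigtable (val) WITH (fillfactor = 70);

    -- Or change it on an existing index; this only affects pages written
    -- afterwards, so a REINDEX is needed to repack what's already there.
    ALTER INDEX bigtable_val_idx SET (fillfactor = 70);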
--
Craig Ringer