Re: very very slow inserts into very large table

On 07/17/2012 01:56 AM, Jon Nelson wrote:
> What is the greater lesson to take away, here? If you are working with data that is larger (substantially larger) than available memory, is the architecture and design of postgresql such that the only real approach is some type of data partitioning? It is not my intent to insult or even disparage my favorite software, but it took less time to *build* the indices for 550GB of data than it would have to insert 1/20th as much. That doesn't seem right.

To perform reasonably well, Pg would need to be able to defer index updates when bulk-loading data in a single statement (or even transaction), then apply them when the statement finished or transaction committed. Doing this at a transaction level would mean you'd need a way to mark indexes as 'lazily updated' and have Pg avoid using them once they'd been dirtied within a transaction. No such support currently exists, and it'd be non-trivial to implement, especially since people loading huge amounts of data often want to do it with multiple concurrent sessions. You'd need some kind of 'DISABLE INDEX' and 'ENABLE INDEX' commands plus a transactional backing table of pending index updates.

Not simple.
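The closest you can get today is to do the same thing manually: drop the indexes, bulk-load with COPY, then rebuild them in a single pass at the end. A rough sketch (table, index, and file names here are purely illustrative, adjust for your schema):

    -- drop the index so the load doesn't have to maintain it row by row
    DROP INDEX IF EXISTS big_table_val_idx;

    -- COPY is far cheaper than row-at-a-time INSERTs for bulk data
    COPY big_table (id, val) FROM '/path/to/data.csv' WITH (FORMAT csv);

    -- a generous maintenance_work_mem speeds up the index build
    SET maintenance_work_mem = '1GB';
    CREATE INDEX big_table_val_idx ON big_table (val);

That is essentially why building the indexes on the full 550GB was cheaper than trickling a fraction of that into indexes that already exist.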


Right now, Pg is trying to keep the index consistent the whole time. That involves moving a heck of a lot of data around - repeatedly.

Setting a lower FILLFACTOR on your indexes can give Pg some breathing room here, but only a limited amount, and at the cost of reduced scan efficiency.
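For what it's worth, FILLFACTOR is just a per-index storage parameter: a lower value leaves free space in each index page, so inserts cause fewer page splits, at the price of a bigger and less cache-friendly index. Something like this (index name is illustrative, and the setting only affects pages built after it is changed):

    ALTER INDEX big_table_val_idx SET (fillfactor = 70);
    REINDEX INDEX big_table_val_idx;  -- rebuild so existing pages pick up the new fillfactor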

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

