Tom Lane wrote:
Sergey Burladyan <eshkinkot@xxxxxxxxx> writes:
show maintenance_work_mem ;
maintenance_work_mem
----------------------
128MB
create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
[ takes forever ]
Seems the reason this is so awful is that the incoming data is exactly
presorted, meaning that the tree structure that ginInsertEntry() is
trying to build degenerates to a linear list (ie, every new element
becomes the right child of the prior one). So the processing becomes
O(N^2) up till you reach maintenance_work_mem and flush the tree. With
a large setting for maintenance_work_mem it gets spectacularly slow.
Yes, this is probably the same issue I bumped into a while ago:
http://archives.postgresql.org/message-id/49350A13.3020105@xxxxxxxxxxxxxxxx
I think a reasonable solution for this might be to keep an eye on
maxdepth and force a flush if that gets too large (more than a few
hundred, perhaps?). Something like this:
/* If we've maxed out our available memory, dump everything to the index */
+ /* Also dump if the tree seems to be getting too unbalanced */
- if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
+ if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
+ buildstate->accum.maxdepth > DEPTH_LIMIT)
{
The new fast-insert code likely will need a similar defense.
I fooled around with a balanced tree, which solved the problem but
unfortunately made the unsorted case slower. Limiting the depth like
that should avoid that so it's worth trying.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance