Tom Lane wrote: > Ron Mayer <rm_pg@xxxxxxxxxxxxxxxxxxxxxxx> writes: >> vm=# create index "gist70000" on tmp_intarray_test using GIST (my_int_array gist__int_ops); >> CREATE INDEX >> Time: 2069836.856 ms > >> Is that expected, or does it sound like a bug to take over >> half an hour to index 70000 rows of mostly 5 and 6-element >> integer arrays? > > I poked at this example with oprofile. It's entirely CPU-bound AFAICT, Oleg pointed out to me (off-list I now see) that it's not totally unexpected behavior and I should have been using gist__intbig_ops, since the "big" refers to the cardinality of the entire set (which was large, in my case) and not the length of the arrays. Oleg Bartunov wrote: OB:> it's not about short or long arrays, it's about small or big OB:> cardinality of the whole set (the number of unique elements) I'm re-reading the docs and still wasn't obvious to me. A potential docs patch is attached below. > and the CPU utilization is approximately > > 55% g_int_compress > 35% memmove/memcpy (difficult to distinguish these) > 1% pg_qsort > <1% anything else > > Probably need to look at reducing the number of calls to g_int_compress > ... it must be getting called a whole lot more than once per new index > entry, and I wonder why that should need to be. Perhaps that's a separate issue, but we're working fine with gist__intbig_ops for the time being. Here's a proposed docs patch that makes this more obvious. *** a/doc/src/sgml/intarray.sgml --- b/doc/src/sgml/intarray.sgml *************** *** 239,245 **** <literal>gist__int_ops</> (used by default) is suitable for small and medium-size arrays, while <literal>gist__intbig_ops</> uses a larger signature and is more ! suitable for indexing large arrays. </para> <para> --- 239,247 ---- <literal>gist__int_ops</> (used by default) is suitable for small and medium-size arrays, while <literal>gist__intbig_ops</> uses a larger signature and is more ! suitable for indexing high-cardinality data sets - where there ! are a large number of unique elements across all rows being ! indexed. </para> <para> - Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance