Hi Tom, Thanks for the hint! I'm actually doing a GROUP BY with an inquality search (HAVING...) and still get long lasting queries, see query (2) below. I'm doing an equality search with success with the '->' operator on the same field 'tags' like in this query (1): -- Count all restaurants in database ("amenity = restaurant''): select count(*) from osm_all_v where hstore(tags)->'amenity'='restaurant' This query 1 is reasonably fast and could be accelerated using this functional index: CREATE INDEX planet_osm_point_tags_restaurant ON planet_osm_point USING btree (tags) WHERE (tags -> 'amenity'::text) = 'restaurant'::text; But look at query 2: This GROUP BY query gives following results (after a VACUUM FULL ANALYZE): * 26265ms without any index on field 'tags' * 26000ms with GIN index on field 'tags' * 26078ms with HASH index on field 'tags' To sum up: * GIN and HASH indexes don't make a difference. * BTree index fails (as reported) on field 'tags' with ERROR 'index row size 3120 exceeds maximum 2712' SQL state: 54000). * GIST index fails too on field 'tags' but with ERROR 'invalid hstore value found', SQL state: XX000. Q: => The latter indicates an "invalid value" - but does not say what's invalid. => I have no clue how to boost query 2...? See below the query and the explain output. Yours, Stefan QUERY 2 ======= -- Return all key-values of semantic type 'enum' without types numeric, date/time etc. -- (actually hstore handles all tag/values as of type text): SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*) FROM planet_osm_point p, ( SELECT key, count(*) FROM ( SELECT (each(tags)).key FROM planet_osm_point ) AS stat GROUP BY key HAVING count(*) > 1 AND key NOT LIKE 'note:%' ... AND key NOT IN ('ele','ref','url','website','email','maxspeed','converted_by', ... ) ) keys WHERE hstore(p.tags)->keys.key >'' GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1 ORDER by 1,3 desc ... "amenity";"bus_stop";24414 "amenity";"restaurant";5423 "amenity";"bench";5041 "amenity";"parking";4232 "amenity";"fire_hydrant";2363 "amenity";"post_box";1838 "amenity";"fuel";1628 "amenity";"place_of_worship";1615 ... EXPLAIN output: "Sort (cost=6043108.58..6054613.93 rows=4602142 width=118) (actual time=331415.987..331417.914 rows=964 loops=1)" " Sort Key: (min(keys.key)), (count(*))" " Sort Method: quicksort Memory: 85kB" " -> GroupAggregate (cost=4173505.61..4401220.42 rows=4602142 width=118) (actual time=330101.581..331409.834 rows=964 loops=1)" " Filter: (count(*) > 1)" " -> Sort (cost=4173505.61..4198338.94 rows=9933329 width=118) (actual time=330101.487..330870.237 rows=196227 loops=1)" " Sort Key: ((p.tags -> keys.key))" " Sort Method: external merge Disk: 21960kB" " -> Nested Loop (cost=22534.58..574521.77 rows=9933329 width=118) (actual time=6747.314..327192.811 rows=196227 loops=1)" " Join Filter: ((p.tags -> keys.key) > ''::text)" " -> Seq Scan on planet_osm_point p (cost=0.00..5649.22 rows=182822 width=86) (actual time=0.015..516.191 rows=182822 loops=1)" " -> Materialize (cost=22534.58..22539.88 rows=163 width=32) (actual time=0.039..0.890 rows=420 loops=182822)" " -> Subquery Scan on keys (cost=22534.58..22539.07 rows=163 width=32) (actual time=6740.620..6743.358 rows=420 loops=1)" " -> HashAggregate (cost=22534.58..22537.44 rows=163 width=32) (actual time=6740.613..6741.651 rows=420 loops=1)" " Filter: (count(*) > 1)" " -> Subquery Scan on stat (cost=0.00..21417.62 rows=148929 width=32) (actual time=0.093..6230.640 rows=196458 loops=1)" " Filter: ((stat.key !~~ 'name%'::text) AND (stat.key !~~ 'addr:%'::text) AND (stat.key !~~* 'FIXME'::text) AND (stat.key !~~* 'openGeoDB:%'::text) AND (stat.key !~~ 'note:%'::text) AND (stat.key !~~ '%_ref'::text) AND (stat.key !~~ '%description%'::text) AND (stat.key !~~ 'contact:%'::text) AND (stat.key !~~ 'operator:%'::text) AND (stat.key !~~ 'uic_%'::text) AND (stat.key !~~ 'TMC%'::text) AND (stat.key !~~ 'uic_%'::text) AND (stat.key !~~ 'direction%'::text) AND (stat.key !~~ 'is_in%'::text) AND (stat.key !~~ 'wikipedia%'::text) AND (stat.key <> ALL ('{ele,ref,url,website,email,maxspeed,converted_by,layer,level,phone,mobility:station_id,information,opening_hours,date,time,collection_times,operator,colour,fee,nat_name,alt_name,population,seats,postal_code,capacity,line,lines,maxheight,reg_name}'::text[])))" " -> Seq Scan on planet_osm_point (cost=0.00..6106.28 rows=182822 width=86) (actual time=0.025..2363.603 rows=575700 loops=1)" "Total runtime: 331426.018 ms" 2011/5/1 Tom Lane <tgl@xxxxxxxxxxxxx>: > Stefan Keller <sfkeller@xxxxxxxxx> writes: >> Any ideas on how to index my hstore attribute? > > Use a GIST or GIN index. ÂThe only thing that a btree index on hstore > can do for you is to support equality comparisons on the whole hstore > value, which is pretty unlikely to be what you're after. > > Â Â Â Â Â Â Â Â Â Â Â Âregards, tom lane > -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general