Shards + hash = forever running queries

Daniele Varrazzo <daniele.varrazzo@xxxxxxxxx> · Mon, 23 Jul 2012 11:03:11 +0100

Hello,

We are using Postgres 9.1.4. We are struggling with a class of queries
that got impossible to run after sharding a large table. Everything
like:

    select small.something, big.anything
    from small join big on small.big_id = big.id;

and variation such as "select * from big where id in (select big_id from small)"

Since "big" was sharded, the query plan results in something like:

 Hash Join  (cost=10000000001.23..30038997974.72 rows=10 width=753)
   Hash Cond: (b.id = i.big_id)
   ->  Append  (cost=0.00..20038552251.23 rows=118859245 width=11)
         ->  Index Scan using big_201207_pkey on big_201207 b
(cost=0.00..2224100.46 rows=1609634 width=12)
         ->  Index Scan using big_201101_pkey on big_201101 b
(cost=0.00..404899.71 rows=5437497 width=12)
         ->  Index Scan using big_201104_pkey on big_201104 b
(cost=0.00..349657.58 rows=4625181 width=12)
         -> [...all the shards]
   ->  Hash  (cost=10000000001.10..10000000001.10 rows=10 width=742)
         ->  Seq Scan on small i  (cost=10000000000.00..10000000001.10
rows=10 width=742)

Postgres ends up in never-ending reads: even if "small" has only three
rows I've never seen such query finishing, the time passed being even
longer than a full scan on big.

The plan looks sub-optimal, as it seems it first does a huge indexscan
of all the partitions, then it joins the result against a small hash.

1. Can we fix the queries to work around this problem?

2. Could the planner be fixed for this scenario for PG 9.2 (or 9.3)?
Creating the hash beforehand, performing an hash join for each
partition and merging the results looks like it would bring it back
into the realm of the runnable queries. Am I wrong?

Thank you very much.

-- Daniele

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance