Re: optimizing a query

"David G. Johnston" <david.g.johnston@xxxxxxxxx> · Tue, 21 Jun 2016 18:55:51 -0400

On Tue, Jun 21, 2016 at 6:44 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 06/21/2016 03:33 PM, Jonathan Vanasco wrote:

In effort of simplifying the work, I've created indexes on t_a that have all the related columns.

        CREATE INDEX test_idx ON t_a(col_1, id) WHERE col_2 IS NOT FALSE;

        CREATE INDEX test_idx__a ON t_a(col_1, id) WHERE col_2 IS NOT FALSE;

Aside from the name these indexes are identical...

postgres will query test_idx__a first (yay!) but then does a bitmap heap scan on t_a, and uses the raw t_a for the hash join.

I don't actually need any information from t_a - it's just there for the filtering, and ideally postgres would just use the index.

This is the description of a semi-join.

WHERE EXISTS (SELECT 1 FROM t_a WHERE t_a.id = t_a2b.a_id AND  t_a.col_1 = 730 AND t_a.col_2 IS NOT FALSE)

I thought this might have been from using a partial index, but the same results happen with a full index.  I just can't seem to avoid this hash join against the full table.

anyone have a suggestion?

The below works without including t_a in the FROM?

example query

        SELECT t_a2b.b_id AS b_id,

                   count(t_a2b.b_id) AS counted

        FROM t_a2b

        WHERE

                  t_a2b.col_a = 1

                  AND

                  t_a.col_1 = 730

                  AND

                  t_a.col_2 IS NOT False

        GROUP BY t_a2b.b_id

        ORDER BY        counted DESC,

                                t_a2b.b_id ASC

These two items combined reduce the desirability of diagnosing this...it doesn't seem like you've faithfully recreated the scenario for us to evaluate.

Your post is also not self-contained and you haven't provided the actual EXPLAINs you are getting.

David J.