Re: optimizing a query

Jonathan Vanasco <postgres@xxxxxxxx> · Wed, 22 Jun 2016 18:27:58 -0400

On Jun 22, 2016, at 2:38 PM, David G. Johnston wrote:
> What query?  A self-contained email would be nice.

This was the same query as in the previous email in the thread.  I didn't think to repeat it.  I did include it below.

> https://www.postgresql.org/docs/9.6/static/indexes-index-only-scans.html
> 
> Note especially:
> 
> "Visibility information is not stored in index entries, only in heap entries; ..."
> 
> The check against the heap isn't for the truthiness of the predicate but the visibility of the row.

Thanks for this link. 

The table I worked on hasn't had any writes since a server restart, and according to those docs the queries should have been off the visibility map not the heap.  
However the amount of time to search is not in line with expectations for the visibility map. 

After reading the last paragraph about some index optimizations in 9.6 that looked related, I installed the RC on an another machine and dumped 2 tables from production to see if I would qualify for any improvements.  

>>> But there's a problem: the WHERE clause refers to success which is not available as a result column of the index. Nonetheless, an index-only scan is possible because the plan does not need to recheck that part of the WHERE clause at runtime: all entries found in the index necessarily have success = true so this need not be explicitly checked in the plan. PostgreSQL versions 9.6 and later will recognize such cases and allow index-only scans to be generated, but older versions will not.

The 9.6 branch planner optimizes for my query and realizes that it doesn't need to check the table:

So while this index is necessary on 9.5:
	CREATE INDEX idx__9_5 ON table_a(column_1, id, column_2) WHERE column_2 IS NOT FALSE;

This index works on 9.6
	CREATE INDEX idx__9_6 ON table_a(column_1, id) WHERE column_2 IS NOT FALSE;

Considering I have several million rows, this has a noticeable effect .

Combined with the various improvements on 9.6, there is a huge difference in query speed:

	9.6 runs the query with the smaller index in an average of 1200ms
	9.5 runs the query with the larger index in an average of 2700ms

> This one requires knowledge of the query; but I am not surprised that reversing the order of columns in a b-tree index has an impact.

I expected this to impact the decision on which index to use when multiple ones are available, or to offer poor performance -- but not to discount using the index entirely.

> All at once?

No.  I dropped all indexes to test, then for each column combination did:

	CREATE INDEX foo_idx;
        ANALYZE foo ;
	EXPLAIN ANALYZE; 
	DROP INDEX foo_idx;

I call Explain Analyze manually once for the plan, then via script 25x to average out execution times and account for cold-start vs having loaded all the indexes.  I shut down all other user processes on the machine as well.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general