Search Postgresql Archives

Re: surprising query optimisation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/12/2018 04:33, Stephen Frost wrote:
Greetings,

* Chris Withers (chris@xxxxxxxxxxx) wrote:
On 28/11/2018 22:49, Stephen Frost wrote:
* Chris Withers (chris@xxxxxxxxxxx) wrote:
We have an app that deals with a lot of queries, and we've been slowly
seeing performance issues emerge. We take a lot of free form queries from
users and stumbled upon a very surprising optimisation.

So, we have a 'state' column which is a 3 character string column with an
index on it. Despite being a string, this column is only used to store one
of three values: 'NEW', 'ACK', or 'RSV'.
Sounds like a horrible field to have an index on.
That's counter-intuitive for me. What leads you to say this and what would
you do/recommend instead?
For this, specifically, it's because you end up with exactly what you
have: a large index with tons of duplicate values.  Indexes are
particularly good when you have high-cardinality fields.  Now, if you
have a skewed index, where there's one popular value and a few much less
popular ones, then that's where you really want a partial index (as I
suggest earlier) so that queries against the non-popular value(s) is
able to use the index and the index is much smaller.

Of course, for this to work you need to set up the partial index
correctly and make sure that your queries end up using it.

Thanks!

Stephen

An index simply tells pg which block to look at (assuming that the index itself is not sufficient to satisfy the query), so if using an index would still require that pg look at most blocks, then it cheaper to just scan the whole table - rather than load the index and still look at all blocks that contain the table data.  I've oversimplified slightly.

An index is best used when using it results in fewer blocks being read from disk.

Also the use of RAM is better when the size of the index is kept small.  For example having an index on sex for nurses is a waste of time because most nurses are female.  However, a partial index (as suggested by Stephen, for your query) that includes only males could be useful if you have queries looking for male nurses (assumes male nurses are a very small fraction of nurses such that most data blocks don't have rows for males nurses, and the planner knows this).

I once optimised a very complex set queries that made extensive use of indexes.  However, with the knowledge I have today, I would have most likely had fewer and smaller indexes.  As I now realize, that some of my indexes were probably counter productive, especially as I'd given no thought to how much RAM would be required to host the data plus indexes!  Fortunately, while the objective was to run all those queries within 24 hours, they actually only took a couple of hours.

Chris, I would strongly suggest, you read up on the excellent documentation pg has about indexes, but don't expect to understand it all at one sitting...


Cheers,
Gavin





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux