Slow index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hi all. I'm having an interesting time performance-wise with a set of indexes. Any clues as to what is going on or tips to fix it would be appreciated.

My application runs lots of queries along the lines of:

SELECT * from table where field IN (.., .., ..);

There is always an index on the field in the table, but the table is not necessarily clustered on the index.

There are six different indexes which the application hits quite hard, that I have investigated. These are:

gene__key_primaryidentifier      (index size 20MB)  (table size 72MB)
gene__key_secondaryidentifier    (index size 20MB)  (table size 72MB)
ontologyterm__key_name_ontology  (index size 2.5MB) (table size 10MB)
protein__key_primaryacc          (index size 359MB) (table size 1.2GB)
publication__key_pubmed          (index size 12MB)  (table size 48MB)
synonym__key_synonym             (index size 3GB)   (table size 3.5GB)

These six indexes all perform very differently.

These are the results from a few thousand queries on each index, from our application logs. Generally, the same value is not used more than once in all the queries.

                                 (1)   (2)   (3)   (4)    (5)
gene__key_primaryidentifier      22    17    417   19     24.5
gene__key_secondaryidentifier    8.5   5.3   21    2.4    3.9
ontologyterm__key_name_ontology  6.5   6.5   9.4   1.4    1.4
protein__key_primaryacc          73    8.1   164   2.2    20
publication__key_pubmed          52    31    156   3.0    5.0
synonym__key_synonym             335   66    245   0.7    3.7

(1) - Average number of values in the IN list.
(2) - Average number of rows returned by the queries.
(3) - Average time taken to execute the query, in ms.
(4) - Average time per value in the IN lists.
(5) - Average time per row returned.

All the queries are answered with a bitmap index scan on the correct index.

I have also plotted all the log entries on an XY graph, with number of elements in the IN list against time taken, which is at http://wakeling.homeip.net/~mnw21/slow_index1.png. It is clear that the gene__key_primaryidentifier index runs a lot slower than some of the other indexes.

The thing is, the table and the index are both small. The machine has 16GB of RAM, and its disc subsystem is a RAID array of 16 15krpm drives with a BBU caching RAID controller. The entire table and index should be in the cache. Why it is taking 20 milliseconds per value is beyond me. Moreover, the synonym index is MUCH larger, has bigger queries, and performs better.

If we concentrate on just this index, it seems that some queries are answered very quickly indeed, while others are answered a lot slower. I have plotted just this one index on an XY graph, with two colours for values in the IN list and actual rows returned, which is at http://wakeling.homeip.net/~mnw21/slow_index2.png. It is clear that there is a gap in the graph between the slow queries and the fast queries.

Is there something else going on here which is slowing the system down? The table is not bloated. There is quite heavy simultaneous write traffic, but little other read traffic, and the 16 spindles and BBU cache should take care of that quite happily. I don't think it's slow parsing the query, as it seems to manage on other queries in a millisecond or less.

Any ideas welcome.

Also, the mailing list server doesn't seem to be able to cope with image attachments.

Matthew

--
import oz.wizards.Magic;
  if (Magic.guessRight())...           -- Computer Science Lecturer


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux