I understand the value of indexes and of ANALYZE for the efficient use of them. In the following statement, you can see that the index scan is being used. Even though it takes 80 seconds (for a 1.25 million row table), it is much faster than without the index. But, if I repeat this search, it speeds up by more than a factor of 2! I love it, but I'd sure like to understand why. When I do it a third time, it speeds up again. A fourth try does not speed it up more. Is this speedup due to some memory/disk buffering from which I'm benefiting? I'm using linux (actually under VMware on WinXP, so it's even less efficient that it could be on it's own). Or is the planner learning something from previous runs of this search? It appears not, since the rows it thinks it needs to search are the same in the EXPLAIN ANALYZE outputs below. Can someone help me understand why my searches are speeding up so I can make it happen the first time, if possible?
Thanks, TJ
Try #1:
Select distinct smiles from structure where (_c, _o, _arom_c, _c_double, _o_double, _n, _ring_c, _ring_hetero, _halogen, _n_double, _arom_n, _s, _s_double, _other_atoms, _c_triple, _n_triple, _p, _arom_s, _p_double, _arom_o) >= (4,2,6,2,2,1,4,1,0,0,0,0,0,0,0,0,0,0,0,0) and oe_matches(smiles,'c1ccc(cc1)C2CC(=O)NC2=O') limit 500
Limit (cost=92649.53..92652.35 rows=500 width=49) (actual time=81544.566..81545.522 rows=117 loops=1)
-> Unique (cost=92649.53..92688.60 rows=6924 width=49) (actual time=81544.561..81545.174 rows=117 loops=1)
-> Sort (cost=92649.53..92669.06 rows=7813 width=49) (actual time=81544.553..81544.726 rows=117 loops=1)
Sort Key: smiles
-> Index Scan using fingerprint on structure (cost=0.00..92144.36 rows=7813 width=49) (actual time=36.179..81533.872 rows=117 loops=1)
Index Cond: ((_c >= 4) AND (_o >= 2) AND (_arom_c >= 6) AND (_c_double >= 2) AND (_o_double >= 2) AND (_n >= 1) AND (_ring_c >= 4) AND (_ring_hetero >= 1) AND (_halogen >= 0) AND (_n_double >= 0) AND (_arom_n >= 0) AND (_s >= 0) AND (_s_double >= 0) AND (_other_atoms >= 0) AND (_c_triple >= 0) AND (_n_triple >= 0) AND (_p >= 0) AND (_arom_s >= 0) AND (_p_double >= 0) AND (_arom_o >= 0))
Filter: oe_matches(smiles, 'c1ccc(cc1)C2CC(=O)NC2=O'::character varying)
Total runtime: 81545.903 ms
Try #2:
Limit (cost=92649.53..92652.35 rows=500 width=49) (actual time=36924.436..36925.450 rows=117 loops=1)
-> Unique (cost=92649.53..92688.60 rows=6924 width=49) (actual time=36924.431..36925.051 rows=117 loops=1)
-> Sort (cost=92649.53..92669.06 rows=7813 width=49) (actual time=36924.423..36924.596 rows=117 loops=1)
Sort Key: smiles
-> Index Scan using fingerprint on structure (cost=0.00..92144.36 rows=7813 width=49) (actual time=14.591..36891.589 rows=117 loops=1)
Index Cond: ((_c >= 4) AND (_o >= 2) AND (_arom_c >= 6) AND (_c_double >= 2) AND (_o_double >= 2) AND (_n >= 1) AND (_ring_c >= 4) AND (_ring_hetero >= 1) AND (_halogen >= 0) AND (_n_double >= 0) AND (_arom_n >= 0) AND (_s >= 0) AND (_s_double >= 0) AND (_other_atoms >= 0) AND (_c_triple >= 0) AND (_n_triple >= 0) AND (_p >= 0) AND (_arom_s >= 0) AND (_p_double >= 0) AND (_arom_o >= 0))
Filter: oe_matches(smiles, 'c1ccc(cc1)C2CC(=O)NC2=O'::character varying)
Total runtime: 36925.820 ms
Try #3:
Limit (cost=92649.53..92652.35 rows=500 width=49) (actual time=23712.435..23713.394 rows=117 loops=1)
-> Unique (cost=92649.53..92688.60 rows=6924 width=49) (actual time=23712.430..23713.046 rows=117 loops=1)
-> Sort (cost=92649.53..92669.06 rows=7813 width=49) (actual time=23712.422..23712.599 rows=117 loops=1)
Sort Key: smiles
-> Index Scan using fingerprint on structure (cost=0.00..92144.36 rows=7813 width=49) (actual time=17.548..23631.915 rows=117 loops=1)
Index Cond: ((_c >= 4) AND (_o >= 2) AND (_arom_c >= 6) AND (_c_double >= 2) AND (_o_double >= 2) AND (_n >= 1) AND (_ring_c >= 4) AND (_ring_hetero >= 1) AND (_halogen >= 0) AND (_n_double >= 0) AND (_arom_n >= 0) AND (_s >= 0) AND (_s_double >= 0) AND (_other_atoms >= 0) AND (_c_triple >= 0) AND (_n_triple >= 0) AND (_p >= 0) AND (_arom_s >= 0) AND (_p_double >= 0) AND (_arom_o >= 0))
Filter: oe_matches(smiles, 'c1ccc(cc1)C2CC(=O)NC2=O'::character varying)
Total runtime: 23713.765 ms
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster