Hello, I have been having a problem with the following query ignoring an index on the foos.bar column:

SELECT c.id
FROM foos c, bars r
WHERE r.id != 0
  AND r.modified_time > '2006-10-20 10:00:00.000'
  AND r.modified_time <= '2006-10-30 15:20:00.000'
  AND c.bar = r.id

The bars table contains 597 rows, while the foos table contains 5031203 rows. After much research I figured out that the problem is caused by the PG planner deciding that my foos.bar index is not useful. The foos.bar column contains 5028698 rows with the value 0 and 2505 rows whose values are ids in the bars table. Both tables have just been analyzed.

When I EXPLAIN ANALYZE the above query, I get the following:

"Hash Join  (cost=3.06..201642.49 rows=25288 width=8) (actual time=0.234..40025.514 rows=11 loops=1)"
"  Hash Cond: ("outer".bar = "inner".id)"
"  ->  Seq Scan on foos c  (cost=0.00..176225.03 rows=5032303 width=16) (actual time=0.007..30838.623 rows=5031203 loops=1)"
"  ->  Hash  (cost=3.06..3.06 rows=3 width=8) (actual time=0.117..0.117 rows=20 loops=1)"
"        ->  Index Scan using bars_index_modified_time on bars r  (cost=0.00..3.06 rows=3 width=8) (actual time=0.016..0.066 rows=20 loops=1)"
"              Index Cond: ((modified_time > '2006-10-20 10:00:00'::timestamp without time zone) AND (modified_time <= '2006-10-30 15:20:00'::timestamp without time zone))"
"              Filter: (id <> 0)"
"Total runtime: 40025.629 ms"

The solution I found was to change the statistics target on my foos.bar column from the default -1 to 1000 (the exact commands I used are in the P.S. below). When I do this, reanalyze the table, and rerun the above query, I get the following expected result:

"Nested Loop  (cost=0.00..25194.66 rows=25282 width=8) (actual time=13.035..23.338 rows=11 loops=1)"
"  ->  Index Scan using bars_index_modified_time on bars r  (cost=0.00..3.06 rows=3 width=8) (actual time=0.063..0.115 rows=20 loops=1)"
"        Index Cond: ((modified_time > '2006-10-20 10:00:00'::timestamp without time zone) AND (modified_time <= '2006-10-30 15:20:00'::timestamp without time zone))"
"        Filter: (id <> 0)"
"  ->  Index Scan using foos_index_bar on foos c  (cost=0.00..6824.95 rows=125780 width=16) (actual time=1.141..1.152 rows=1 loops=20)"
"        Index Cond: (c.bar = "outer".id)"
"Total runtime: 23.446 ms"

Having to do this concerns me, as I am not sure what a good statistics value should be. We also expect this table to grow much larger, and I am concerned that the query may not continue to use the index as it does. I tried a value of 100, and that works while the number of bars records is small, but as soon as I increase them the query starts ignoring the index again.

Is increasing the statistics value the best way to resolve this problem? How can I best decide on a good statistics value?

Having a column containing a large number of NULL or 0 values seems fairly common. Is there a way to tell Postgres to create an index of only the values with meaning, i.e. all non-zero values? I could not find one.

Thanks in advance,
Leif
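
P.S. For reference, this is roughly what I ran to change the statistics target and reanalyze; I am typing it from memory, so treat it as a sketch rather than a verbatim transcript:

-- raise the per-column statistics target for foos.bar (was -1, i.e. use the default)
ALTER TABLE foos ALTER COLUMN bar SET STATISTICS 1000;
-- re-gather statistics so the planner sees the new target
ANALYZE foos;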