Mike Broers <mbroers@xxxxxxxxx> writes: > I had a query that was filtering with a wildcard search of a text field for > %SUCCESS%. The query took about 5 seconds and was running often so I wanted > to improve it. I suggested that the engineers include a new boolean column > for successful status. They implemented the requested field, but the query > that filters on that new column runs very long (i kill it after letting it > run for about an hour). Can someone help me understand why that is the > case and how to resolve it? It's hashing the subplan output in the first case and not the second: > Seq Scan on lead (cost=130951.81..158059.21 rows=139957 width=369) (actual > time=4699.619..4699.869 rows=1 loops=1) > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3)))) ^^^^^^^^^^^^^^^^ vs > Seq Scan on lead (cost=85775.78..9005687281.12 rows=139957 width=369) > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3)))) ^^^^^^^^^ Presumably, the new more-accurate rows count causes the planner to realize that the hash table will exceed work_mem so it doesn't choose to hash ... but for your situation, you'd rather it did, because what you're getting instead is a Materialize node that spills to disk (again, because the data involved exceeds work_mem) and that's a killer for this query. You should be able to get back the old behavior if you raise work_mem enough. Another idea you might think about is changing the OR'd IN conditions to a single IN over a UNION ALL of the subselects. I'm not really sure if that would produce a better plan, but it's worth trying if it wouldn't require too much app-side contortion. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance