Hello, > > > > > > From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Mike Broers > Sent: Dienstag, 7. Juli 2015 18:28 > To: Tom Lane > Cc: pgsql-performance@xxxxxxxxxxxxxx > Subject: Re: wildcard text filter switched to boolean column, performance is way worse > > After bumping up work_mem from 12MB to 25MB that last materialize is indeed hashing and this cut the query time by about 60%. Thanks, this was very helpful and gives me something else to look for when troubleshooting explains. > > > > On Tue, Jul 7, 2015 at 11:10 AM, Mike Broers <mbroers@xxxxxxxxx> wrote: > Thanks, very informative! I'll experiment with work_mem settings and report back. > > On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Mike Broers <mbroers@xxxxxxxxx> writes: > > I had a query that was filtering with a wildcard search of a text field for > > %SUCCESS%. The query took about 5 seconds and was running often so I wanted > > to improve it. I suggested that the engineers include a new boolean column > > for successful status. They implemented the requested field, but the query > > that filters on that new column runs very long (i kill it after letting it > > run for about an hour). Can someone help me understand why that is the > > case and how to resolve it? > > It's hashing the subplan output in the first case and not the second: > > > Seq Scan on lead (cost=130951.81..158059.21 rows=139957 width=369) (actual > > time=4699.619..4699.869 rows=1 loops=1) > > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <> > > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3)))) > ^^^^^^^^^^^^^^^^ > vs > > > Seq Scan on lead (cost=85775.78..9005687281.12 rows=139957 width=369) > > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <> > > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3)))) > ^^^^^^^^^ > > Presumably, the new more-accurate rows count causes the planner to realize > that the hash table will exceed work_mem so it doesn't choose to hash ... > but for your situation, you'd rather it did, because what you're getting > instead is a Materialize node that spills to disk (again, because the data > involved exceeds work_mem) and that's a killer for this query. You should > be able to get back the old behavior if you raise work_mem enough. > > Another idea you might think about is changing the OR'd IN conditions > to a single IN over a UNION ALL of the subselects. I'm not really sure if > that would produce a better plan, but it's worth trying if it wouldn't > require too much app-side contortion. Hello, you might try to use a CTE to first collect the IDs to exclude, and join them to your main table. This should result in an anti join pattern. Something like: WITH IDS as ( SELECT U1."lead_id" AS "lead_id" FROM "event" U1 WHERE U1."event_type" ='type_1' UNION ( SELECT U1."lead_id" AS "lead_id" FROM "event" U1 WHERE U1."event_type" = 'type_2' INTERSECT SELECT U1."lead_id" AS "lead_id" FROM "event" U1 WHERE successful ) ) SELECT * FROM lead LEFT OUTER JOIN IDS ON (lead.id=IDS.lead_id) WHERE IDS.lead_id IS NULL; regards, Marc Mamin > regards, tom lane > > > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance