Re: query of partitioned object doesnt use index in qa

Mike Broers <mbroers@xxxxxxxxx> · Fri, 15 Sep 2017 15:59:17 -0500

That makes a lot of sense, thanks for taking a look.  An index like you suggest would probably further improve the query.   Is that suggestion sidestepping the original problem that production is evaluating the landing_id bit with the partition index and qa is sequence scanning instead?
AND exists (select 1 from t_sap where e.landing_id = t_sap.landing_id)) as rankings; 

Based on the difference in row estimate I am attempting an analyze with a higher default_statistic_target (currently 100) to see if that helps.

On Fri, Sep 15, 2017 at 3:42 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Mike Broers <mbroers@xxxxxxxxx> writes:

> If Im reading this correctly postgres thinks the partition will return 6.5

> million matching rows but actually comes back with 162k.  Is this a case

> where something is wrong with the analyze job?

You've got a lot of scans there that're using conditions like

> │                           ->  Seq Scan on event__99999999 e_1 (cost=0.00..2527828.05 rows=11383021 width=778) (actual time=25522.389..747238.885 rows=42 loops=1)

> │                                 Filter: (((body ->> 'SID'::text) IS NOT NULL) AND (validation_status_code = 'P'::bpchar))

> │                                 Rows Removed by Filter: 12172186

While I'd expect the planner to be pretty solid on estimating the

validation_status_code condition, it's not going to have any idea about

that JSON field test.  That's apparently very selective, but you're just

getting a default estimate, which is not going to think that a NOT NULL

test will exclude lots of rows.

One thing you could consider doing about this is creating an index

on (body ->> 'SID'::text), which would prompt ANALYZE to gather statistics

about that _expression_.  Even if the index weren't actually used in the

plan, this might improve the estimates and the resulting planning choices

enough to make it worth maintaining such an index.

Or you could think about pulling that field out and storing it on its own.

JSON columns are great for storing random unstructured data, but they are

less great when you want to do relational-ish things on subfields.

                        regards, tom lane