Re: Query planner riddle (array-related?)

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 07 May 2018 10:17:20 -0400

Markus <m@xxxxxxx> writes:
> Ah... yeah, the parallax distribution is fairly sharply peaked around
> 0, so >50 might be severely off.
> So, I've run
>   alter table gaia.dr2light alter parallax set statistics 1000;
>   analyze gaia.dr2light;
> With this, the query plans converge to trivial variations of

>  Hash Join  (cost=253856.23..4775113.84 rows=422 width=1647) (actual time=1967.095..2733.109 rows=18 loops=1)
>    Hash Cond: (dr2light.source_id = dr2epochflux.source_id)
>    ->  Bitmap Heap Scan on dr2light  (cost=24286.88..4385601.28 rows=1297329 width=132) (actual time=3.113..19.346 rows=5400 loops=1)
>    ->  Hash  (cost=118285.38..118285.38 rows=551038 width=1523) (actual time=1885.177..1885.177 rows=550737 loops=1)

> While that's a reasonable plan and fast enough, I'd still like to
> keep the box from seqscanning dr2epochflux with its large arrays and
> use that table's index on source_id.  If I read the query plan right,
> part of the problem is that it still massively overestimates the
> result of parallax>50 (1297329 rather than 5400).  Is there anything
> I can do to improve that estimate?

Raise the parallax stats target still higher, perhaps.  I think we let you
set it as high as 10000.

> But even with that suboptimal estimate, postgres, under the
> assumption of something not too far from a uniform distribution on
> source_id, should end up estimating the cardinality of the end result
> as something like
> (selectivity on dr2light)*(cardinality of dr2epochflux),
> and hence roughly (1297329/1.6e9*5e5)=405 rows to be drawn from
> dr2epochflux.  It would seem a lot smarter to just pull these few 1e2
> rows using the source_id index on dr2epochflux than seqscanning that
> table, no?

No.  Given the above estimates, it would have to do 1297329 index lookups
in dr2epochflux, which is not going to be a win compared to 1297329 probes
into an in-memory hash table.  Even with a dead-accurate estimate of 5400
dr2light rows to be joined, I don't think an inner indexscan is
necessarily a better plan than a hash.  It's the number of probes that
matter, not the number of successful probes.

(It's not clear to me why so few of the dr2light rows have join partners,
but the planner does seem to understand that most of them don't.)

You say you're worried about "large arrays" in dr2epochflux; but if they
are large enough to be toasted out-of-line, it's really a nonissue.  Only
the toast pointers would be read during the seqscan or stored in the hash.

			regards, tom lane