Re: Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

On Fri, Jun 23, 2017 at 1:09 PM, Chris Wilson <chris+postgresql@xxxxxxxxx> wrote:

The records can already be read in order from idx_metric_value.... If this was selected as the primary table, and metric_pos was joined to it, then the output would also be in order, and no sort would be needed.

We should be able to use a merge join to metric_pos, because it can be read in order of id_metric (its primary key, and the first column in idx_metric_value...). If not, a hash join should be faster than a nested loop, if we only have to hash ~100 records.

Hash joins do not preserve order.  They could preserve the order of their "first" input, but only if the whole join runs in a single batch and never spills to disk.  The hash join code can never guarantee in advance that it won't spill, so the planner never treats it as order-preserving.  It thinks it only needs to hash 100 rows, but it can't be certain of that until it actually executes.

If I set enable_sort to off, I do get the merge join you want (but with asset_pos joined by a nested loop index scan, not a hash join, for the reason stated above), and it is slower than the plan with the sort in it, just as PostgreSQL thinks it will be.
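
(Planner methods can be toggled per session if you want to reproduce that comparison yourself; a rough sketch, with your query elided:)

    -- compare plans with sorting disabled, for this session only
    SET enable_sort = off;
    EXPLAIN (ANALYZE, TIMING OFF) SELECT ... ;  -- your original query here
    RESET enable_sort;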

If I vacuum your fact table, then it can switch to use index only scans.  I then get a different plan, still using a sort, which runs in 1.6 seconds.  Sorting is not the slow step you think it is.
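
(Something along these lines should set the visibility-map bits so index-only scans become attractive; I'm assuming the fact table is metric_value here:)

    -- set visibility-map bits and refresh statistics on the fact table
    VACUUM (ANALYZE, VERBOSE) metric_value;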

Be warned that "explain (analyze)" can substantially slow down and distort this type of query, especially when sorting.  You should run "explain (analyze, timing off)" first, and then only trust "explain (analyze)" if the overall execution times between them are similar.
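
In other words, roughly:

    -- per-node timing can dominate on machines with slow clock reads
    EXPLAIN (ANALYZE, TIMING OFF) SELECT ... ;
    EXPLAIN (ANALYZE) SELECT ... ;
    -- only trust the second one if its total runtime is close to the first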

 
If I remove one of the joins (asset_pos) then I get a merge join between two indexes, as expected, but it has a materialize just before it which makes no sense to me. Why do we need to materialize here? And why materialise 100 rows into 1.5 million rows? (explain.depesz.com)


   ->  Materialize  (cost=0.14..4.89 rows=100 width=8) (actual time=0.018..228.265 rows=1504801 loops=1)
         Buffers: shared hit=2
         ->  Index Only Scan using idx_metric_pos_id_pos on metric_pos  (cost=0.14..4.64 rows=100 width=8) (actual time=0.013..0.133 rows=100 loops=1)
               Heap Fetches: 100
               Buffers: shared hit=2

It doesn't need to materialize; it does so simply because it thinks it will be faster (which it is, slightly).  You can prevent it from doing so by setting enable_material to off.  The reason it is faster is that with the materialize, it can check all the visibility filters at once, rather than having to do it repeatedly.  It is only materializing 100 rows; the 1504801 comes from the number of rows projected out of the materialized data (one for each row on the other side of the join, in this case), rather than the number of rows contained within it.
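
For example:

    -- suppress the Materialize node for this session and compare
    SET enable_material = off;
    EXPLAIN (ANALYZE, TIMING OFF) SELECT ... ;
    RESET enable_material;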

And again, vacuum your tables.  Heap fetches aren't cheap.


The size of the result set is approximately 91 MB (measured with psql -c | wc -c). Why does it take 4 seconds to transfer this much data over a UNIX socket on the same box?

It has to convert the data to the format used for the wire protocol (hardware-independent, and able to support user-defined and composite types), and then back again.
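
If you want to see how much of the 4 seconds is conversion and transfer rather than execution, one rough way is to compare the server-only time against the full round trip from psql:

    -- server-side execution only; no result set is sent to the client
    EXPLAIN (ANALYZE, TIMING OFF) SELECT ... ;

    -- full round trip, including wire-format conversion and transfer
    \timing on
    SELECT ... ;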

work_mem = 100MB

Can you give it more than that?  How many simultaneous connections do you expect?
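
If most sessions don't need it, you can also raise it just where this query runs rather than globally (the figure below is only an illustration; size it to the sort and to your connection count):

    -- per-session override, instead of raising it in postgresql.conf
    SET work_mem = '512MB';

    -- or confine it to one transaction
    BEGIN;
    SET LOCAL work_mem = '512MB';
    -- ... run the reporting query here ...
    COMMIT;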

Cheers,

Jeff

