Marc MILLAS
On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxxx> wrote:
On 6/21/23 00:26, Marc Millas wrote:
>
>
> On Tue, Jun 20, 2023 at 11:19 PM David Rowley <dgrowleyml@xxxxxxxxx
> <mailto:dgrowleyml@xxxxxxxxx>> wrote:
>
> On Wed, 21 Jun 2023 at 08:34, Marc Millas <marc.millas@xxxxxxxxxx
> <mailto:marc.millas@xxxxxxxxxx>> wrote:
> >
> > On Tue, Jun 20, 2023 at 10:14 PM David Rowley
> <dgrowleyml@xxxxxxxxx <mailto:dgrowleyml@xxxxxxxxx>> wrote:
> >>
> >> On Wed, 21 Jun 2023 at 07:42, Marc Millas <marc.millas@xxxxxxxxxx
> <mailto:marc.millas@xxxxxxxxxx>> wrote:
> >> > But if I do the same with clause one OR clause 2, I have to
> kill the request after an hour, seeing the filesystem showing more
> than 140 Mb of increased usage.
It's a bit weird the "victor" table is joined seemingly without any join
conditions, leading to a cross join (which massively inflates the cost
for joins above it). Maybe the anonymized plan mangles it somehow.
So I did try to simplify my pb.
I create a table with the result of the first 3 joins.
That table do have 15M lines. all tables have been vacuum analyze
Now if I do an explain analyze of a simple join between that table and my original table 4
using a simple = clause, I get a result in one second (around). and the planner guesses for rows seems in line with the observed values .
if I use a substr(table1.a)= table2.b, the explain analyze get a result in 21 seconds and the planner estimates a 65M rows result set while the observed is 330 k rows
so here its 20 times slower and the discrepency between planner rows guess and reality is a 200 ratio.
Now, if I try an explain analyze with join on a=b or substr(c)=d or e=f
then... I kill the query after a quarter an hour without any answer.
if I try to just explain the query, the planner rows guess becomes more than 2 Billions....
the extremely simple query and plan are here, without automatic obfuscation
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Marc MILLAS