Re: Query that took a lot of time in Postgresql when not using trim in order by

"Peter J. Holzer" <hjp-pgsql@xxxxxx> · Sun, 29 Nov 2015 14:23:23 +0100

On 2015-11-25 19:35:15 +0300, Evgeniy Shishkin wrote:
> Fast:
> 
>  Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))
>    Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
>    Sort Method: quicksort  Memory: 13121kB
>    ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)
> 
> 
> Slow:
> 
>  Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
>    Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
>    ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
>          Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
>          Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
>          Sort Method: quicksort  Memory: 265592kB
>          ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)
> 
> 
> The difference is in the top of plans.
> As we see, hashjoin time is practically the same. 
> But fast plan uses hashagg first and only 43k rows require sorting.
> Slow plan dominated by sorting 900k rows.
> 
> I wonder if increasing cpu_tuple_cost will help.
> As cost difference between two plans is negligible now.

Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an
in-memory sort of 900k rows seems slow to me. I tested this on my PC at
home (1.8 GHz Core2 Dual, so a rather old and slow box) and I could sort
1E6 rows of 128 random bytes in 5.6 seconds. Even if I kept the first 96
bytes constant (so only the last 32 were random), it took only 21
seconds. Either this CPU is really slow or the data is heavily skewed -
is it possible that all dimensions except dim_vendedor.a3_nome have only
one or very few values? In that case changing the sort order might help.

	hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants 
| |   | hjp@xxxxxx         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
Attachment:
signature.asc

Description: Digital signature