>>>>> "Jung" == Jung, Jinho <jinho.jung@xxxxxxxxxx> writes: Jung> select distinct Jung> ref_0.i_im_id as c0, Jung> ref_1.ol_dist_info as c1 Jung> from Jung> public.item as ref_0 right join Jung> public.order_line as ref_1 Jung> on (ref_0.i_id = 5) Jung> - Commit: 84f9a35 (Improve estimate of distinct values in estimate_num_groups()) Jung> - Our analysis: We believe that this regression is related to the Jung> new logic for estimating the number of distinct values in the Jung> optimizer. This is affecting even queries with point lookups Jung> (ref_0.i_id = 5) in the TPC-C benchmark. So what's happening here is that the old plan was mis-estimating the result, believed incorrectly that it would fit into work_mem, and generated a hashaggregate plan accordingly; it ran fast because hashaggregate doesn't spill to disk but silently overflows work_mem. The new plan correctly estimates the result size, and therefore is forbidden from generating the hashaggregate plan at the default work_mem setting; it generates a sort plan, and the sort of course spills to disk since work_mem is exceeded. Had the value of work_mem been set to something appropriate for the workload, then the query plan would not have changed. So the problem (from an automated testing perspective) is that an actual _improvement_ in the code is being reported as a regression. Jung> ####### QUERY 3: Jung> select Jung> cast(ref_1.ol_i_id as int4) as c0 Jung> from Jung> public.stock as ref_0 Jung> left join public.order_line as ref_1 Jung> on (ref_1.ol_number is not null) Jung> where ref_1.ol_number is null Jung> - Commit: 77cd477 (Enable parallel query by default.) Jung> - Our analysis: We believe that this regression is due to Jung> parallel queries being enabled by default. Surprisingly, we found Jung> that even on a larger TPC-C database (scale factor of 50, roughly Jung> 4GB), parallel scan is still slower than the non-parallel one in Jung> the old version, when the query is not returning any tuples. The problem here is not actually with parallel scans as such, but rather the omission of a Materialize node in the parallel plan, and what looks like some rather serious mis-costing of the nestloop antijoin. Jung> ####### QUERY 4: Jung> select Jung> ref_0.s_dist_06 as c0 Jung> from Jung> public.stock as ref_0 Jung> where (ref_0.s_w_id < cast(least(0, 1) as int8)) Jung> - Commit: 5edc63b (Account for the effect of lossy pages when costing bitmap scans) Jung> - Our analysis: We believe that this regression has to do with Jung> two factors: 1) conditional expression (e.g., LEAST or NULLIF) Jung> are not reduced to constants unlike string functions (e.g., Jung> CHAR_LENGTH) 2) change in the cost estimation function for bitmap Jung> scan. Execution time grows by 3 orders of magnitude. We note that Jung> this regression is only observed on large databases (e.g., scale Jung> factor of 50). Again, this is showing up because of a large database and a small work_mem. The bitmap scan on stock only becomes lossy if the number of rows matched in the index is very large relative to work_mem; the lack of plan-time evaluation of LEAST means that the planner doesn't have any good way to estimate the selectivity, so it's taking a default estimate. 
 Jung> ####### QUERY 1:

 Jung> select
 Jung>   ref_0.o_d_id as c0
 Jung> from
 Jung>   public.oorder as ref_0
 Jung> where EXISTS (
 Jung>   select
 Jung>     1
 Jung>   from
 Jung>     (select distinct
 Jung>        ref_0.o_entry_d as c0,
 Jung>        ref_1.c_credit as c1
 Jung>      from
 Jung>        public.customer as ref_1
 Jung>      where (false)
 Jung>     ) as subq_1
 Jung> );

 Jung> - Commit: bf6c614 (Do execGrouping.c via expression eval
 Jung>   machinery, take two)

 Jung> - Our analysis: We are not sure about the root cause of this
 Jung>   regression. This might have to do with grouping logic.

What this query is basically exercising is how fast one can do
ExecReScan on a DISTINCT query, without also considering the
performance effects of actually doing the grouping (the constant-false
qual here means that the grouping comparison is never actually
performed).

An optimization tradeoff that speeds up comparisons within a scan at
the cost of a fixed overhead for the scan will therefore make this
query slower, but it still seems a good tradeoff to make. (Of course
it would be even better to make the overhead per-query rather than
per-scan, and there were other issues with this commit that should
have been caught at the time.)

-- 
Andrew (irc:RhodiumToad)
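PS: for anyone who wants to see what is actually being measured here,
something along these lines makes it visible (just a sketch; the exact
plan shape and the loop counts will depend on the size of oorder):

  EXPLAIN (ANALYZE, TIMING OFF)
  select ref_0.o_d_id as c0
  from public.oorder as ref_0
  where EXISTS (
    select 1
    from (select distinct
            ref_0.o_entry_d as c0,
            ref_1.c_credit as c1
          from public.customer as ref_1
          where (false)) as subq_1
  );

The loops count on the nodes under the SubPlan shows one rescan per
oorder row, while the constant-false qual means that no rows ever
reach the grouping comparison itself.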