Hi
Thanks for the response. I reran the query but first ran the statement you provided and set working mem to 2gb. It ended up taking 133s and group aggregate was still used
Here are the values you asked for:
# - Planner Method Configuration -
#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
# - Planner Cost Constants -
#seq_page_cost = 1.0 # measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
#effective_cache_size = 6000MB
The output of select * from pg_statistics is large...should I attach it as a separate file (not sure if that's allowed on these mailing lists)
The data is ~2.5gb, I can't think of any place I can upload it. I can provide the columns and data type. it's a subset of public data from usaspending.gov
column_name, datatype, ordinal position, nullable?
idx | integer | 1 | YES |
obligatedamount | double precision | 2 | YES |
baseandexercisedoptionsvalue | double precision | 3 | YES |
baseandalloptionsvalue | double precision | 4 | YES |
maj_fund_agency_cat | character varying | 5 | YES |
contractingofficeagencyid | character varying | 6 | YES |
contractingofficeid | character varying | 7 | YES |
fundingrequestingagencyid | character varying | 8 | YES |
fundingrequestingofficeid | character varying | 9 | YES |
signeddate | date | 10 | YES |
effectivedate | date | 11 | YES |
currentcompletiondate | date | 12 | YES |
ultimatecompletiondate | date | 13 | YES |
lastdatetoorder | character varying | 14 | YES |
typeofcontractpricing | character varying | 15 | YES |
multiyearcontract | character varying | 16 | YES |
vendorname | character varying | 17 | YES |
dunsnumber | character varying | 18 | YES |
parentdunsnumber | character varying | 19 | YES |
psc_cat | character varying | 20 | YES |
productorservicecode | character varying | 21 | YES |
principalnaicscode | character varying | 22 | YES |
piid | character varying | 23 | YES |
modnumber | character varying | 24 | YES |
fiscal_year | character varying | 25 | YES |
idvpiid | character varying | 26 | YES |
extentcompeted | character varying | 27 | YES |
numberofoffersreceived | double precision | 28 | YES |
competitiveprocedures | character varying | 29 | YES |
solicitationprocedures | character varying | 30 | YES |
evaluatedpreference | character varying | 31 | YES |
firm8aflag | character varying | 32 | YES |
sdbflag | character varying | 33 | YES |
issbacertifiedsmalldisadvantagedbusiness | character varying | 34 | YES |
womenownedflag | character varying | 35 | YES |
veteranownedflag | character varying | 36 | YES |
minorityownedbusinessflag | character varying | 37 | YES |
data_source | text | 38 | YES |
psc_cd | character varying | 39 | YES |
On Mon, Aug 26, 2013 at 9:40 AM, Tomas Vondra <tv@xxxxxxxx> wrote:
On 26 Srpen 2013, 15:02, Adam Ma'ruf wrote:Hi,
> Sure
>
> I just upgraded to 9.2.4. The query is:
> SELECT quebec_four
> , sierra
> , SUM(dollaramount) as dollaramount
> FROM alpha_quebec_echo
> GROUP BY quebec_four
> , sierra
>
> alpha_quebec_echo has 5,409,743 rows and 39 columns. Quebec_four and
> sierra are both varchar, dollar amount is a floating point field. It has
> no indexes (but neither did the mssql express table). Any other details
> you need?
>
> Thanks,
> A
It's quite clear why the query is so slow - the plan is using on-disk sort
with ~5M rows, and that's consuming a lot of time (almost 120 seconds).
I'm wondering why it chose the sort in the first place. I'd guess it'll
choose hash aggregate, which does not require sorted input.
Can you try running "set enable_sort = false" and then explain of the query?
If that does not change the plan to "HashAggregate" instead of
"GroupAggregate", please check and post values of enable_* and cost_*
variables.
Another question is why it's doing the sort on disk and not in memory. The
explain you've posted shows it requires ~430MB on disk, and in my
experience it usually requires ~3x that much to do the in-memory sort.
I see you've set work_mem=4GB, is that correct? Can you try with a lower
value - say, 1 or 2GB? I'm not sure how this works on Windows, though.
Maybe there's some other limit (and SQL Server is not hitting it, because
it's native Windows application).
Can you prepare a testcase (table structure + data) and post it somewhere?
Or at least the structure, if it's not possible to share the data.
Also, output from "select * from pg_settings" would be helpful.
Tomas