Re: Poor performance on simple queries compared to sql server express

"Adam Ma'ruf" <adam.maruf@xxxxxxxxx> · Tue, 27 Aug 2013 00:06:39 -0400

Hi
Thanks for the response.  I reran the query but first ran the statement you provided and set working mem to 2gb.  It ended up taking 133s and group aggregate was still used

Here are the values you asked for:
# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0			# measured on an arbitrary scale
#random_page_cost = 4.0			# same scale as above
#cpu_tuple_cost = 0.01			# same scale as above
#cpu_index_tuple_cost = 0.005		# same scale as above
#cpu_operator_cost = 0.0025		# same scale as above
#effective_cache_size = 6000MB

The output of select * from pg_statistics is large...should I attach it as a separate file (not sure if that's allowed on these mailing lists)

The data is ~2.5gb, I can't think of any place I can upload it.  I can provide the columns and data type.  it's a subset of public data from usaspending.gov

column_name,                                       datatype,  ordinal position, nullable?

  idx
  integer
  1
  YES

  obligatedamount
  double precision
  2
  YES

  baseandexercisedoptionsvalue
  double precision
  3
  YES

  baseandalloptionsvalue
  double precision
  4
  YES

  maj_fund_agency_cat
  character varying
  5
  YES

  contractingofficeagencyid
  character varying
  6
  YES

  contractingofficeid
  character varying
  7
  YES

  fundingrequestingagencyid
  character varying
  8
  YES

  fundingrequestingofficeid
  character varying
  9
  YES

  signeddate
  date
  10
  YES

  effectivedate
  date
  11
  YES

  currentcompletiondate
  date
  12
  YES

  ultimatecompletiondate
  date
  13
  YES

  lastdatetoorder
  character varying
  14
  YES

  typeofcontractpricing
  character varying
  15
  YES

  multiyearcontract
  character varying
  16
  YES

  vendorname
  character varying
  17
  YES

  dunsnumber
  character varying
  18
  YES

  parentdunsnumber
  character varying
  19
  YES

  psc_cat
  character varying
  20
  YES

  productorservicecode
  character varying
  21
  YES

  principalnaicscode
  character varying
  22
  YES

  piid
  character varying
  23
  YES

  modnumber
  character varying
  24
  YES

  fiscal_year
  character varying
  25
  YES

  idvpiid
  character varying
  26
  YES

  extentcompeted
  character varying
  27
  YES

  numberofoffersreceived
  double precision
  28
  YES

  competitiveprocedures
  character varying
  29
  YES

  solicitationprocedures
  character varying
  30
  YES

  evaluatedpreference
  character varying
  31
  YES

  firm8aflag
  character varying
  32
  YES

  sdbflag
  character varying
  33
  YES

  issbacertifiedsmalldisadvantagedbusiness
  character varying
  34
  YES

  womenownedflag
  character varying
  35
  YES

  veteranownedflag
  character varying
  36
  YES

  minorityownedbusinessflag
  character varying
  37
  YES

  data_source
  text
  38
  YES

  psc_cd

  character varying
  39
  YES

On Mon, Aug 26, 2013 at 9:40 AM, Tomas Vondra <tv@xxxxxxxx> wrote:

On 26 Srpen 2013, 15:02, Adam Ma'ruf wrote:

> Sure

>

> I just upgraded to 9.2.4.  The query is:

> SELECT        quebec_four

>             , sierra

>             , SUM(dollaramount) as dollaramount

>   FROM alpha_quebec_echo

>   GROUP BY   quebec_four

>              , sierra

>

> alpha_quebec_echo has 5,409,743 rows and 39 columns.  Quebec_four and

> sierra are both varchar, dollar amount is a floating point field.  It has

> no indexes (but neither did the mssql express table).  Any other details

> you need?

>

> Thanks,

> A

Hi,

It's quite clear why the query is so slow - the plan is using on-disk sort

with ~5M rows, and that's consuming a lot of time (almost 120 seconds).

I'm wondering why it chose the sort in the first place. I'd guess it'll

choose hash aggregate, which does not require sorted input.

Can you try running "set enable_sort = false" and then explain of the query?

If that does not change the plan to "HashAggregate" instead of

"GroupAggregate", please check and post values of enable_* and cost_*

variables.

Another question is why it's doing the sort on disk and not in memory. The

explain you've posted shows it requires ~430MB on disk, and in my

experience it usually requires ~3x that much to do the in-memory sort.

I see you've set work_mem=4GB, is that correct? Can you try with a lower

value - say, 1 or 2GB? I'm not sure how this works on Windows, though.

Maybe there's some other limit (and SQL Server is not hitting it, because

it's native Windows application).

Can you prepare a testcase (table structure + data) and post it somewhere?

Or at least the structure, if it's not possible to share the data.

Also, output from "select * from pg_settings" would be helpful.

Tomas

idx	integer	1	YES
obligatedamount	double precision	2	YES
baseandexercisedoptionsvalue	double precision	3	YES
baseandalloptionsvalue	double precision	4	YES
maj_fund_agency_cat	character varying	5	YES
contractingofficeagencyid	character varying	6	YES
contractingofficeid	character varying	7	YES
fundingrequestingagencyid	character varying	8	YES
fundingrequestingofficeid	character varying	9	YES
signeddate	date	10	YES
effectivedate	date	11	YES
currentcompletiondate	date	12	YES
ultimatecompletiondate	date	13	YES
lastdatetoorder	character varying	14	YES
typeofcontractpricing	character varying	15	YES
multiyearcontract	character varying	16	YES
vendorname	character varying	17	YES
dunsnumber	character varying	18	YES
parentdunsnumber	character varying	19	YES
psc_cat	character varying	20	YES
productorservicecode	character varying	21	YES
principalnaicscode	character varying	22	YES
piid	character varying	23	YES
modnumber	character varying	24	YES
fiscal_year	character varying	25	YES
idvpiid	character varying	26	YES
extentcompeted	character varying	27	YES
numberofoffersreceived	double precision	28	YES
competitiveprocedures	character varying	29	YES
solicitationprocedures	character varying	30	YES
evaluatedpreference	character varying	31	YES
firm8aflag	character varying	32	YES
sdbflag	character varying	33	YES
issbacertifiedsmalldisadvantagedbusiness	character varying	34	YES
womenownedflag	character varying	35	YES
veteranownedflag	character varying	36	YES
minorityownedbusinessflag	character varying	37	YES
data_source	text	38	YES
psc_cd	character varying	39	YES