Just a correction to my previous message regarding the throughput we get.
On that one table with 1.2B rows, the plan using the index scan actually delivers 50K rows/s in read speed to the application, almost immediately. It would go through the entire table in under 7h, whereas the other approach still hadn't delivered any data after 10h.
We do additional joins and logic, and our final throughput is about 12K rows/s (what I quoted previously), but this is a case where the index_scan plan clearly delivers vastly better performance than the table_seq_scan+sort plan.
Any insight here?
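For context, the two plans can be compared on a live session without changing anything permanently; a minimal sketch (table and filter are placeholders, not the actual query):

```sql
-- Force the index-scan plan for one transaction only and time it;
-- ROLLBACK discards the setting along with the transaction.
BEGIN;
SET LOCAL enable_seqscan = off;            -- steer the planner to the index scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE key_col = 42;  -- placeholder query
ROLLBACK;
```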
Thank you, Laurent.

From: ldh@xxxxxxxxxxxxxxxxxx <ldh@xxxxxxxxxxxxxxxxxx>
Sent: Friday, January 25, 2019 2:06:54 PM
To: Tom Lane
Cc: pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: Zero throughput on a query on a very large table.

OK... I think we may have cracked this.
First, do you think that 128MB of work_mem is OK? We have a 64GB machine and expect fewer than 100 connections. This is really an ETL workload environment at this time.
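One point worth noting: work_mem applies per sort/hash node per query, so many concurrent connections can each consume several multiples of it. A common approach is to raise it only for the ETL session rather than globally; a sketch (the 512MB figure is an illustration, not a recommendation):

```sql
-- Session-local: reverts when this connection closes.
SET work_mem = '512MB';

-- Or scoped to a single transaction:
BEGIN;
SET LOCAL work_mem = '512MB';  -- reverts automatically at COMMIT/ROLLBACK
-- ... run the big ETL query here ...
COMMIT;
```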
Second, here is what I found and what messed us up:

select current_setting('random_page_cost'); --> 4
alter database "CMS_TMP" set random_page_cost=0.00000001;
select current_setting('random_page_cost'); --> 4 ????
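This behavior is expected: ALTER DATABASE ... SET only changes the per-database default that new sessions pick up at connect time; the session that issued it keeps its old value, which would explain the unchanged 4. A sketch of what one would expect to see:

```sql
-- Changes the default for FUTURE sessions on this database only:
ALTER DATABASE "CMS_TMP" SET random_page_cost = 1.1;

-- The current session still shows the old value:
SELECT current_setting('random_page_cost');  -- still 4

-- After disconnecting and reconnecting to CMS_TMP,
-- the same query would show the new default.
```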
I also tried:
select current_setting('random_page_cost'); --> 4
select set_config('random_page_cost', '0.000001', true);
select current_setting('random_page_cost'); --> 4 ????

Is there something happening that is causing those settings to not stick? I then tried:
select current_setting('random_page_cost'); --> 4
select set_config('random_page_cost', '0.000001', false); -- false now, i.e., global
select current_setting('random_page_cost'); --> 0.000001 !!!!

So I think we just spent 4 days on that issue. I then did:

select set_config('enable_seqscan', 'off', false);
And the plan is now using an index scan, and we are getting 12K rows/s in throughput immediately! 😊 So I guess my final question is that I really want to affect only that one query, and I can't seem to change the settings used by the planner for just that one transaction. I have to change them globally, which I would prefer not to do.
Any help here?
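For what it's worth, set_config(..., true) makes the setting transaction-local, and outside an explicit transaction block every statement is its own transaction, so the value evaporates before the next statement; that would explain why the earlier attempt showed 4 again. Scoping a setting to exactly one query can be done inside one transaction; a sketch:

```sql
-- Outside BEGIN/COMMIT, autocommit ends the transaction immediately,
-- so a transaction-local setting never survives to the next statement:
SELECT set_config('random_page_cost', '1.1', true);
SELECT current_setting('random_page_cost');  -- back to the old value

-- Inside one transaction, SET LOCAL affects only that transaction:
BEGIN;
SET LOCAL enable_seqscan = off;
SET LOCAL random_page_cost = 1.1;
-- ... run the one query here ...
COMMIT;  -- both settings revert automatically
```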
Thanks, Laurent.

From: ldh@xxxxxxxxxxxxxxxxxx <ldh@xxxxxxxxxxxxxxxxxx>
Sent: Friday, January 25, 2019 1:36:21 PM
To: Tom Lane
Cc: pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: Zero throughput on a query on a very large table.

Sorry :) When I look at the "SQL" tab in PGAdmin when I select the index in the schema browser. But you are right that \d doesn't show that.

From: Tom Lane <tgl@xxxxxxxxxxxxx>
Sent: Friday, January 25, 2019 1:34:01 PM
To: ldh@xxxxxxxxxxxxxxxxxx
Cc: pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: Zero throughput on a query on a very large table.

"ldh@xxxxxxxxxxxxxxxxxx" <ldh@xxxxxxxxxxxxxxxxxx> writes:
> Also, the original statement I implemented did not have all of that. This is the normalized SQL that Postgres now gives when looking at the indices.

[ squint... ] What do you mean exactly by "Postgres gives that"? I don't see any redundant COLLATE clauses in e.g. psql \d.

regards, tom lane
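The likely explanation is that the DDL shown by PGAdmin is reconstructed from the system catalogs rather than stored verbatim, which is why it can contain clauses the original statement never spelled out. A sketch of querying that directly (the index name is a placeholder):

```sql
-- pg_get_indexdef() rebuilds the CREATE INDEX statement from the catalogs,
-- which may surface normalized clauses (e.g. explicit COLLATE) that the
-- original DDL omitted:
SELECT pg_get_indexdef('my_index'::regclass);  -- placeholder index name
```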