Sorry, the web outlook client may be "prepending" this message instead of appending, as is the custom on this mailing list.
The indices are defined as: CREATE INDEX i_outprev_ptclaim
ON public.tmp_outpatient_rev USING btree (desy_sort_key COLLATE pg_catalog."default", claim_no COLLATE pg_catalog."default") TABLESPACE pg_default; CREATE UNIQUE INDEX ui_outprev_ptclaimline
ON public.tmp_outpatient_rev USING btree (desy_sort_key COLLATE pg_catalog."default", claim_no COLLATE pg_catalog."default", clm_line_num COLLATE pg_catalog."default") TABLESPACE pg_default; I am using PGAdmin4 and the client times out, so i don't have the exact timing, but each one of those indices completed under 5h (started at lunch time and was done before the end of the afternoon). So when i ran the query and it didn't move for about 10h, i figured it might "never end" :).
I'll try changing the random page cost and see. The work_men param is set to 128MB... So maybe that's something too? I'll try.
Additionally, do note that we have a second table, similar in structure, with 180M rows,
select pg_size_pretty(pg_relation_size('tmp_inpatient_rev')) --> 18GB (so it's 10x smaller) but we get 40K rows/s read throughput on that with a similar query and index and the plan does chose an index scan and returns the first thousands of row almost
immediately (a few secs).
From: Tom Lane <tgl@xxxxxxxxxxxxx>
Sent: Friday, January 25, 2019 1:24:45 AM To: ldh@xxxxxxxxxxxxxxxxxx Cc: pgsql-performance@xxxxxxxxxxxxxx Subject: Re: Zero throughput on a query on a very large table. "ldh@xxxxxxxxxxxxxxxxxx" <ldh@xxxxxxxxxxxxxxxxxx> writes:
> Query: > select * from tmp_outpatient_rev order by desy_sort_key, claim_no > Plan: > [ seqscan-and-sort ... parallelized, but still seqscan-and-sort ] > - I have tried to hack the planner to force an index scan (which would avoid the sort/gather steps and should start streaming data right away), in particular, enable_seqscan=false or seq_page_cost=2. This had ZERO impact on the plan to my surprise. If you can't get an indexscan plan despite setting enable_seqscan=false, that typically means that the planner thinks the index's sort order does not match what the query is asking for. I wonder whether you created the index with nondefault collation, or asc/desc ordering, or something like that. There's not enough detail here to diagnose that. It should also be noted that what enable_seqscan=false actually does is to add a cost penalty of 1e10 to seqscan plans. It's possible that your table is so large and badly ordered that the estimated cost differential between seqscan and indexscan is more than 1e10, so that the planner goes for the seqscan anyway. You could probably overcome that by aggressively decreasing random_page_cost (and by "aggressive" I don't mean 2, I mean 0.2, or maybe 0.00002, whatever it takes). However, if that's what's happening, I'm worried that getting what you asked for may not really be the outcome you wanted. Just because you start to see some data streaming to your app right away doesn't mean the process is going to complete in less time than it would if you waited for the sort to happen. You didn't mention what you have work_mem set to, but a small value of that would handicap the sort-based plan a lot. I wonder whether jacking up work_mem to help the sorts run faster won't end up being the better idea in the end. regards, tom lane PS: On the third hand, you mention having created new indexes on this table with apparently not a lot of pain, which is a tad surprising if you don't have the patience to wait for a sort to finish. How long did those index builds take? |