Re: Index Scan Costs versus Sort

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Tom,

From pg_stats:

schema = "tiger";
tablename = "completechain";
attname = "tlid";
null_frac = 0;
avg_width = 4;
n_distinct = -1;
most_common_vals = ;
most_common_freqs = ;
correlation = 0.155914;

Note that I have default_statistics_target set to 100. Here is the first few values from histogram_bounds:

"{102450,2202250,4571797,6365754,8444936,10541593,12485818,14545727,16745594,18421868,20300549,22498643,24114709,26301001,28280632,30370123,32253657,33943046,35898115,37499478,39469054,41868498,43992143,45907830,47826340,49843926,52051798,54409298,56447416,

The tlid column is a US Census bureau ID assigned to each chain in the US - where a chain is a road segment, river segment, railroad segment, etc. The data is loaded on state-by-state basis, and then a county-by-county basis. There is no overall ordering to TLIDs, although perhaps there is some local ordering at the county level (but from a quick look at the data I don't see any, and the correlation factor indicates there isn't any if I am interpreting it correctly).

Any other info that would be helpful to see?


Charlie


Tom Lane wrote:
Charlie Savage <cfis@xxxxxxxxxxxxx> writes:
1. Postgresql estimates the index scan will be 50 times more costly than the seq scan (112870376 vs 2229858) yet in fact it only takes 3 times longer to execute (2312426 s vs. 768403 s). My understanding is that postgresql assumes, via the random_page_cost parameter, that an index scan will take 4 times longer than a sequential scan. So why is the analyzer estimating it is 50 times slower?

The other factors that are likely to affect this are index correlation
and effective cache size.  It's fairly remarkable that a full-table
index scan only takes 3 times longer than a seqscan; you must have both
a high correlation and a reasonably large cache.  You showed us your
effective_cache_size setting, but what's the pg_stats entry for completechain.tlid contain? Can you quantify what the physical
ordering of tlid values is likely to be?

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux