Re: How to specify/mock the statistic data of tables in PostgreSQL

Felix.徐 <ygnhzeus@xxxxxxxxx> · Mon, 13 Jan 2014 14:51:57 +0800

I see, thanks.
I'm looking into the source code of the statistics part now, and I'm a little confused about the column "staop" in the table pg_statistic.
In pg_statistic.h, the comment says:

	/* ----------------
	 * To allow keeping statistics on different kinds of datatypes,
	 * we do not hard-wire any particular meaning for the remaining
	 * statistical fields.	Instead, we provide several "slots" in which
	 * statistical data can be placed.	Each slot includes:
	 *		kind			integer code identifying kind of data (see below)
	 *		op				OID of associated operator, if needed
	 *		numbers			float4 array (for statistical values)
	 *		values			anyarray (for representations of data values)
	 * The ID and operator fields are never NULL; they are zeroes in an
	 * unused slot.  The numbers and values fields are NULL in an unused
	 * slot, and might also be NULL in a used slot if the slot kind has
	 * no need for one or the other.
	 * ----------------
	 */
And,
//line 194 : In a "most common values" slot, staop is the OID of the "=" operator used to decide whether values are the same or not.
//line 206 : A "histogram" slot describes the distribution of scalar data.  staop is the OID of the "<" operator that describes the sort ordering.
....

I don't understand the function of staop here. How is it used in the optimizer? Is there any example? Thanks!
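
One possible reading of the comment, as a minimal sketch in Python rather than the C of the actual source: each statistics row carries several slots, and code looking for a given kind of statistic checks both the kind code and the operator OID before using the slot. The names Slot and find_slot are hypothetical, and the kind codes and operator OIDs below are illustrative values assumed for this sketch, not read from the headers.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Illustrative constants (assumed for this sketch, not read from the headers).
KIND_UNUSED = 0
KIND_MCV = 1
KIND_HISTOGRAM = 2
EQ_OP_OID = 96  # stand-in OID for an "=" operator
LT_OP_OID = 97  # stand-in OID for a "<" operator


@dataclass
class Slot:
    kind: int = KIND_UNUSED                # zero in an unused slot
    op: int = 0                            # zero in an unused slot
    numbers: Optional[List[float]] = None  # may be None even in a used slot
    values: Optional[List[object]] = None  # may be None even in a used slot


def find_slot(slots: Sequence[Slot], kind: int,
              op: Optional[int] = None) -> Optional[Slot]:
    """Return the first slot of the requested kind whose operator matches.

    If op is None, the operator is not checked.
    """
    for slot in slots:
        if slot.kind != kind:
            continue
        if op is not None and slot.op != op:
            continue
        return slot
    return None


# One row's worth of slots: an MCV slot, a histogram slot, and an unused slot.
slots = [
    Slot(KIND_MCV, EQ_OP_OID, numbers=[0.5, 0.3], values=[1, 2]),
    Slot(KIND_HISTOGRAM, LT_OP_OID, values=[3, 10, 20, 40]),
    Slot(),
]
```

Under this reading, staop records which notion of equality or ordering the statistics were built with, so a slot gathered under one operator is not applied to a clause that uses a different one.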

2014/1/10 Amit Langote <amitlangote09@xxxxxxxxx>

On Fri, Jan 10, 2014 at 11:19 PM, Atri Sharma <atri.jiit@xxxxxxxxx> wrote:
>
> On 10-Jan-2014, at 19:42, "ygnhzeus" <ygnhzeus@xxxxxxxxx> wrote:
>
> Thanks for your reply.
> So correlation is not related to the calculation of selectivity, right? If I
> force PostgreSQL not to optimize the join order (by setting
> join_collapse_limit and from_collapse_limit to 1), is there any other
> factor that may affect the structure of the execution plan, regardless of the
> data access method?
>
> 2014-01-10
> ________________________________
> ygnhzeus
> ________________________________
> From: Amit Langote <amitlangote09@xxxxxxxxx>
> Sent: 2014-01-10 22:00
> Subject: Re: How to specify/mock the statistic data of tables in PostgreSQL
> To: "ygnhzeus" <ygnhzeus@xxxxxxxxx>
> Cc: "pgsql-general" <pgsql-general@xxxxxxxxxxxxxx>
>
> AFAIK, correlation is involved in the calculation of the costs that are used for
> deciding the type of access. If the correlation is low, an index scan can lead
> to quite a few random reads, hence leading to higher costs.

Ah, I forgot to mention this point about how the planner uses correlation
for access method selection.
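
A simplified sketch of that effect, assuming the I/O cost of an index scan is interpolated between a well-ordered best case and a fully random worst case using the square of the correlation. The function name index_io_cost is hypothetical, and using the same page count for both cases is a simplification made for this sketch; the default page costs mirror the stock seq_page_cost and random_page_cost settings.

```python
def index_io_cost(pages_fetched, correlation,
                  seq_page_cost=1.0, random_page_cost=4.0):
    """Estimate heap I/O cost of an index scan for a given correlation.

    Simplified: the same page count is used for the best and worst case.
    """
    if pages_fetched <= 0:
        return 0.0
    # Worst case (correlation 0): every heap page fetch is a random read.
    max_io_cost = pages_fetched * random_page_cost
    # Best case (|correlation| 1): one random read, then sequential reads.
    min_io_cost = random_page_cost + (pages_fetched - 1) * seq_page_cost
    # Interpolate by correlation squared, so the sign does not matter.
    c_squared = correlation * correlation
    return max_io_cost + c_squared * (min_io_cost - max_io_cost)
```

With 100 pages, a correlation of 0 gives the full random-read cost, while a correlation of 1 (or -1) gives a cost close to that of reading the pages sequentially.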

And selectivity is a function of the statistical distribution of column
values described in pg_statistic by histograms, most common values
(with their occurrence frequencies), number of distinct values, etc.
It has nothing to do with correlation.
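
As a rough illustration, here is a minimal sketch of an equality-selectivity estimate built only from the most common values, their frequencies, and the number of distinct values. The function name eq_selectivity and its parameters are hypothetical, and the real planner code applies further adjustments not modelled here; note that correlation is not an input at all.

```python
def eq_selectivity(value, mcv_values, mcv_freqs, n_distinct, null_frac=0.0):
    """Estimate the fraction of rows where column = value."""
    # If the value is one of the most common values, use its recorded frequency.
    for mcv, freq in zip(mcv_values, mcv_freqs):
        if mcv == value:
            return freq
    # Otherwise, spread the remaining non-null, non-MCV fraction of rows
    # evenly over the distinct values that are not in the MCV list.
    remaining_frac = 1.0 - sum(mcv_freqs) - null_frac
    remaining_distinct = n_distinct - len(mcv_values)
    if remaining_frac <= 0.0 or remaining_distinct <= 0:
        return 0.0
    return remaining_frac / remaining_distinct
```

For example, with most common values "a" (0.5) and "b" (0.3) and 12 distinct values in total, a lookup for "a" is estimated from its stored frequency, while any other value shares the remaining 0.2 of the rows with the other 10 distinct values.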

--
Amit Langote