Search Postgresql Archives

Performance problem with low correlation data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have a problem with the method that PG uses to access my data.

Data into testinsert is inserted every 15 minutes.

ne_id varies from 1 to 20000.


CREATE TABLE testinsert
(
  ne_id integer NOT NULL,
  t timestamp without time zone NOT NULL,
  v integer[],
  CONSTRAINT testinsert_pk PRIMARY KEY (ne_id, t)
)

CREATE UNIQUE INDEX testinsert_time_key
  ON testinsert
  USING btree
  (t, ne_id);

This table has, then, a t correlation of 1, and a ne_id correlation close to 0.

I query this table using another table:

CREATE TABLE idtable
(
  id integer NOT NULL,
  groupname varchar(50)
  
  CONSTRAINT idtable_pk PRIMARY KEY (id, groupname)
)

CREATE INDEX idtable_group_idx
  ON idtable
  USING btree
  (groupname);


where each id is associated with a group:

select * from idtable left outer join testinsert on id=ne_id where groupname='a group name' and time between $a_date and $another_date


PG usually choose a nested loop join over all the ne_ids found for groupname='a group name'.
BUT, given the correlation in the table, this is a horrible data access: the table (15GB) gets read randomly, since data for one ne_id is scattered all over the table;

The "best" way to read the table would still be a nested loop, but a loop on the "t" values, not on the ne_id values, since data for the same timestamp is "close".

Or, even better, something like this would be very nice:

Bitmap Heap Scan
	for each id found in idtable where groupname='a group name'
		BitmapOr
			BitmapIndexScan using ne_id and time between $a_date and $another_date


That is: I understand why PG is using that access method to fetch the indexes, but I would like it to fetch the heaps only after ALL the indexes have been read, so that it could reorder them...

So, given that:


How can I tell to PG to use an algorithm such as:

fetch the heap
	for each quarter
		for each id found where groupname='a group name'
			fetch all the indexes
	
instead of:

for each id found where groupname='a group name'
	fetch the heap
		fetch all the indexes where ne_id=id  time between $a_date and $another_date



????


(

some other infos:

1) results clustering the table are x10-x20 faster, but I can't cluster the table (it gets written every 15 minutes and read pretty often)
2) I know all the "t" values that I'm going to query, since there won't be more than 1 t per ne_id per 15 minutes;
so I could use a generate_series($a_date, $another_date, 15 minutes) if that could help somehow:

select * from 
idtable 
cross join generate_series($a_date, $another_date, 15 minutes) as myt
left outer join testinsert on id=ne_id and myt=t
where groupname='a group name'

but it doesn't help...
)





-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux