Search Postgresql Archives

share nothing but reads cluster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello, 

in order to face our data growth we want to start evaluate cluster
possibilities.

our application is mainly an ETL and reporting Tool for logs content.
all large tables are timely partitioned, whereas the partition logic is
driven by the application to avoid trigger overhead.

each customer gets its own schema, the largest being approx. 1TB in
size.

One of the main component is a query framework that analyze the client
request and compute the related SELECT statement.
simple SELECTs have such a general form (but there also are some much
more tricky cases) :

SELECT *mainAggregate*
FROM
(
   (select *subAggregate* from table1 where ...)
   UNION ALL
   (select *subAggregate* from table2 where ...)
   UNION ALL
   (select *subAggregate* from table3 where ...)
   ...
)
WHERE ...

   
   
One idea is to split and process the logs on different servers and just
add the corresponding address 
within the partition list on the "master".

The query framework should then be able to distribute the sub queries on
the remote server accordingly.


A quick test with db_link shows a severe drop down in performance while
putting the data together:

select count(*) from 
(
select * from dblink('a','select * from test_cluster')as t1(a int)
union all 
select * from dblink('a','select * from test_cluster')as t1(a int)
union all 
select * from dblink('a','select * from test_cluster')as t1(a int)
)foo

is about 5 times slower than an equivalent query run locally.

working with asynchron. queries (dblink_send_query) does not bring much
benefit 
so that much time seems to be spent for transfer and merge.


I've also taken a quick look at the SQL/MED and PL/Proxy documentation,
but as far as I've seen, 
these features expect predefined column definition and do not allow to
push arbitrary SQLs statement,  
especially the aggregate part that could take advantage of the foreign
resources.
Running the foreign sub queries concurrently would also be a fine thing.


Are there other way we should evaluate ?
Should we better wait foir POstgres 9.2+ ?

best regards,

Marc Mamin


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux