Distributed/Parallel Computing

Viji V Nair <viji@xxxxxxxxxxxxxxxxx> · Tue, 6 Oct 2009 00:41:07 +0530

Hi Team,

This question may have been asked many times before, but I could not find a solution in any post. Any help with the following would be greatly appreciated.

We have a PostgreSQL database with PostGIS functions. There are around 100 tables in the database. Almost all of them contain about 1 million records, and around 5 tables contain more than 20 million records. The total database size is 40GB, running on 16GB, 2 x XEON 5420, RAID6, RHEL5 64-bit machines. The questions are:

1. The geometry calculations we perform are very complex and take a very long time to complete. We have tuned the PG configuration as far as we can, and now we need a mechanism to distribute these queries across multiple boxes. What is the recommended approach for this kind of distributed/parallel deployment? We have tried PGPOOL II, but the performance is not satisfactory. We are going to try GridSQL next.

2. How can we distribute/split these large tables across multiple disks on different nodes?

Thanks in advance

Viji