Re: Transparent table partitioning in future version of PG?

On Fri, 8 May 2009, Robert Haas wrote:

> On Thu, May 7, 2009 at 10:52 PM,  <david@xxxxxxx> wrote:
>>> Hopefully, notions of partitioning won't be directly tied to chunking of
>>> data for parallel query access. Most queries access recent data and
>>> hence only a single partition (or stripe), so partitioning and
>>> parallelism are frequently exactly orthogonal.
>
> Yes, I think those things are unrelated.
>
>> I'm not so sure (warning, I am relatively inexperienced in this area)
>>
>> it sounds like you can take two basic approaches to partitioning a database
>>
>> 1. The Isolation Plan
>> [...]
>> 2. The Load Balancing Plan
>
> Well, even if the table is not partitioned at all, I don't see that it
> should preclude parallel query access.  If I've got a 1 GB table that
> needs to be sequentially scanned for rows meeting some restriction
> clause, and I have two CPUs and plenty of I/O bandwidth, ISTM it
> should be possible to have them each scan half of the table and
> combine the results.  Now, this is not easy and there are probably
> substantial planner and executor changes required to make it work, but
> I don't know that it would be particularly easier if I had two 500 MB
> partitions instead of a single 1 GB table.
>
> IOW, I don't think you should need to partition if all you want is
> load balancing.  Partitioning should be for isolation, and load
> balancing should happen when appropriate, whether there is
> partitioning involved or not.
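
For what it's worth, later PostgreSQL releases (9.6 and up) implemented essentially this: the executor can split a sequential scan of an unpartitioned table across worker processes and combine the partial results. A minimal sketch, with hypothetical table and column names:

-- parallel scan of an unpartitioned table (PostgreSQL 9.6+);
-- table and column names are invented for illustration
CREATE TABLE big_table (id bigint, val numeric);

SET max_parallel_workers_per_gather = 2;

EXPLAIN SELECT sum(val) FROM big_table;
-- representative (abridged) plan shape, assuming big_table is large
-- enough that the planner chooses to use workers:
--   Finalize Aggregate
--     ->  Gather
--           Workers Planned: 2
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on big_table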

Actually, I will contradict myself slightly.

With the Isolation Plan, there is not necessarily a need to run the query on each partition in parallel.

If parallel queries are possible, they will benefit Isolation Plan partitioning, but the biggest win with this plan is simply reducing the number of partitions that need to be queried.
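
To make that concrete, here is a minimal sketch using the declarative partitioning syntax later added in PostgreSQL 10 (in 2009 the same effect required inheritance plus constraint_exclusion); the table and column names are made up:

-- hypothetical log table, partitioned by month
CREATE TABLE logs (
    ts      timestamptz NOT NULL,
    payload text
) PARTITION BY RANGE (ts);

CREATE TABLE logs_2009_04 PARTITION OF logs
    FOR VALUES FROM ('2009-04-01') TO ('2009-05-01');
CREATE TABLE logs_2009_05 PARTITION OF logs
    FOR VALUES FROM ('2009-05-01') TO ('2009-06-01');

-- partition pruning means this only ever touches logs_2009_05,
-- entirely serially; the win needs no parallelism at all
SELECT count(*) FROM logs WHERE ts >= '2009-05-01';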

With the Load Balancing Plan, there is no benefit in partitioning unless you have the ability to run queries on each partition in parallel.
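
A sketch of the Load Balancing Plan, using the hash partitioning syntax later added in PostgreSQL 11 (names again hypothetical). Because the rows are spread evenly, a query that cannot be pruned must visit every partition, so there is no win unless those scans actually run concurrently:

-- hypothetical event table, rows spread evenly by hash
CREATE TABLE events (
    id      bigint NOT NULL,
    payload text
) PARTITION BY HASH (id);

CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- no pruning is possible here: both partitions get scanned, one after
-- the other unless the executor can run the scans in parallel
SELECT count(*) FROM events;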


Using a separate back-end process to run a query against a separate partition is fairly straightforward, but not trivial. There are complications in merging the result sets, including the need to be able to do part of a query, merge the results, and then use those results for the next step in the query.
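
As a sketch of that merge step (reusing the hypothetical events_p0 / events_p1 partitions from above, with one backend assigned to each): each backend computes a partial result, and a final query combines them:

SELECT sum(partial) AS total
FROM (
    SELECT count(*) AS partial FROM events_p0   -- backend 1
    UNION ALL
    SELECT count(*) AS partial FROM events_p1   -- backend 2
) AS partials;

-- the multi-step problem shows up with something like avg(): each
-- backend has to return both sum() and count(), and the merge step
-- computes sum(sums) / sum(counts) before the query can continue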

I would also note that there does not seem to be a huge conceptual difference between running these parallel queries on one computer and shipping the queries off to other computers.
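
For example, the merge query above barely changes if one of the partial queries is shipped to another machine with the dblink extension (the host name and connection string here are hypothetical):

CREATE EXTENSION IF NOT EXISTS dblink;

SELECT sum(partial) AS total
FROM (
    SELECT count(*) AS partial FROM events_p0          -- local scan
    UNION ALL
    SELECT * FROM dblink('host=node2 dbname=app',      -- hypothetical remote node
                         'SELECT count(*) FROM events_p1')
           AS remote(partial bigint)                   -- remote scan
) AS partials;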


However, trying to split the work on a single table runs into all sorts of 'interesting' issues with things needing to be shared between the multiple processes (they all need to use the same indexes, for example).

So I think it is much easier for the database engine to efficiently search two 500 GB tables than one 1 TB table.

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
