Re: Recommendations for partitioning?

Sorry for the delayed response. We had some hardware/configuration issues that appear to be resolved now, so we're finally starting to actually modify the database.

On Sat, Dec 7, 2013 at 1:29 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Thu, Dec 5, 2013 at 7:36 AM, Dave Johansen <davejohansen@xxxxxxxxx> wrote:
I'm managing a database that is adding about 10-20M records per day to a table and time is a core part of most queries,


What is the nature of how the time column is used in the queries?   Depending on how it is used, you might not get much improvement at all, or you might get N fold improvement, or you might find that re-designing your indexes could get you the same query improvement that partitioning would, but with less headache.

The time column is usually used to calculate statistics, find/analyze duplicates, analyze data contents, etc. over a specific time window. So there will be queries using GROUP BY and window functions, with a time filter in the WHERE clause.
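For concreteness, the queries look roughly like this (table and column names here are made up for illustration):

```sql
-- Hypothetical schema: events(event_time timestamptz, source_id int, payload bytea)

-- Aggregate statistics over a fixed time window:
SELECT date_trunc('hour', event_time) AS hour, count(*) AS n
FROM events
WHERE event_time >= '2013-12-01' AND event_time < '2013-12-02'
GROUP BY 1
ORDER BY 1;

-- Finding duplicates in the same window with a window function:
SELECT *
FROM (
    SELECT *, count(*) OVER (PARTITION BY source_id, payload) AS dup_count
    FROM events
    WHERE event_time >= '2013-12-01' AND event_time < '2013-12-02'
) s
WHERE dup_count > 1;
```

Because every query carries a range predicate on event_time like the ones above, constraint exclusion should let the planner skip partitions outside the window.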
 

so I've been looking into seeing if I need to start using partitioning based on the time column and I've found these general guidelines:

Don't use more than about 50 partitions ( http://www.postgresql.org/message-id/17147.1271696670@xxxxxxxxxxxxx )

Using triggers slows INSERTs down by a lot (unless they were already slow due to the need to hit disk to maintain the indexes or something like that).  Are you sure you can handle that slow down, given your insertion rate?  You could get the best of both worlds by having your bulk loaders target the correct partition directly, but also have the triggers on the parent table for any programs that don't get the message.

Inserting directly into the correct partition whenever possible and leaving the trigger on the parent table seems like the best option.
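As a sketch of that setup (table and partition names are hypothetical, using the inheritance-style partitioning available before PostgreSQL 10's declarative partitioning), the parent keeps a routing trigger while bulk loaders name the child table directly:

```sql
-- Routing trigger on the parent table, for programs that
-- don't know about the partitioning scheme.
CREATE OR REPLACE FUNCTION events_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.event_time >= DATE '2013-12-01'
       AND NEW.event_time <  DATE '2014-01-01' THEN
        INSERT INTO events_2013_12 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for event_time %', NEW.event_time;
    END IF;
    RETURN NULL;  -- row was routed to a child; don't also insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert_trigger
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_router();

-- Bulk loaders bypass the trigger entirely by targeting the child:
-- COPY events_2013_12 FROM '/path/to/batch.csv' WITH (FORMAT csv);
```

The per-row trigger overhead then only applies to the stragglers, not the bulk of the 10-20M daily inserts.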
 

The only data I found fell in line with what you'd expect (i.e. speeds up selects but slows down inserts/updates http://www.if-not-true-then-false.com/2009/performance-testing-between-partitioned-and-non-partitioned-postgresql-tables-part-3/ )


One of the big benefits of partitioning can be to speed up insertions tremendously, by keeping the hot part of the indices that need to be maintained upon insertion together in shared_buffers.

We insert lots of new data, but rarely modify existing data once it's in the database, so it sounds like this would be a big benefit for us.
 
 
So I was thinking of partitioning by month to keep the number of partitions low, which would mean about 0.5G records in each partition. Does that seem like a reasonable number of records per partition? Is there anything else that I should consider or be aware of?
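A monthly scheme with inheritance would be set up roughly like this (names are illustrative); the CHECK constraint on each child is what lets constraint exclusion prune partitions at query time:

```sql
-- One child table per month, inheriting from the parent:
CREATE TABLE events_2013_12 (
    CHECK (event_time >= DATE '2013-12-01'
       AND event_time <  DATE '2014-01-01')
) INHERITS (events);

-- Indexes are per-child, which keeps the hot portion of the
-- index small enough to stay in shared_buffers:
CREATE INDEX events_2013_12_time_idx ON events_2013_12 (event_time);

-- Ensure the planner actually uses the CHECK constraints to
-- skip irrelevant partitions:
SET constraint_exclusion = partition;  -- the default on modern releases
```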

How will data be expired?  How does the size of one of your intended partitions compare to your RAM and shared_buffers?

We add about 10-20 million records per day to each table, each about 200 bytes on average (most of that in a bytea column), and there's 64 GB of RAM on the machine.
 
 
Cheers,

Jeff

On Sat, Dec 7, 2013 at 1:37 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> I'll add that you can use asymmetric partitioning if you tend to do a
> lot of more fine-grained queries on recent data and bigger roll-ups
> on older ones. I.e. partition by month except for the last 30 days, do
> it by day etc. Then at the end of the month roll all the days into a
> month partition and delete them.

This sounds like a great solution for us. Is there some trick to roll the records from one partition to another? Or is the only way just a SELECT INTO followed by a DELETE?
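For what it's worth, on 9.1 or later a single writable CTE can replace the separate SELECT INTO / DELETE round trip, moving the rows in one statement (a sketch with made-up partition names):

```sql
-- Move one day's rows from a daily partition into the monthly one:
WITH moved AS (
    DELETE FROM events_2013_12_07
    RETURNING *
)
INSERT INTO events_2013_12
SELECT * FROM moved;

-- Then drop the now-empty daily partition cheaply:
DROP TABLE events_2013_12_07;
```

The DELETE ... RETURNING feeds the INSERT within one transaction, so the rows are never visible in both partitions at once.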

Thanks,
Dave
