* Igor Chudov (ichudov@xxxxxxxxx) wrote: > Can I partition data by month (or quarter), without that month being part of > PRIMARY KEY? The way partitioning works in PG is by using CHECK constraints. Not sure if you're familiar with those (not sure if MySQL has them), so here's a quick example: Create a parent table. Then create two tables which inherit from that parent table (this is more of an implementation detail than anything else, the parent table is always empty, it's just there to be the single, combined, table that you run your select queries against). On each of the two 'child' tables, create a CHECK constraint. On table1, you do: alter table table1 add check (date < '2000-01-01'); On table2, you do: alter table table2 add check (date >= '2000-01-01'); Once those are done, you can query against the 'parent' table with something like: select * from parent where date = '2010-01-01'; And PG will realize it only has to look at table2 to get the results for that query. This means the partitioning can be more-or-less any check constraint that will be satisfied by the data in the table (and PG will check/enforce this) and that PG can figure out will eliminate a partition from possibly having the data that matches the request. Technically, this means that you could have all kinds of different ways your data is split across the partitions, but remember that all the constraints have to actually be TRUE. :) Typically, people do split based on the PK, but it's not required (realize that PG doesn't support cross-table PKs, so if you don't have CHECK constraints which make sure that the tables don't cover the same PK value, you could end up with duplicate values across the tables...). > If this question sounds weird, I am asking because MySQL enforces this, > which does not fit my data. That part is a little strange.. > If I can keep my primary key to be the ID that I want (which comes with > data), but still partition it by month, I will be EXTREMELY happy. As I said above, the actual PK is going to be independent and in the base/child tables. That said, yes, you could have the PK in each table be whatever you want and you use month to partition the 'main' table. You then have to come up with some other way to make sure your PK is enforced, however, or figure out a way to deal with things if it's not. Based on what you've been describing, I'm afraid you'd have to actually search all the partitions for a given ID on an update, to figure out if you're doing an UPDATE or an INSERT... Unless, of course, the month is included in the PK somewhere, or is in the incoming data and you can be 100% confident that the incoming data is never wrong.. :) > I am wondering, why? Cost, and we had a real hard time (this was a while ago..) getting Oracle to run decently on Linux, and the Sun gear was just too damn expensive. Also, ease of maintenance- it takes a LOT less effort to keep a PG database set up and running smoothly than an Oracle one, imv. > I could, say, set work_mem to 30 GB? (64 bit linux) You can, but you have to be careful with it, because PG will think it can use 30GB for EACH sort in a given query, and in EACH hash in a given query. What I would recommend is setting the default to something like 256MB and then looking at specific queries and bumping it up for those queries when it's clear that it'll help the query and won't cause the system to go into swap. Note that you can set work_mem for a given session after you connect to the database, just do: set work_mem = '1GB'; in your session before running other queries. Doing that won't impact other sessions. Thanks, Stephen
Attachment:
signature.asc
Description: Digital signature