Re: database split

On 4/29/15 6:04 PM, Dave Owens wrote:

> A. Text-processing a dump of the original database, filtering only the
> rows in which I am interested, while loading the dump into the new database.

> B. Copying the original database (CREATE DATABASE smalldb WITH TEMPLATE
> bigdb), then modifying the schema so that each FK constraint now
> includes ON DELETE CASCADE, then deleting rows from the few core tables
> while letting PostgreSQL cascade the deletes down the hierarchy.  Then,
> of course, restoring the schema to its original state.

This would be quite slow, but it would work.
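
The per-constraint surgery would look something like this (untested; table, column, and constraint names are placeholders, and the cluster_id filter is just an example):

    -- Recreate each FK with ON DELETE CASCADE
    ALTER TABLE child_table DROP CONSTRAINT child_table_parent_id_fkey;
    ALTER TABLE child_table ADD CONSTRAINT child_table_parent_id_fkey
      FOREIGN KEY (parent_id) REFERENCES parent_table (id) ON DELETE CASCADE;

    -- A single DELETE on a core table now cascades down the hierarchy
    DELETE FROM parent_table WHERE cluster_id <> 42;

    -- Then drop and re-add each constraint without CASCADE to restore
    -- the original schema.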

> C. Generating the myriad SELECTs necessary to export only the required
> data from every table, then importing those results (either using dblink
> or COPY FROMs) into the new database that has been populated with schema
> from the original database.  Carefully ordering the imports to avoid
> missing FKs.
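
The per-table plumbing for C might be as simple as this (illustrative names again; it assumes the schema was loaded into the new database first, e.g. with pg_dump --schema-only):

    -- On the original database: export only the rows you want
    COPY (SELECT * FROM some_table WHERE cluster_id = 42)
      TO '/tmp/some_table.copy';

    -- On the new database, in FK dependency order:
    COPY some_table FROM '/tmp/some_table.copy';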

D. Combine A and B. In a SQL dump, FKs are added AFTER the data is loaded, so once the COPY commands have finished you could run the relevant DELETEs. Brute force, but it'd do the job, and it'd be faster than cascading deletes.
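
One way to get that sequencing with a custom-format dump (table names and the cluster_id filter are placeholders):

    -- Run between pg_restore --section=data and
    -- pg_restore --section=post-data (which is what re-adds the FKs):
    DELETE FROM child_table WHERE cluster_id <> 42;
    DELETE FROM parent_table WHERE cluster_id <> 42;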

E. A spin on C would be to put triggers on the new database's tables that silently drop the data you don't want.
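
A minimal version of that (the cluster_id test is again just an example):

    CREATE FUNCTION drop_unwanted_rows() RETURNS trigger AS $$
    BEGIN
      IF NEW.cluster_id <> 42 THEN
        RETURN NULL;  -- silently discard the row
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER filter_unwanted BEFORE INSERT ON some_table
      FOR EACH ROW EXECUTE PROCEDURE drop_unwanted_rows();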

All that said, my guess is you're doing this to support horizontal scale-out, which means you'll probably need to do this more than once, and it'd presumably be nice for you and your customers if it didn't require downtime. I would look at creating a partial replica using londiste (or BDR, if it supports that). The trick there is having a way to decide whether each individual row belongs on a given replica. If you add some kind of cluster_id field to every table, that filtering becomes pretty easy; IIRC londiste supports it out of the box.
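
The cluster_id part is plain DDL; wiring it into londiste's filtering is the part you'd have to work out (the column name here is made up):

    -- Tag every row with the cluster it belongs to
    ALTER TABLE some_table ADD COLUMN cluster_id int NOT NULL DEFAULT 1;

    -- Each partial replica then keeps only rows for its own cluster,
    -- e.g. WHERE cluster_id = 42 in whatever per-table filter the
    -- replication tooling lets you attach.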

The other option is having functions that let you determine whether you want a row. I know you could get londiste to do this, but you'd have to implement some of the filtering in Python (Skype actually does this, or at least did).

One more thing to consider: you can scale PG to some seriously large databases if you're willing to spend money on hardware. Given what kind of a server you can get for $20k nowadays, I'd seriously investigate that option if you haven't already.

BTW, there's also https://github.com/citusdata/pg_shard. I've never really looked at it, but it might make all of this a lot easier.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

