I have a bunch of test/development databases which we currently refresh with production data as needed using a NetApp filer's snapshot capabilities - we have a production slave with its datadir on a filer mount (NFS), and once a night (via cron) we shut down the slave, snapshot the filer volume, and then start the database back up. When we need to refresh the data in a test/dev environment, we stop postgres there, clone the NetApp snapshot, and use that clone as the test/dev environment's data directory. This is
wonderful from a storage point of view, as the clones only store changed blocks on the filer. We have a ~1.5TB datadir, but these clones are only a few MB each, because little data has changed. (We're running postgres 9.0.13; upgrading to a newer version is a vague possibility if that would make what I'm about to ask more feasible.)
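For context, the nightly cron job against the production slave boils down to something like this (a rough Python sketch; the datadir path, filer hostname, volume name, and exact NetApp snapshot command are stand-ins for whatever we actually use):

#!/usr/bin/env python
# Nightly: quiesce the production slave so the datadir is consistent on disk,
# snapshot the filer volume behind it, then start the slave back up.
# Paths, hostnames, and the snapshot command below are stand-ins.
import subprocess

PGDATA = "/mnt/netapp/pgslave/data"   # slave datadir on the NFS mount (made-up path)
FILER = "filer1.example.com"          # NetApp filer (made-up hostname)
VOLUME = "pg_slave_vol"               # volume backing the datadir (made-up name)
SNAP_NAME = "nightly"                 # snapshot name

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)        # fail loudly on a non-zero exit

run(["pg_ctl", "-D", PGDATA, "-m", "fast", "stop"])           # stop the slave
try:
    run(["ssh", FILER, "snap", "create", VOLUME, SNAP_NAME])  # snapshot the volume (exact ONTAP syntax may vary)
finally:
    run(["pg_ctl", "-D", PGDATA, "start"])                    # start the slave again either way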
Up until now, each test/dev environment has had its own postgres server. That's a waste of resources, since those servers sit idle most of the time.
Now, we have to spin up somewhere around 100-150 of these environments. The NetApp is about the only way we can do it, because we simply don't have 150-225TB of disk to spare. It would also be a real pain (and inefficient) to run 100-150 separate machines,
each running a single instance of postgres.
What I'd like to do is take a disk/filer snapshot of a stopped database (i.e. the actual files on disk, not a pg_dump) on one postgres instance (one physical server) and restore it on a different one. Ideally the database would have a different name, but that's flexible. Even better, this would all happen without restarting the destination postgres instance, but I suppose we can work around that too.
Is this even possible? Anyone have experience with, essentially, creating a tablespace that points to an
existing data directory? Or, alternatively, swapping out the data directory of one tablespace with that of another?
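To make that concrete, the kind of thing I'm imagining is below (purely illustrative - the mount point, tablespace, and database names are made up, and whether postgres can be handed an already-populated directory like this is exactly what I'm asking):

#!/usr/bin/env python
# Illustrative only: point a tablespace at a directory where a filer clone of
# the production datadir is mounted, then create a database "in" it.  As
# written, CREATE DATABASE makes an *empty* database; whether the existing
# files in the clone could somehow be used instead is the open question.
import psycopg2

CLONE_MOUNT = "/mnt/netapp/clones/test42"    # where the clone is mounted (made-up path)

conn = psycopg2.connect("dbname=postgres user=postgres")
conn.autocommit = True                       # CREATE TABLESPACE/DATABASE can't run in a transaction block
cur = conn.cursor()

cur.execute("CREATE TABLESPACE ts_test42 LOCATION %s", (CLONE_MOUNT,))
cur.execute("CREATE DATABASE test42 TABLESPACE ts_test42")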
Any advice or suggestions would be greatly appreciated.
I apologize if there's something obvious in the documentation that I missed, but I'm not much of a database guru, and am relatively new to pgsql in general.
Jason Antman
PS - The slightly-less-abstract explanation:
We're spinning up nearly 100 (maybe 150) new test environments. We simply don't have the ~200TB of disk to maintain a separate DB for each of them, so using the NetApp filer and letting it do thin clones is an absolute requirement. We also need to be able to
do quick restores to the latest daily production snapshot. Quick as in, the developers and testers run a script in their test environment that does the restore. We're currently doing this for ~50 environments, and we run a separate VM with postgres for each
one, so it's relatively straightforward - stop postgres, unmount the datadir, do the filer magic to refresh it with a clone of this morning's production DB, then re-mount the new filer volume (clone) and start postgres. Unfortunately, the overhead of running a separate virtual machine with its own postgres instance for every DB is more than we can handle when scaling 2-3x, and the resource waste is awful - most of those VMs sit idle about 14 hours a day. So, we'd like to run multiple databases in separate tablespaces, but on the same host and in the same postgres instance, so N databases can share the same shared memory, etc.
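The per-environment refresh script the developers run today boils down to something like this (again a rough Python sketch; the environment name, mount points, volume names, and exact ONTAP clone syntax are stand-ins):

#!/usr/bin/env python
# Rough sketch of the current per-VM refresh: stop postgres, swap the NFS-mounted
# datadir for a fresh thin clone of this morning's production snapshot, and start
# postgres again.  Names, paths, and the NetApp command line are stand-ins.
import subprocess

ENV = "test42"                                    # this test environment (made-up name)
PGDATA = "/mnt/netapp/%s/data" % ENV              # datadir mount point (made-up path)
FILER = "filer1.example.com"                      # NetApp filer (made-up hostname)
PARENT_VOL = "pg_slave_vol"                       # volume backing the production slave
SNAP_NAME = "nightly"                             # this morning's snapshot
CLONE_VOL = "pg_clone_%s" % ENV                   # per-environment clone volume

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

run(["pg_ctl", "-D", PGDATA, "-m", "fast", "stop"])         # stop this env's postgres
run(["umount", PGDATA])                                     # unmount the old clone
run(["ssh", FILER, "vol", "clone", "create", CLONE_VOL,     # thin clone of the snapshot
     "-b", PARENT_VOL, SNAP_NAME])                          # (exact ONTAP syntax may vary)
run(["mount", "%s:/vol/%s" % (FILER, CLONE_VOL), PGDATA])   # mount the fresh clone
run(["pg_ctl", "-D", PGDATA, "start"])                      # start postgres again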
The developers and testers who use these instances need to be able to "get fresh data" as often as needed (sometimes 3x/day). Ideally we'd be able to refresh one of these filer volumes without stopping postgres. But an acceptable alternative would be to build
out, say, 3x the number of DBs we need, and refresh all of the not-currently-in-use ones on a schedule every night, during a maintenance/downtime window.