Re: Schema version control

Daniel Popowich <danielpopowich@xxxxxxxxx> · Fri, 11 Feb 2011 08:35:58 -0500

Glenn Maynard writes:
> - Adding a NOT NULL constraint (without adding a DEFAULT).  You often want
> to precede this with filling in any existing NULL values, so the new
> constraint doesn't fail.
> - Updating triggers, functions and their effects.  For example, when I have
> an FTS index with a trigger to update an index column, and I change the
> underlying trigger, I often do something like "UPDATE table SET column =
> column", to cause all of the update triggers to fire and recalculate the
> index columns.
> - Creating a new column based on an old one, and removing the old one; eg.
> add a column "n", run "UPDATE ... SET n = i*j * 2", and then drop the old
> columns "i" and "j".
> - Updating data from an external source, such as ORM model code; for
> example, if you have a table representing external files, an update may want
> to calculate and update the SHA-1 of each file.
> - For efficiency, dropping a specific index while making a large update, and
> then recreating the index.
> 
> In my experience, while generating schema updates automatically is handy, it
> tends to make nontrivial database updates more complicated.  These sorts of
> things happen often and are an integral part of a database update, so I'm
> just curious how/if you deal with them.
> 
> I've used Ruby's migrations, and for my Django databases I use my own
> migration system which is based in principle off of it: create scripts to
> migrate the database from version X to X+1 and X-1, and upgrade or downgrade
> by running the appropriate scripts in sequence.
> 
> It's not ideal, since it can't generate a database at a specific version
> directly; it always has to run through the entire sequence of migrations to
> the version you want, and the migrations accumulate.  However, it can handle
> whatever arbitrary steps are needed to update a database, and I don't need
> to test updates from every version to every other version.

I'm with Glenn on this point.  I have found updating a db version to
be far more complex (for reasons he illustrates and which I find more
common than not) than any automation tool can handle.  And I wonder if
the time spent developing such a tool (or writing changesets, xml,
etc. for a given tool) actually saves development time.

FWIW, this is what I do:

 1.  I have a table in my database, meta, that contains exactly one
 row, and holds configuration information.  A minimal version of this
 table:

    CREATE TABLE meta (
	id integer DEFAULT 1 NOT NULL CHECK (id = 1),
	major integer NOT NULL,
	minor integer NOT NULL,
	patch integer NOT NULL
    );

  The first column is to guarantee I only have one row.  The next
  three provide a tuple for my version, e.g., 1.1.3.  I add other
  columns for software configuration as needed.

 2.  My application configuration has a DB_VERSION variable which
 defines the version of the database the software depends on.  If
 there's a mismatch, my programs can't connect to the database, thus
 guaranteeing db<=>software happiness.

 3.  Like Glenn, I have a script that can go forwards and backwards,
 one revision at a time.  The effort in maintaining this script is
 minimal, actually: in my RCS, I have the current schema, which is
 generated with: `pg_dump -O -s`.  Based on diffs between current
 system and last revision I can manually generate the necessary DDL
 statements for the script.  If I have data as part of my "schema,"
 like the row in my meta table, those are simple enough to examine and
 add to the script with DML statements.

 4.  Whenever a revision changes the schema, I tag my my repository,
 so I can easily check out that version.

 5.  As for branches.  If production is at 1.1.3 and three branches
 create versions 1.1.4, another 1.1.4 and 1.2.0, and all three want to
 merge their various changes back into a 1.1.3 production database?  I
 say, without being flip, don't let this happen.  Here is where I
 think no software process can make anyone happy.  It's a human
 process: declare someone the owner of the database schema, let them
 own the long term development of the schema, and if anyone needs a
 change, they have to communicate and vet those changes with the db
 designer.

Dan

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general