John DeSoi wrote:
On Jun 10, 2005, at 11:38 AM, Peter Fein wrote:
This would obviously have to be pretty damn clever. Amongs the
difficulties would be ensuring that the patch applies changes in the
correct order (e.g. add column before adding foreign key). It's hard,
but I don't believe it's impossible.
As an uninformed, off-the-wall idea, could one compare snapshots of the
system tables to generate these diffs? I know next-to-nothing about
these, but it seems like they'd contain the info you'd need.
I think it would be a requirement; I'm not convinced it could be done
just from comparing table definitions. For example, using the table
definitions only, how could you distinguish between renaming the last
column in the table versus dropping the column and adding a new one of
the same type. The target table would be the same, but the semantics of
getting there are not.
That's an excellent point that I certainly hadn't thought of. Indeed, if
you look at the output from the general text-based diff programs, they
only output additions or removals. If you change a line it gets marked
as two separate changes: a removal and an addition.
My ideal is still to be able to create a 'diff' between any two
arbitrary revisions of a schema. Therefore, the schema must only contain
the state of the database, and not transitional information (since the
transition is undefined before the 'diff' takes place). There would need
to be some other way to identify the two columns as being the same
historically. This could be something in the comment (though I don't
like the idea of enforcing metadata in comments) or using some technique
such as an OID (though I can't remember if things like columns have
OIDs, and it's not particularly portable either). Perhaps the problem is
best served by two files: one containing the schema snapshot itself, and
the other containing a 'manifest' which lists each entity in the schema
along with a unique identifier for each entity.
Then you have the problem of how you assign each identifier, since many
people may be working on the same schema on different branches. Just
picking the next number wouldn't work as you'd get clashes. It would
have to be some sort of GUID.
An added benefit of this 'manifest' file is that it would be a good
place to store additional meta-data on things like whether a given
table's data should be dumped or not.
(thinks for a while...)
Now that I think about it, why stick with the schema dump at all? We
could (for example) define a generic XML schema on which the manifest is
based which describes everything, from the structure, meta data and the
actual required data itself to a list of tables that should be ignored
completely. You'd have to provide a script to update the file from a
database, and when it encounters something like a column rename it could
ask you if you actually did drop and re-add the column or if it was a
rename. Depending on the answer you give the column either gets a new
GUID or keeps the old one.
So taking that example, you do your commit and somebody updates their
working copy and sees that the database file has changed. They run the
diff command to see what they need to do to update their version and it
outputs the DROP COLUMN/ADD COLUMN or RENAME statement that they need,
along with a couple of configuration inserts that you did too.
This certainly is a far more complex problem than I originally thought
it was. I'd like any solution to be able to work in any version control
system, and to be applicable to any database engine (my employer is a
MySQL house, while I personally prefer Postgres for my personal stuff,
so I'm in favour of it being completely portable).
Another important design criteria for me would be ease of use. This is a
version-control stage on top of the existing version control work that a
developer needs to do, and if it's a real hassle they aren't going to
bother to do it properly. It therefore needs to be able to do things as
automatically as possible, for example by connecting to a database and
generating the manifest file directly.
I also think a requirement would have to be that a database would need a
dedicated table (or view) to store its current version number. The
manifest file could specify where to find this. It would be required to
know which revision you need to 'diff' against to upgrade the database
to the latest version. For extra brownie points the diff command could
output the update statement to set the version to the new version
automatically, so you don't forget to do it.
As I say it's a hard problem, but I certainly don't think it's impossible.
--
Russ.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend