Re: Version Control?

Russ Brown <pickscrape@xxxxxxxxx> · Fri, 10 Jun 2005 19:34:03 +0100

John DeSoi wrote:

On Jun 10, 2005, at 11:38 AM, Peter Fein wrote:

This would obviously have to be pretty damn clever. Amongs the
difficulties would be ensuring that the patch applies changes in the
correct order (e.g. add column before adding foreign key). It's hard,
but I don't believe it's impossible.

As an uninformed, off-the-wall idea, could one compare snapshots of the
system tables to generate these diffs? I know next-to-nothing about
these, but it seems like they'd contain the info you'd need.

I think it would be a requirement; I'm not convinced it could be done 
just from comparing table definitions. For example, using the table 
definitions only, how could you distinguish between renaming the last 
column in the table versus dropping the column and adding a new one of 
the same type. The target table would be the same, but the semantics of 
getting there are not.

That's an excellent point that I certainly hadn't thought of. Indeed, if 
you look at the output from the general text-based diff programs, they 
only output additions or removals. If you change a line it gets marked 
as two separate changes: a removal and an addition.

My ideal is still to be able to create a 'diff' between any two 
arbitrary revisions of a schema. Therefore, the schema must only contain 
the state of the database, and not transitional information (since the 
transition is undefined before the 'diff' takes place). There would need 
to be some other way to identify the two columns as being the same 
historically. This could be something in the comment (though I don't 
like the idea of enforcing metadata in comments) or using some technique 
such as an OID (though I can't remember if things like columns have 
OIDs, and it's not particularly portable either). Perhaps the problem is 
best served by two files: one containing the schema snapshot itself, and 
the other containing a 'manifest' which lists each entity in the schema 
along with a unique identifier for each entity.

Then you have the problem of how you assign each identifier, since many 
people may be working on the same schema on different branches. Just 
picking the next number wouldn't work as you'd get clashes. It would 
have to be some sort of GUID.

An added benefit of this 'manifest' file is that it would be a good 
place to store additional meta-data on things like whether a given 
table's data should be dumped or not.

(thinks for a while...)

Now that I think about it, why stick with the schema dump at all? We 
could (for example) define a generic XML schema on which the manifest is 
based which describes everything, from the structure, meta data and the 
actual required data itself to a list of tables that should be ignored 
completely. You'd have to provide a script to update the file from a 
database, and when it encounters something like a column rename it could 
ask you if you actually did drop and re-add the column or if it was a 
rename. Depending on the answer you give the column either gets a new 
GUID or keeps the old one.

So taking that example, you do your commit and somebody updates their 
working copy and sees that the database file has changed. They run the 
diff command to see what they need to do to update their version and it 
outputs the DROP COLUMN/ADD COLUMN or RENAME statement that they need, 
along with a couple of configuration inserts that you did too.

This certainly is a far more complex problem than I originally thought 
it was. I'd like any solution to be able to work in any version control 
system, and to be applicable to any database engine (my employer is a 
MySQL house, while I personally prefer Postgres for my personal stuff, 
so I'm in favour of it being completely portable).

Another important design criteria for me would be ease of use. This is a 
version-control stage on top of the existing version control work that a 
developer needs to do, and if it's a real hassle they aren't going to 
bother to do it properly. It therefore needs to be able to do things as 
automatically as possible, for example by connecting to a database and 
generating the manifest file directly.

I also think a requirement would have to be that a database would need a 
dedicated table (or view) to store its current version number. The 
manifest file could specify where to find this. It would be required to 
know which revision you need to 'diff' against to upgrade the database 
to the latest version. For extra brownie points the diff command could 
output the update statement to set the version to the new version 
automatically, so you don't forget to do it.

As I say it's a hard problem, but I certainly don't think it's impossible.

--

Russ.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend