On Thu, Feb 05, 2009 at 04:38:25PM -0500, tedd wrote: > Hi gang: > > To further the Garbage Collection thread on to another level (i.e., > preserving history) please consider this: > > Okay, let's say we have a table containing all the instances of > tutors teaching courses. A record might look like this: > > Course-to-Tutor table > course_id = 123 > tutor_id = 7 > date_active = 2/4/9 > date_inactive = null > > The above record was assembled from the following recrods: > > Tutor table: > id = 7 > name = joe > > Course table: > id = 123 > title = Introduction to Moonshine > > Okay, so let's now delete the course "Introduction to Moonshine"! > > At the first opportunity to search the Course-to-Tutor table we find > that the course 123 is now absent from our database and as such we > set the date_inactive field -- it won't be considered again. > > Okay, so what does that tell us about the history? It only provides > that at one time joe taught an undefined course. Furthermore, if joe > quits, then we only have a record that says someone with an id of 7 > taught a course with an id of 123 -- that doesn't make much sense, > does it? No. If you're concerned about history, leave the course and tutor records in place. Simply change record in the "Course-to-Tutor" table. Set the date_inactive field to today's date or whatever. If you're concerned about history, you never delete the records that contribute to a link table. You can delete the record that binds them together in the link table, but not if you want to preserve history. > > Now, if we reconsider the Course-to-Tutor table and instead of > putting in the tutor_id and course_id, we put in the actual name of > the tutor and title of the class, then at least we have a history > that makes sense. Don't do this. Your searches and indexes will take longer and it will complicate things. Maintain your primary keys as integers. If you want to know the name of the guy who taught course ID 123 from this date to that date, just issue a query which joins the tables. As long as your data is somewhere in these tables (in *one* place, please), you can join tables in a query and get any info you want. > > However by doing that, we violate the primary principle of a > relational database by creating redundant data. While this is true in the ideal world of Codd, in the real world, there is often some minor duplication of data. But in your case, it isn't needed. > > I guess that one can say that if we never actually delete the tutor > nor the course, but instead just make them inactive as well, then we > have solved the problem -- is that the answer? > > In any event, what problems do you foresee if I used the actual names > and titles for the Course-to-Tutor table instead of their id's? After > all, the instance records would retain a simple history of what > happened regardless of deletions AND a search for "Introduction to > Moonshine" should be essentially as fast as a search for "123" if > both fields are indexed, am I right or wrong? Not necessarily. In fact, if you add on to this schema, you'll ultimately find that IDs are vastly easier to manage. They make the queries more complex, but the tables are easier to manage. Trust me on this. I've had to go back and redesign tables where I used a long string like a name for the index, and it's generally a bad idea. I expect most DBAs and programmers will agree. Oh, here's another *excellent* reason not to use names: fat fingers. If someone misspells a name or a course title, you can change it to your heart's content if it's just a field in a table. But if it's a key that binds tables together, you can't change it without cascading problems. That key is now in *at least* two places and must be changed *everywhere*, and the DBMS normally won't let you do that. You could add "cascade update" provisions into your tables, but why? Just use an integer key, and you're away. Paul -- Paul M. Foster -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php