2009/2/5 tedd <tedd@xxxxxxxxxxxx>: > To further the Garbage Collection thread on to another level (i.e., > preserving history) please consider this: > > Okay, let's say we have a table containing all the instances of tutors > teaching courses. A record might look like this: > > Course-to-Tutor table > course_id = 123 > tutor_id = 7 > date_active = 2/4/9 > date_inactive = null > > The above record was assembled from the following recrods: > > Tutor table: > id = 7 > name = joe > > Course table: > id = 123 > title = Introduction to Moonshine > > Okay, so let's now delete the course "Introduction to Moonshine"! > > At the first opportunity to search the Course-to-Tutor table we find that > the course 123 is now absent from our database and as such we set the > date_inactive field -- it won't be considered again. > > Okay, so what does that tell us about the history? It only provides that at > one time joe taught an undefined course. Furthermore, if joe quits, then we > only have a record that says someone with an id of 7 taught a course with an > id of 123 -- that doesn't make much sense, does it? > > Now, if we reconsider the Course-to-Tutor table and instead of putting in > the tutor_id and course_id, we put in the actual name of the tutor and title > of the class, then at least we have a history that makes sense. > > However by doing that, we violate the primary principle of a relational > database by creating redundant data. > > I guess that one can say that if we never actually delete the tutor nor the > course, but instead just make them inactive as well, then we have solved the > problem -- is that the answer? This is the way I would approach it. Once you've used an ID in a database you should never reuse it so long as there are references to it in other tables. If you need to exclude it from use you need a flag to indicate that. > In any event, what problems do you foresee if I used the actual names and > titles for the Course-to-Tutor table instead of their id's? After all, the > instance records would retain a simple history of what happened regardless > of deletions AND a search for "Introduction to Moonshine" should be > essentially as fast as a search for "123" if both fields are indexed, am I > right or wrong? If you need to change any of the names, e.g. to fix a typo, it's potentially a massive update, whereas it's a change to a single row if you use IDs. Another thing to bear in mind is the size of your indexes. IDs are small, strings are not. Having said that it depends on the DB server as some are better at handling string indexes than others. Depending on exactly what your history requirements are it may be worth exploring other options to maintaining a history. One of the sites I work on maintains an audit log such that if something gets deleted that gets recorded in a semi-structured but fairly generic way so if I need to know when record x was deleted from table y and what record x contained I can craft a SQL query to get that from the audit log table, albeit fairly slowly. I also records inserts and updates to create a complete record. -Stuart -- http://stut.net/ -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php