Re: Preserving History

Stuart <stuttle@xxxxxxxxx> · Thu, 5 Feb 2009 21:54:45 +0000

2009/2/5 tedd <tedd@xxxxxxxxxxxx>:
> To further the Garbage Collection thread on to another level (i.e.,
> preserving history) please consider this:
>
> Okay, let's say we have a table containing all the instances of tutors
> teaching courses. A record might look like this:
>
> Course-to-Tutor table
> course_id = 123
> tutor_id = 7
> date_active = 2/4/9
> date_inactive = null
>
> The above record was assembled from the following recrods:
>
> Tutor table:
> id = 7
> name = joe
>
> Course table:
> id = 123
> title = Introduction to Moonshine
>
> Okay, so let's now delete the course "Introduction to Moonshine"!
>
> At the first opportunity to search the Course-to-Tutor table we find that
> the course 123 is now absent from our database and as such we set the
> date_inactive field -- it won't be considered again.
>
> Okay, so what does that tell us about the history? It only provides that at
> one time joe taught an undefined course. Furthermore, if joe quits, then we
> only have a record that says someone with an id of 7 taught a course with an
> id of 123 -- that doesn't make much sense, does it?
>
> Now, if we reconsider the Course-to-Tutor table and instead of putting in
> the tutor_id and course_id, we put in the actual name of the tutor and title
> of the class, then at least we have a history that makes sense.
>
> However by doing that, we violate the primary principle of a relational
> database by creating redundant data.
>
> I guess that one can say that if we never actually delete the tutor nor the
> course, but instead just make them inactive as well, then we have solved the
> problem -- is that the answer?

This is the way I would approach it. Once you've used an ID in a
database you should never reuse it so long as there are references to
it in other tables. If you need to exclude it from use you need a flag
to indicate that.

> In any event, what problems do you foresee if I used the actual names and
> titles for the Course-to-Tutor table instead of their id's? After all, the
> instance records would retain a simple history of what happened regardless
> of deletions AND a search for "Introduction to Moonshine" should be
> essentially as fast as a search for "123" if both fields are indexed, am I
> right or wrong?

If you need to change any of the names, e.g. to fix a typo, it's
potentially a massive update, whereas it's a change to a single row if
you use IDs.

Another thing to bear in mind is the size of your indexes. IDs are
small, strings are not. Having said that it depends on the DB server
as some are better at handling string indexes than others.

Depending on exactly what your history requirements are it may be
worth exploring other options to maintaining a history. One of the
sites I work on maintains an audit log such that if something gets
deleted that gets recorded in a semi-structured but fairly generic way
so if I need to know when record x was deleted from table y and what
record x contained I can craft a SQL query to get that from the audit
log table, albeit fairly slowly. I also records inserts and updates to
create a complete record.

-Stuart

-- 
http://stut.net/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php