Re: Inserting into multiple tables

Nathan Rixham <nrixham@xxxxxxxxx> · Sun, 11 Apr 2010 19:57:12 +0100

Gary wrote:
> Adam
> 
> Thank you for your well thought out response.
> 
> Two points:
> 
> I did not include any anti-injection functions because this was an 
> experiment for multiple tables, it is on my machine only.
> 
> Since these are php scripts, I dont think anyone will mind (not to mention 
> this board always provides great answers).,
> 
> However I think I may have answered my question about the importance of 
> normalization of tables.  I have written a number of databases used on 
> various web sites, however, they all are used as a collection of data from 
> input forms.
> 
> Most of the information about mutilple tables deals with the retreval of 
> data from, not inserting into, meaning they are more used for known data 
> inserted by the database owner/administrator to be retrieved by queries into 
> the DB.
> 
> Am I on the right track?
> 

I'm unsure if this is of use to you or not (and it has been covered in
part already), but here goes:

Generally when working with database tables, we normalise, or split
information up at natural points where you have a greater than 1-1
relationships between the data items.

An example may be a table structure to store User, Blog Post and Comments.

With this common example it's very inefficient to store all the
information in one table, because it is split naturally in to three.

We have three relationships here; all 1-* (meaning "one to many").

1 User to * Blog Posts
1 Blog Post to * Comments

Thus naturally, and on first glance we would have 3 tables:

| Table User
--------------------------------------
| UserID  |  Username  |  Password  | ...

| Table BlogPost
--------------------------------------
| PostID  |  PostTitle  |  PostersUserID  | ...

| Table Comment
--------------------------------------
| CommentID  |  CommentOnPostID  |  CommentersUserID  | ...

As you can see from the above, all of the rows (or items) in our tables
are linked via IDs to each other.

We can further normalise the above tables to take in to account *-*
(many to many) type relationships, and to fully separate cross cutting
concerns. For instance it may be that a BlogPost has 3 different Users
as author(s).

aside: cross cutting concerns can be considered as something (in this
case a table) trying to handle something which is of no concern to it
(in this case the BlogPost table needs to be aware of Users and their IDs).

To handle the aforementioned we can introduce something commonly
referred to as "link tables", consider:

| Table User
--------------------------------------
| UserID  |  Username  |  Password  | ...

| Table BlogPost
--------------------------------------
| PostID  |  PostTitle  |  ...

| Table BlogPostAuthors
--------------------------------------
| BlogPostID  |  UserID  |

The "link table" BlogPostAuthors acts as a many-to-many join table
between BlogPosts and Users. Similarly we could introduce the same kind
of link table between BlogPosts and Comments, / Users and Comments.

In a real system we may even have another two primary tables introduced,
Roles and UserRoles, as the system may have multiple Roles (Author,
Commenter, Admin etc) and each User may have multiple Roles themselves,
in one capacity I am an Admin, in another I'm an Author. (UserRoles may
be better considered as Personas?)

Ultimately there are many considerations to take in to account, the
relationships between types of data, the frequency at which
inserts/updates/selects occur, the complexity and speed of each query,
and much more.

Designing a table structure is different for each job, with different
considerations and things to weigh up, generally though normalisation
can cater for at least some future scope creep.

It's also worth noting that some consider it bad practise to design a
system from the storage point upwards, because the application and data
should not be constrained by persistence layer features or limitations -
which would indicate designing the data model in UML or suchlike and
dealing with Objects rather than Tables (then later mapping objects to
tables in order to persist them, if choosing a RDBMS as the persistence
layer).

It may also be worth noting that an EAV model is the ultimate in
normalisation and allows all data to be persisted in a single 3 column
structure (or 4 if you partition data). I'll save details of this though.

Do hope that helps in some way, and if you need any more info just shout.

Nathan

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php