Re: Design Dilemma - Database Data Abstraction

"Martin Alterisio" <malterisio777@xxxxxxxxx> · Wed, 11 Apr 2007 13:47:15 -0300

2007/4/10, Richard Lynch <ceo@xxxxxxxxx>:

On Sat, April 7, 2007 11:49 am, Martin Alterisio wrote:
> The solution I presented is to access, and act upon, a database as if
> they
> were PHP arrays, meaning that a table is presented as an array of
> records.

I don't quite get why you think this is any more fool-proof than using
a database in the first place, but okay, I guess...

It's not intended to be fool-proof, and I never said so. I said
"those-who-never-RTFM-proof", maybe some of those are fools, but many of
those have not received the proper preparation in computer sciences, and
maybe it's not their fault they can't read a manual meant for us coders to
read.

Anyway, if there is already an API that allows us to create objects that
look and act like PHP arrays, why don't just use that API for something like
an ORM?

My dilemma is as follows: a PHP array is a construct more restricted
> than a
> DB table. In a PHP array the index is either an int or a string, in a
> table
> de index can be any combination of fields. Then, my problem is how to
> design
> coherently the indexing of the arrays that represent the DB tables.

> I could index by the order as they are presented by the DB:

You have not specified any ORDER BY for the DB so far.

Why should I impose an order if the implementation may want to decide this
order.

By definition, then, the order is UNDEFINED and the DB is free to
return the result set in ANY order it finds convenient.

And that's okay, and I shall use the order the DB finds convenient.

$DB['users'][0] is the first user from the query "SELECT * FROM users"
> $DB['users'][1] is the second user from the query "SELECT * FROM
> users"
> etc..
>
> But this have many cons. First, without a deterministic order, the
> array can
> change its logic order on the whim of the DB, nobody assures that the
> order
> will be kept after a modification is made to the data, and this can be
> confusing and error prone:

Until you define an ORDER BY in the DB, there is no order, period.

If the DB presents the result sequentially there is an order, even if it
follows an undefined criteria.

If you *DO* define an ORDER BY in the DB, you'll have to be able to
write a PHP function which can duplicate that ORDER BY and
http://php.net/usort the array, or dis-allow changes to the array that
introduce new keys, and disallow using integer keys.

Why? I just let the DB manage the order. To use usort I'll have to actually
put the results in a real array, wasting memory, cpu time, and db access.

$name1 = $DB['users'][3]['name'];
> $name2 = $DB['users'][5]['name'];
> $DB['users'][3]['name'] = $name2;
> $DB['users'][5]['name'] = $name1;
>
> The last sentence may not be writing to the adequate record.

What's wrong here?

For example, if the current order for the 'users' table is by the 'name'
field, after the first write this order may have changed, then the record
we'll be trying to write next won't be the one we intended to.

If 5 isn't 5, then you are screwed from the get-go...

That's why I don't like this indexing criteria.

Another possible indexation could be by the value of the PK, but this
> also
> have some problems. First, it can be confusing if the PK is an
> autonumeric
> int, as this might be seen as a numeric indexation. Second, not all
> tables
> have only one field as PK (I can ask that all tables have at least a
> PK, but
> I can't ask that the PK is made of only one field).

Exposing an auto_increment field to anything above the DB layer is
almost always a Bad Idea (tm).

Why? Or, what do you mean by exposing the auto_increment field?

Show me one site that doesn't expose the value of this field (when needed)
in the url.

$user = $DB['users'][$userid]; // get
> $DB['users'][$userid] = $user; // update or insert
> $DB['users'][] = $userid; // insert
> unset($DB['users'][$userid]); // delete

I think you will find that detecting the array changes and pushing
them back down to the DB will be much more difficult than just
providing an API to a database in some kind of normalized functions...

That's the easy part, I'll just use the SPL interfaces. Sorry, I forgot to
mention this in my first email as I explained in the following emails.

There is also the problem with many-to-many relationships. If there
> was only
> one table that related two tables in this way, I could do the
> following:

Well, yeah...

You are going to have to have at least 2 if not 3 arrays to do that
with any efficiency at all...

Hey. Would you think I'll even propose this here if I didn't think I could
guarantee at least the same efficiency as other ORMs?

$DB['users'][$userid]['groups'] <- groups where the user belongs
> $DB['groups'][$groupid]['users'] <- the users of a group
>
> There would be a third table other than users and groups which doesn't
> show
> up. But, what to do when there is more than one relationship table for
> the
> same two tables? And if the relationship table also had some extra
> fields?

You can't just call everything $DB then, can you?

You'll need to name the tables as part of the key system.

At which point you might as well go back to DB and tables, really...

Sorry, I wasn't able to understand you here.

Also the delete action presents some problems:
>
> unset($DB['users'][$userid]['groups'][$groupid]);
>
> Am I deleting the group or the relationship between that user and
> group? I
> believe the last is more coherent, but still it can be prone to
> confusion.

[shudder]

Your first problem was in cramming an N-to-N relationship into a
single array...

.....

I could make $DB['users'][$userid]['groups'][$groupid] represent the
> relationship rather than a record of groups table, where both the
> group data
> and the relationship data are stored (I would have to ask that there
> are no
> field names duplicated between these tables). This way I solve the
> problem
> with the extra data, and the problem with the delete. About the
> possibility
> that there may be two relationship tables for the same pair of tables,
> using
> explicitly the name of the relationship table instead of the name of
> the
> other table would solve the ambiguety (but I think it would be
> unusual, and
> I rather keep the access through the name of the other table).

Ugh!

If my field names can't match up between tables for fields that have
the FK relationship, I would categorically refuse to work with your
arrays.

That's not what I was saying. Suppose you have a N-N relationship between a
table 'users' and a table 'groups'. This relationship is represented by a
table 'users_groups'. I'm asking that if the 'user_groups' table has more
fields other than those of the relationship, those fields don't have the
same name than one in the 'users' or 'groups' table. For example, the
'user_groups' table could have an access level for the user in the group,
this field should not exist in 'users' nor 'groups'. 'users' and 'groups'
may have fields with the same name.

I'm sure there are more things I'm not considering, that's why I ask
> your
> help to find them, and your opinion about what I've said up to now.

Abasndon this idea quickly, before you waste any more time on it, is
my opinion. :-)

That's why this idea is still in the design stage, so I don't waste time if
it can't be implemented in a good a profitable way. But I think you're
prejudging by saying I've been wasting time in this.

Thanks for your answer.