Re: php / mysql performance resources

"Richard Lynch" <ceo@xxxxxxxxx> · Wed, 13 Jun 2007 14:13:20 -0500 (CDT)

On Wed, June 13, 2007 12:31 pm, Guillaume Theoret wrote:
> Thanks for the response.
>
> On 6/13/07, Robert Cummings <robert@xxxxxxxxxxxxx> wrote:
>> On Wed, 2007-06-13 at 10:36 -0400, Guillaume Theoret wrote:
>> > Hi everyone,
>> >
>> > Can anyone recommend any good books/resources for php and mysql
>> > performance management? I'm more interested in the software
>> > development side (as opposed to server configuration) of things.
>> I'm
>> > looking for anything that would be good to know when working on
>> > websites that get millions of hits a day.
>> >
>> > Also, if anyone knows of any resources/discussions that illustrate
>> the
>> > relative performance of joins vs multiple selects I'd love to
>> check it
>> > out.
>>
>> JOIN will almost always be faster by virtue of the query being
>> optimized
>> and doing the work within a single request.
>
> Really? I thought the way it worked was that when you joined 2 tables
> it needed to create every row combination applicable and then apply
> the where clause. In large tables wouldn't this be slower? It's these
> kinds of optimizations and when the kick in, etc that I don't know
> much about.

Conceptually, JOIN builds that monster table.

If the DB engine can figure out how to constrain one table or another
BEFORE that JOIN to give a much smaller record set, and if they have
mathematical proof that the end result is the same, then they will
optimize and go with the smaller set when possible.

That's a (very) good thing.

> In our application we wrote an abstraction layer with lazy loading.
> (eg: If a User has a Profile the db users table has a profile_id and
> we create a ProxyProfile that only has an id and will look up its
> other attributes in the db if ever needed and then replace its
> reference by a full Profile object.) Because of this, so far the
> entire app only has 1 join because the other select(s) will only be
> done if and when they're needed. I'm certain this is faster in the
> average case but I wanted to know which is generally faster in case I
> later profile the code and see that in some cases the dependent item
> is pretty much always loaded.

You really should write the code the most straight-forward way you
can, and then optimize after identifying bottle-necks.

Anything else is just optimization-masturbation.

> The db will be under heavy load (once we deploy) but we don't yet
> intend on distributing the database. We did however plan for it since
> in the scenario I described above we just need to create a different
> db connection for a different table. We could theoretically have as
> many different db servers as tables (except for that one join of 2
> tables).

This is the scary part.

You really ought to set up a QA server with simulated heavy load for
real life testing, rather than waiting until you deploy to experience
heavy load.

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php