Re: optimizing PHP for microseconds

Nathan Rixham <nrixham@xxxxxxxxx> · Sun, 28 Mar 2010 17:04:11 +0100

mngghh, okay, consider me baited.

Daevid Vincent wrote:
>> Per Jessen wrote:
>>> Tommy Pham wrote:
>>>
>>>> (I remember a list member, not mentioning his name, does optimization
>>>> of PHP coding for just microseconds.  Do you think how much more he'd
>>>> benefit from this?)
>>> Anyone who optimizes PHP for microseconds has lost touch with reality -
>>> or at least forgotten that he or she is using an interpreted language.
>> But sometimes it's just plain fun to do it here on the list with 
>> everyone further optimizing the last optimized snippet :)
>>
>> Cheers,
>> Rob.
> 
> Was that someone me? I do that. And if you don't, then you're the kind of
> person I would not hire (not saying that to sound mean). I use single
> quotes instead of double where applicable. I use -- instead of ++. I use
> $boolean = !$boolean to alternate (instead of mod() or other incrementing
> solutions). I use "LIMIT 1" on select, update, delete where appropriate. I
> use the session to cache the user and even query results. I don't use
> bloated frameworks (like Symfony or Zend or Cake or whatever else tries to
> be one-size-fits-all). The list goes on.

That's not optimization, at best it's just an awareness of PHP syntax
and a vague awareness of how the syntax will ultimately be interpreted.

Using "LIMIT 1" is not optimizing; it's just saying you only want one
result returned. The SQL query could still take five hours to run if you
have no indexes, a poorly normalised database, wrong datatypes, and joins
all over the place.
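
To make that concrete, here is a rough sketch rather than anything
definitive. It assumes the pdo_sqlite extension is available and uses an
in-memory SQLite database as a stand-in for MySQL; the table, column and
index names are made up for illustration. It asks the planner how it would
run the same "LIMIT 1" query before and after an index exists.

```php
<?php
// Assumption: pdo_sqlite is installed; SQLite stands in for MySQL here.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');

// Hypothetical helper: returns the planner's description of a query.
function query_plan(PDO $db, string $sql): string
{
    $rows = $db->query('EXPLAIN QUERY PLAN ' . $sql)->fetchAll(PDO::FETCH_ASSOC);
    return implode('; ', array_column($rows, 'detail'));
}

$sql = "SELECT id FROM users WHERE email = 'a@example.com' LIMIT 1";

// Planned as a table scan; LIMIT 1 only stops it early if a match turns up.
$planWithoutIndex = query_plan($db, $sql);

$db->exec('CREATE INDEX idx_users_email ON users (email)');

// Same query, same LIMIT, but now planned as an index lookup.
$planWithIndex = query_plan($db, $sql);

echo $planWithoutIndex, PHP_EOL, $planWithIndex, PHP_EOL;
```

The exact wording of the plan differs between SQLite versions, and MySQL's
EXPLAIN output looks different again, but the point carries over: the index
changes the plan, the LIMIT does not.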

Using the session to cache "the user" is the only thing that comes
anywhere near application optimisation in all you've said; and
frankly I would take that to be pretty obvious and basic stuff (yet
pointless in most scenarios where you have to cater for possible bans and
de-authorisations). Storing query results in a session cache is only
ever useful in one distinct scenario, when the results of that query are
only valid for the owner of the session, and only for the duration of
that session, nothing more, nothing less. This is a one in a million
scenario.
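
For that one scenario, a minimal sketch might look like the following. The
helper name session_cached() and the "basket" example are hypothetical, and
a plain array stands in for $_SESSION so the snippet runs outside a web
request.

```php
<?php
// Hypothetical helper: cache a per-user result for the life of the session.
function session_cached(array &$session, string $key, callable $loader)
{
    if (!array_key_exists($key, $session)) {
        $session[$key] = $loader(); // runs at most once per session
    }
    return $session[$key];
}

// In a real request this would be $_SESSION after session_start().
$session = [];
$calls = 0;
$loadBasket = function () use (&$calls) {
    $calls++; // stands in for a query that is only valid for this user
    return ['item-1', 'item-2'];
};

$first  = session_cached($session, 'basket', $loadBasket);
$second = session_cached($session, 'basket', $loadBasket); // no second "query"
```

Note that nothing here expires or invalidates the entry, which is exactly
why it only suits data that lives and dies with the session.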

As for bloated frameworks: most of the time they are not bloated, especially
when you use them properly and only include what you need on a need-to-use
basis; then the big framework can be considered just a class or two.
Sure, the codebase seems more bloated, but at runtime that's easily
negated. You can use these frameworks for any size of project, enterprise
included, provided you appreciate the strengths and weaknesses of the
full tech stack at your disposal. Further, especially on enterprise
projects it makes sense to drop development time by using a common
framework, and far more importantly, to have a code base developers know
well and can "hit the ground running" with.

Generally, unless you have unlimited learning time and practically zero
budget constraints, frameworks like the ones you mentioned should always
be used for large-team enterprise applications, although perhaps
something more modular like Zend is better suited. They also cover your own
back when you are the lead developer, because on the day when a more
experienced developer than yourself joins the project and points out all
your mistakes, you're going to feel pretty shite and odds are very high
that the project will go sour, get fully re-written or you'll have to
leave due to "stress" (of being wrong).

> I would counter and say that if you are NOT optimizing every little drop of
> performance from your scripts, then you're either not running a site
> sufficiently large enough to matter, or you're doing your customers a
> disservice.

Or you have no grasp of the tech stack available and certainly aren't
utilizing it properly; I'm not suggesting that knowing how to use your
language of choice well is a bad thing, it's great; knock yourself out.
However, suggesting that optimising a php script for microseconds will
boost performance in large sites (nay, any site) shows such a loss of
focus that it's hard to comprehend.

By also considering other posts from yourself (in reply to this and
other threads) I can firmly say the above is true of you.

Optimisation comes down to running the least amount of code possible,
and only when really needed. If you are running a script / query /
process which provides the same output more than once then you are not
optimising. This will be illustrated further down this reply perfectly.

The web itself is the ultimate scalable distributed application known to
man, and has been guided and created by those far more knowledgeable
than you or I (Berners-Lee, Fielding, Godel, Turing et al), everything
you need is right there (and specifically in HTTP). Failing to leverage
this is where a lack of focus and scope comes in to play, especially
with large scale sites, and means you are doing your customers a disservice.

For anything where the output can be used more than once, (at a granular
level), the output should be cached.

For example, if you run SELECT and UPDATE/INSERT queries at a ratio any
higher than 1 SELECT per UPDATE/INSERT then you *will* get a sizeable
performance upgrade by caching the output. Another less granular example
would be a simple "blog": you can generate the page on every request, or
you can "publish" the page only when the post is updated or a comment
is added; and thus you can leverage the file system caches which most
operating systems have now, HTTP server caching, and HTTP caching
itself by utilizing Last-Modified and ETags and having 304 Not Modified
returned for any repeat requests.
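
As a rough illustration of the HTTP side, the sketch below decides whether
a repeat request can be answered with 304. The function name
is_not_modified() and the sample values are my own assumptions, and the
ETag comparison is deliberately simplified (it ignores weak validators).

```php
<?php
// Hypothetical helper: is the copy the client already holds still good?
// $requestHeaders e.g. ['If-None-Match' => '"abc"'] or ['If-Modified-Since' => '...'].
function is_not_modified(array $requestHeaders, string $etag, int $lastModified): bool
{
    if (isset($requestHeaders['If-None-Match'])) {
        // If-None-Match takes precedence over If-Modified-Since.
        $candidates = array_map('trim', explode(',', $requestHeaders['If-None-Match']));
        return in_array($etag, $candidates, true) || in_array('*', $candidates, true);
    }
    if (isset($requestHeaders['If-Modified-Since'])) {
        $since = strtotime($requestHeaders['If-Modified-Since']);
        return $since !== false && $lastModified <= $since;
    }
    return false;
}

$body = '<html>...published post...</html>';
$etag = '"' . md5($body) . '"';
$lastModified = 1269792000; // hypothetical publish time (Unix timestamp)
$validators = [
    'ETag'          => $etag,
    'Last-Modified' => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
];

// In a real script the request values would come from
// $_SERVER['HTTP_IF_NONE_MATCH'] and $_SERVER['HTTP_IF_MODIFIED_SINCE'], and
// a true result would mean: http_response_code(304); exit; with no body sent.
$repeatVisit = is_not_modified(['If-None-Match' => $etag], $etag, $lastModified);
$firstVisit  = is_not_modified([], $etag, $lastModified);
```

On a first visit you send the page along with the two validator headers; on
a repeat visit with matching validators you send the status line and nothing
else.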

> I come from the video game world where gaining a frame or two of animation
> per second matters. It makes your game feel less choppy and more fluid and
> therefore more fun to play.

Many lessons can be learned from the video game (and flash) worlds, but
these are generally just how to code well; most of the real
optimizations come from how you serialize data, minimise the amount of
output data + frequency at which it is sent; and moreover by compiler or
bytecode optimisations - some of this can cross over in to PHP world,
but not much since it's interpreted rather than compiled, and even less
since the same code isn't run hundreds of times per second - and if it
is; you are normally doing something wrong (in all but the most specific
of cases).

> If I have to wait 3 seconds for a page to render, that wait is noticeable.
> Dumb users will click refresh, and since (unbelievably in this day and age)
> PHP and mySQL don't know the user clicked 'stop' or 'refresh', and
> therefore mySQL will execute the same query a second time. That's an
> entirely different thread I've already ranted on about.

Render time is a totally different subject, since css/images/javascript
and more come in to play, not to mention the users browser and machine
spec. This is usually improved by including image width and height in
your html (omit this and the user agent has to "sniff" all images to
get their dimensions before layout can be calculated and later
rendered), using static shared stylesheets which can be returned as 304
not modified; and including client-side scripts as deferred or after the
main body of content (hence why google analytics specifies the placing
of their javascript just before the </body> tag).

Now if there was one sentence in all of the recent posts which conveys
the amount of misunderstanding at play here, it's this one: "Dumb users
will click refresh, and since (unbelievably in this day and age) PHP and
mySQL don't know the user clicked 'stop' or 'refresh', and therefore
mySQL will execute the same query a second time."

No no no no no! Unbelievably in this day and age developers are still
creating systems where the "same queries" (implying the same output) can
be executed a second time (and indeed multiple times) by something as
external as a user clicking refresh.

If you learn anything from this, learn that this is the crux of the
failings: the output of that query, at the very least, should be cached
- thankfully your RDBMS is partially saving your ass half the time by
using its own cache.

PHP and MySQL are not being dumb here, you are in *full* control of what
happens in your application, and if you have it set up so that the same
things, producing the same results, are being run time after time, then
more fool you. That output should be saved, in memory or file, and used
the second time; ideally that full view (if accessed generally more than
once) should be persisted so that it can be served statically until part
of the view needs updating; then regenerate and repeat.
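
Something along these lines is what I have in mind; treat it as a sketch
under assumptions, not a finished implementation. The function names
publish_view() and serve_view(), the temp-file path and the render closure
are all hypothetical.

```php
<?php
// Hypothetical helper: render a view once and persist it to disk.
function publish_view(string $path, callable $render): string
{
    $html = $render();
    // Write to a temp file and rename, so readers never see a half-written page.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $path);
    return $html;
}

// Hypothetical helper: serve the persisted copy if there is one.
function serve_view(string $path, callable $render): string
{
    if (is_file($path)) {
        return file_get_contents($path); // no queries, no templating
    }
    return publish_view($path, $render);
}

$path = sys_get_temp_dir() . '/post-' . uniqid() . '.html';
$renders = 0;
$render = function () use (&$renders) {
    $renders++; // stands in for the queries and templating behind the view
    return "<h1>Post</h1><!-- build {$renders} -->";
};

$a = serve_view($path, $render); // first request: renders and publishes
$b = serve_view($path, $render); // repeat request: served from disk
publish_view($path, $render);    // post or comment changed: regenerate
$c = serve_view($path, $render); // served from the fresh copy
unlink($path);
```

Once the file exists the web server can of course serve it directly without
touching PHP at all, which is the real win.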

> If you can shave off 0.1s from each row of a query result, after only 10
> rows, you've saved the user 1 full second. But realistically, you are most
> likely displaying hundreds (or in my case, thousands) of rows. Now I've
> just saved this user 10s to 100s (that's a minute and a half!)

<start-negativity>
O.M.G. am I reading these numbers correctly? shave off 0.1 seconds from
each row? saving the user 10-100 seconds? Just how are you coding these
applications!
</end-negativity>

In my world, if a "heavy" script is taking any more than 0.1 seconds to
run in its entirety we have a problem; honestly, I'm unsure what to
write here - the only constructive thought I have is, why don't we have
a "PHP week" on the list; where a standard application is created; then
we optimise the hell out of it and catalogue what was done for all to see.

We'd need:
2 temporary servers (one web, one db : any spec)
1 donated "application" w/ data

I'd be up for it; and would be interested to see just how quick we can
make the thing between us all.

I would suggest a few test scripts were made to call a series of
operations, user paths as it were, then run them through ab and get some
numbers.
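
ab measures over HTTP, which is what ultimately matters. For a quick
in-process number before and after a change, a crude harness like the one
below would do as a starting point. The name time_path() and the usleep()
calls standing in for real operations are assumptions for illustration only.

```php
<?php
// Hypothetical harness: time a "user path" (a list of callables) over N runs.
function time_path(array $steps, int $runs): array
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        foreach ($steps as $step) {
            $step();
        }
    }
    $total = microtime(true) - $start;
    return [
        'runs'    => $runs,
        'total_s' => $total,
        'mean_ms' => ($total / $runs) * 1000,
    ];
}

$userPath = [
    function () { usleep(1000); }, // stands in for "view front page"
    function () { usleep(2000); }, // stands in for "view a post"
];

$stats = time_path($userPath, 5);
printf("%d runs, mean %.2f ms per path\n", $stats['runs'], $stats['mean_ms']);
```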

> I'm dealing with TB databases with billions of rows and complex queries
> that would make you (and often times me too) cringe in fright. Sure, if
> you're dealing with your who-gives-a-shit "blog" website and all 20 entries
> of crap-nobody-cares-about, then do whatever you want. But if you're doing
> professional, enterprise level work, or have real customers who expect
> performance, then you sure as hell better be considering all the ways to
> speed up your page. They don't run in a vacuume. They don't just have a
> single query.

no comment; I'm doing the same and have done for years; and the words
you are coming out with just don't add up - if you are on TB datasets
why the hell are you using an RDBMS and php/mysql?? you need to be on to
non relational databases; and considering the hadoops of the world.

Suffice to say, if you have a complex query - something is vastly wrong
with the full architecture and system design.

all from experience.

Finally, reading through the list posts from the last week or two I've
become rather concerned about just how much disinformation and lack of
understanding is floating about. Many of the long time posters on this
list who do know better have either kept quiet or not covered the points
properly, whilst many more have been baited in to discussing questions
and points which have no answer, because they are the wrong questions to
be asking in the first place.

Times like this call for a smart-ass, and today I'll be that smart-ass;
not because I want to be labelled as such, but so that the other
knowledgeable people on the list can hook up on anything I've got wrong
and challenge it; and hopefully, ultimately, we'll have a full positive
thread that all can read and gain positive insight from as to how to use
PHP and leverage the full stack of technologies we have available to
address most (if not all) the points raised recently.

And Daevid, specifically, don't think for a minute these aren't learning
curves many of us have taken - skip back a couple of years, look through
the posts, and you'll find another developer banging on about threads in
php and optimising for micro-seconds ;)

Many Regards,

Nathan

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php