RE: Performance question for table updating (SOLVED)

Bastien Koert <bastien_k@xxxxxxxxxxx> · Sat, 24 Nov 2007 10:14:41 -0500

Could there be some performance gain by uploading the data to another table and then update / insert via sql?

bastien

----------------------------------------
> From: jon@xxxxxxxxxxx
> To: php-general@xxxxxxxxxxxxx
> Date: Sat, 24 Nov 2007 04:03:53 -0700
> Subject: Re:  Performance question for table updating (SOLVED)
> 
> Hi Rob, et al.:
> 
> ----- Original Message -----
> From: "Andrés Robinet" 
>>> -----Original Message-----
>>> From: Jon Westcot [mailto:jon@xxxxxxxxxxx]
>>>
>>> :: gigantic snip here::
>>>
>>>     So, long story short (oops -- too late!), what's the concensus
>>> among the learned assembly here?  Is it faster to just UPDATE the
>>> record if it already exists regardless of the fact that maybe only one
>>> or two out of 75 or more fields changed versus testing each one of
>>> those 75 fields to try and figure out which ones actually changed and
>>> then only update those?
>>>
>>>     I look forward to reading all of your thoughts.
>>>
>>>     Sincerely,
>>>
>>>         Jon
>>
>> I don't know about consensus over here because I'm kind of newgie (stands
>> for new geek, as opposed to newbie which stands for new ball breaker :D :D
>> ). I don't know of your previous messages but I can tell you one story...
>> Some time ago I got involved in a project that required geo-distance
>> calculation (you know distance between two points with latitude and
>> longitude). Basically I had to take a set of points and calculate the
>> distance of each of those points to a given (reference) one. The math was
>> something like the "square root of the sum of a constant times the square
>> sin of..." well, I can't remember it, but the point is, it was a
> complicated
>> formula, which I thought it would allow for some optimizations in PHP.
>> Accustomed to regular (compiled) programming languages I developed a set
> of
>> routines to optimize the task and went ahead and queried the database for
>> the (say, 1000 records) dataset of points. Then applied the math to the
>> points and the reference point and got the result... in about 5 minutes to
>> my (disgusting) surprise.
>> Then I grabbed the MySQL manual, built a "non-optimized" version of the
>> formula to put directly in the SQL query and get the "shortest distance"
>> (which was my goal in the end) calculated by MySQL right away. I thought
>> "ok, I'll prepare a cup of coffee to wait for MySQL to finish the
>> calculation". To my surprise the query returned the expected result in
> less
>> than 2 seconds.
>> My logic was (wrongly) the following: PHP is a programming language, SQL
> is
>> a data access language; I'll get the data using MySQL and do the math
> using
>> PHP. But I forgot PHP is an interpreted language, that a number is more
> than
>> a number to PHP, but a ZVAL_ object behind the scenes. I forgot
>> about the memory and the time required to build those objects when one
>> retrieves data out of a database server. I forgot about parsing time, and
>> "support logic and safety checks" in the language that overkill any
> attempt
>> to build TDCPL (Too Damn Complex Programming Logic) in PHP.
>> So, now, when I have to do some logic stuff to the retrieved data, I first
>> check "how much" I can push into the query itself, to get little or
> nothing
>> of programming logic in PHP after retrieving (before storing) the data.
>> All that said, I'd give a shot to the MySQL REPLACE function (I wouldn't
>> even branch the code to use INSERT or UPDATE depending on the record
> already
>> existing or not, If you have a primary key, all you need is REPLACE). But,
>> PLEASE LOOK AT THE GOTCHAS (like set col_name=col_name+1). Furthermore, If
>> those data files were to be uploaded by me (I mean, me, the coder, not the
>> end user), I'd build (use) a program to convert them to SQL sentences in
> my
>> desktop PC where I can use faster programming languages and I can wait for
>> five minutes of heavy processing (instead of overkilling the server for
> five
>> minutes which will slow down every other service in there).
>> In the end it depends on your requirements and where you get the data from
>> and if and how you want to automate the task (I didn't get your previous
>> messages, I got subscribed recently, if you can send me a link to those
>> ones... great!)
>>
>> Rob
> 
>     Thanks for the comments and suggestions.  Prior to receiving your note,
> I went back and did a bit of checking on my code.  Turns out that the
> problem was "hardware" related -- the infamous "Loose Screw Behind the
> Keyboard."
> 
>     The problem actually boiled down to two quotation marks -- they were
> present in the search code to see if a record with the specified key
> existed, but were omitted in the WHERE clause of the UPDATE statement.  Said
> update therefore refused to use the nice little index I'd provided for its
> use and instead scanned through the entire table to find the record in
> question.
> 
>     Moral of the story?  Two, really.  First, ensure you always reference
> values in the way most appropriate for their type.  Second, don't make your
> idiocy public by asking stupid questions on a public forum.   What's the
> quote (probably attributed to Churchill)?  "It is better to be ignorant and
> silent than to voice one's opinions and remove all doubt." ;)
> 
>     Jon
> 
> -- 
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
> 

_________________________________________________________________
Send a smile, make someone laugh, have some fun! Start now!
http://www.freemessengeremoticons.ca/?icid=EMENCA122
-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php