Re: Performance question for table updating

On Nov 24, 2007 2:32 AM, Jon Westcot <jon@xxxxxxxxxxx> wrote:
> Hi all:
>
>     For those who've been following the saga, I'm working on an application that needs to load
> a data file consisting of approximately 29,000 to 35,000 records in it (and not short ones,
> either) into several tables.  I'm using MySQL as the database.
>
>     I've noticed a really horrible performance difference between INSERTing rows into the
> table and UPDATEing rows with new data when they already exist in the table.  For
> example, when I first start with an empty table, the application inserts around 29,600
> records in something less than 6 minutes.  But, when I use the second file, updating that
> same table takes over 90 minutes.
>
>     Here's my question: I had assumed -- probably wrongly -- that it would be far more
> expedient to only update rows where data had actually changed; moreover, that I should
> only update the changed fields in the particular rows.  This involves a large number of if
> statements, i.e.,
>
>     if($old_row["field_a"] !== $new_row["field_66"]) {
>         $update_query .= "field_a = '" . mysql_real_escape_string($new_row["field_66"]) . "',";
>     }
>
>     Eventually, I wind up with a query similar to:
>
>         UPDATE table_01 SET field_a = 'New value here', updated=CURDATE() WHERE
>         primary_key=12345
>
>     I thought that, to keep the table updating to a minimum, this approach made the most
> sense.  However, seeing the two hugely different performance times has made me question
> whether or not it would be faster to simply update every field in the table and eliminate all of
> these test conditions.
>

I don't think indexes are really your trouble. Based on what you've
posted, it looks like you are selecting the entire data set from MySQL
back to PHP and then looping through each record to compare the new
and old values to build your queries. If that's the case, it will
almost assuredly be faster to just loop through the new data and
update everything (even if nothing has changed). Otherwise, you have
the added network traffic to send the entire result set back from
MySQL to PHP plus the per-row, per-column comparisons, string
concatenation operations, and finally another round trip across the
network to MySQL again to execute the query for each row in the data.
If you want to keep some of your optimizations, you can still loop
through the data in your file and build statements like this:

UPDATE table_01 SET
    field_a = 'New value for a',
    field_b = 'New value for b',
    updated=CURDATE()
WHERE primary_key=12345
   AND NOT (
        field_a = 'New value for a'
        AND field_b = 'New value for b'
        )

This way, if nothing in the record has changed, you still pay a round
trip to MySQL, but MySQL has very little work to do: it finds the
record by the primary key, sees that no values have changed, and
updates nothing. No indexes need to be updated, and even your
`updated` field is left alone. (One caveat: if any of those columns
can be NULL, a plain `=` comparison against NULL yields NULL and the
row will be skipped, so use MySQL's null-safe operator `<=>` for
nullable columns.)
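To make that concrete, here is a rough sketch of what the update loop
might look like in PHP. The table, column, and variable names are just
placeholders carried over from the examples above, and I'm assuming
$new_rows holds the parsed records from your second data file:

```php
<?php
// Sketch only: assumes an open mysql connection and that $new_rows
// is an array of associative arrays parsed from the new data file.
foreach ($new_rows as $new_row) {
    $a  = mysql_real_escape_string($new_row["field_66"]);
    $b  = mysql_real_escape_string($new_row["field_67"]);
    $id = (int) $new_row["key_field"];

    // No SELECT back to PHP and no per-field if statements:
    // MySQL itself skips the write when nothing has changed.
    $query = "UPDATE table_01 SET
                  field_a = '$a',
                  field_b = '$b',
                  updated = CURDATE()
              WHERE primary_key = $id
                AND NOT (
                      field_a = '$a'
                  AND field_b = '$b'
                )";

    mysql_query($query) or die(mysql_error());
}
?>
```

The per-row string building is the same either way; what you save is
the initial result set coming back from MySQL and all the PHP-side
comparisons.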

Andrew

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

