Performance question for table updating

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all:

    For those who've been following the saga, I'm working on an application that needs to load a data file consisting of approximately 29,000 to 35,000 records in it (and not short ones, either) into several tables.  I'm using MySQL as the database.

    I've noticed a really horrible performance difference between INSERTing rows into the table and UPDATEing rows with new data when they already exist in the table.  For example, when I first start with an empty table, the application inserts around 29,600 records in something less than 6 minutes.  But, when I use the second file, updating that same table takes over 90 minutes.

    Here's my question: I had assumed -- probably wrongly -- that it would be far more expedient to only update rows where data had actually changed; moreover, that I should only update the changed fields in the particular rows.  This involves a large number of if statements, i.e.,

    if($old_row["field_a"] !== $new_row["field_66"] {
        $update_query .= "field_a = '" . mysql_real_escape_string($new_row["field_66"]) . "',";
    }

    Eventually, I wind up with a query similar to:

        UPDATE table_01 SET field_a = 'New value here', updated=CURDATE() WHERE primary_key=12345

    I thought that, to keep the table updating to a minimum, this approach made the most sense.  However, seeing the two hugely different performance times has made me question whether or not it would be faster to simply update every field in the table and eliminate all of these test conditions.

    And, before someone comments that indexes on the table can cause performance hits, I DROP nearly all of the indexes at the start of the processing, only keeping those indexes necessary to do the original INSERT or the subsequent UPDATE, and then add all of the extra "steroid" indexes (you know -- the performance-enhancing ones <g>) after all of the INSERTs and UPDATEs have been finished.

    So, long story short (oops -- too late!), what's the concensus among the learned assembly here?  Is it faster to just UPDATE the record if it already exists regardless of the fact that maybe only one or two out of 75 or more fields changed versus testing each one of those 75 fields to try and figure out which ones actually changed and then only update those?

    I look forward to reading all of your thoughts.

    Sincerely,

        Jon

[Index of Archives]     [PHP Home]     [Apache Users]     [PHP on Windows]     [Kernel Newbies]     [PHP Install]     [PHP Classes]     [Pear]     [Postgresql]     [Postgresql PHP]     [PHP on Windows]     [PHP Database Programming]     [PHP SOAP]

  Powered by Linux