Re: a better way to do a data import?

I think that's possible, so I'll give it a shot.

For some reason, even with straight inserts my PHP script is dying around
180,000 rows.  Basically, I took out all the compare/update code, so now I
grab the row from the db and, if there isn't one, do an insert.  I've wiped
my db, so it should be doing straight inserts, and I'm not sure what's
taking up the memory.



private function processFile($table, $key){
    $this->openFileForReading(); //found in GLFile class
    while (!feof($this->fileHandle)) {
        $file_data = fgets($this->fileHandle);
        $this->insert($table, $key, $file_data);
    }
    $this->closeFile();
}
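
One thing that might help the speed side of this, separate from the indexing
question quoted below: if the db wrapper autocommits every statement, each of
those 200k inserts pays its own commit, which is expensive on Postgres.
Wrapping the whole load in a single transaction is usually a big win.  A rough
sketch, assuming the wrapper's write() just passes raw SQL through to the
driver (that's a guess about the db class; the method name here is made up):

private function processFileInTransaction($table, $key){
    //assumption: write() hands the SQL straight to the Postgres connection,
    //so these BEGIN/COMMIT statements wrap every insert/update in one
    //transaction instead of one commit per row
    $this->db->write("BEGIN");
    $this->processFile($table, $key); //defined above
    $this->db->write("COMMIT");
}

If a row fails partway through you would want to issue a ROLLBACK instead of
the COMMIT, but the idea is the same.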

private function insert($table, $key, $data){
    if (strlen($data) < 10) return false;

    $data = $this->db->escape_string($data);

    //this is the data we will use should we need to do an insert
    $insert_data = str_replace("\"", "'", $data);

    //this is a hack: we need to change the separator of the file
    //because we need to put the data into an array, and if we simply
    //use the comma, then it splits address fields
    $data = str_replace("\",\"", "~", $data);
    $data = str_replace("\"", "", $data); //let's remove the double quotes
    $this->setDelimiter("~");
    $dataToArray = $this->stringToArray($data);
    //set it back for other functions
    $this->setDelimiter(",");

    //get the id; we trust it is the first column
    $key_data = $dataToArray[0];

    //does the value exist in the database already?
    $sql = "select * from prelim_$table where $key='$key_data'";
    $handle = $this->db->select($sql);

    if ($this->db->row_count($handle) > 0){
        $dataRow = $this->db->fetch_row($handle);

        //now that we have the data, let's merge the row from the
        //file with the column names
        $textData = array_combine($this->carrierColumns, $dataToArray);

        //cast some values that are strings in the text file
        $textData['cars1'] = (int) $textData['cars1'];
        $textData['car_amount'] = (int) $textData['car_amount'];
        $textData['total_work'] = trim($textData['total_work']);
        $textData['business_zip'] = trim($textData['business_zip']);

        //clean up some old db data
        $dataRow['rfc_number'] = trim($dataRow['rfc_number']);
        $dataRow['business_zip'] = trim($dataRow['business_zip']);

        $colsToUpdate = array_diff($textData, $dataRow);

        //if we have columns to update, do it
        if (count($colsToUpdate) > 0){
            $colset = "";
            foreach ($colsToUpdate as $column => $value){
                $colset .= "$column='$value',";
            }
            //strip last comma
            $colset = substr($colset, 0, -1);
            $sql = "update prelim_$table set $colset where $key='$key_data'";
            $this->db->write($sql);
        }

        $dataRow = NULL;
        $colsToUpdate = NULL;
        $colset = NULL;
        $textData = NULL;
    }
    else {
        //insert the row
        $sql = "insert into prelim_$table values (";
        $sql .= trim($insert_data);
        $sql .= ");";
        $this->db->write($sql);
    }
}
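
On the memory side: my guess (and it is only a guess, since I can't see the db
class) is that each per-row select leaves its result set allocated.  Setting
$dataRow and friends to NULL releases the PHP arrays, but not the underlying
result resource returned by $this->db->select().  If the wrapper is built on
the pg_* functions, freeing the handle each iteration should keep memory flat.
A sketch of the idea, where free_result() is a hypothetical wrapper method
around pg_free_result():

    $handle = $this->db->select($sql);
    if ($this->db->row_count($handle) > 0){
        //... existing compare/update code ...
    }
    //hypothetical helper: just calls pg_free_result() on the underlying
    //result resource before the next row is read from the file
    $this->db->free_result($handle);
    $handle = NULL;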


On Jan 21, 2008 12:55 PM, Robert Cummings <robert@xxxxxxxxxxxxx> wrote:

>
> On Mon, 2008-01-21 at 12:35 -0500, blackwater dev wrote:
> > I have a text file that contains 200k rows.  These rows are to be
> > imported into our database.  The majority of them will already exist
> > while a few are new.  Here are a few options I've tried:
> >
> > I've had php cycle through the file row by row and if the row is there,
> > delete it and do a straight insert, but that took a while.
> >
> > Now I have php get the row from the text file and then do array_combine
> > with a default array I have in the class so I can have key/value pairs.
> > I then take that generated array and do array_diff against the data
> > array I pulled from the db, and I then have the columns that are
> > different, so I do an update on only those columns for that specific
> > row.  This is slow, and after about 180,000 rows php throws a memory
> > error.  I'm resetting all my vars to NULL at each iteration so am not
> > sure what's up.
> >
> >
> > Anyone have a better way to do this?  In MySQL, I could simply do a
> > replace on each row... but not in postgres.
>
> Does Postgres support any method of temporarily disabling keys/indexing?
> Indexing is what causes the inserts to take a while. MySQL can optimize
> an import by locking the table and allowing the keys/indexes to be
> temporarily disabled. You'll see the following lines in recent MySQL
> database dumps surrounding the inserts:
>
> /*!40000 ALTER TABLE `xxx` DISABLE KEYS */;
> INSERT ...
> INSERT ...
> /*!40000 ALTER TABLE `xxx` ENABLE KEYS */;
>
> Cheers,
> Rob.
> --
> ...........................................................
> SwarmBuy.com - http://www.swarmbuy.com
>
>    Leveraging the buying power of the masses!
> ...........................................................
>
>
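
Postgres doesn't have a direct DISABLE KEYS equivalent, but the same idea can
be done by hand: drop the secondary indexes, bulk load with COPY instead of
row-by-row inserts, then recreate the indexes.  A rough sketch of what that
could look like from PHP using pg_copy_from(), which runs COPY ... FROM STDIN
under the hood.  The connection string, table, index and file names below are
made up, and the file would need its double quotes stripped first, since this
simple form doesn't parse quoted CSV fields:

$conn = pg_connect("dbname=mydb user=me"); //hypothetical connection

//drop the secondary index so the bulk load doesn't maintain it row by row
pg_query($conn, "DROP INDEX prelim_carrier_zip_idx");

//each array element is one comma-delimited line of the cleaned file
$rows = file("/tmp/carriers.txt");
pg_copy_from($conn, "prelim_carrier", $rows, ",");

//rebuild the index once, after all the data is in
pg_query($conn, "CREATE INDEX prelim_carrier_zip_idx
    ON prelim_carrier (business_zip)");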
