RE: Very Large text file parsing

On Thu, 2007-09-20 at 14:25 +0200, Paul Scott wrote:
> On Thu, 2007-09-20 at 08:03 -0400, Robert Cummings wrote:
> > Post some samples of the data you are parsing and a sample of the code
> > you've written to parse them. If you're parsing 750 megs of data then
> > it's quite likely you could squeeze some performance out of the parse
> > routines themselves.
> 
> Today's dataset is a tab-separated CSV, so I am using fgetcsv(). It
> looks like this (geo data):
> 
> 936374	Roodepoort	Roodepoort	Roodeport-Maraisburg	-26.1666667	27.8666667	P	PPL	ZA	ZA	06				0		1759	Africa/Johannesburg	2004-05-11
> 
> Code:
> [SNIP]
> $row = 1;
> $handle = fopen($csvfile, "r");
> while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
>     $num = count($data);
>     $row++;
>     $insarr = array('userid'         => $userid,
>                     'geonameid'      => $data[0],
>                     'name'           => $data[1],
>                     'asciiname'      => $data[2],
>                     'alternatenames' => $data[3],
>                     'latitude'       => $data[4],
>                     'longitude'      => $data[5],
>                     'featureclass'   => $data[6],
>                     'featurecode'    => $data[7],
>                     'countrycode'    => $data[8],
>                     'cc2'            => $data[9],
>                     'admin1code'     => $data[10],
>                     'admin2code'     => $data[11],
>                     'population'     => $data[12],
>                     'elevation'      => $data[13],
>                     'gtopo30'        => $data[14],
>                     'timezoneid'     => $data[15],
>                     'moddate'        => $data[16]
>                    );
>     $this->objDbGeo->insertRecord($insarr);
>     //$arr[] = $data;
> }
> fclose($handle);
> 
> --Paul
> 

You can probably speed this up A LOT if you batch multiple rows into a
single INSERT. For instance, in MySQL you can do:

    INSERT INTO some_table
    ( x, y, z )
    VALUES
    ( 1, 2, 3 ),
    ( 2, 3, 4 ),
    ( 5, 6, 7 ),
    ...

If you do these in batches of 1000 rows you should see a big time
savings. Since you're using fgetcsv() it's doubtful you can improve the
file access/parsing much; your bottleneck is most likely the database
inserts.
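
Something like this untested sketch shows the idea -- it assumes a
plain mysql_* connection and a table named geoname (both guesses on my
part), so adapt the column list and swap in your objDbGeo wrapper:

    <?php
    // Untested sketch: batch CSV rows into multi-row INSERTs of 1000.
    // ASSUMPTIONS: an already-open mysql_* connection and a `geoname`
    // table; adapt the column list / wrapper call to your schema.

    $columns = 'userid, geonameid, name, asciiname, alternatenames,'
             . ' latitude, longitude, featureclass, featurecode,'
             . ' countrycode, cc2, admin1code, admin2code, population,'
             . ' elevation, gtopo30, timezoneid, moddate';

    function flush_batch($columns, &$values)
    {
        if (count($values) > 0) {
            mysql_query("INSERT INTO geoname ($columns) VALUES "
                        . implode(',', $values));
            $values = array();
        }
    }

    $batchSize = 1000;
    $values    = array();

    $handle = fopen($csvfile, 'r');
    while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
        // Escape every field and build one ( ... ) tuple per row,
        // prepending the constant userid column.
        $fields   = array_map('mysql_real_escape_string', $data);
        $values[] = "('" . mysql_real_escape_string($userid) . "','"
                  . implode("','", $fields) . "')";

        if (count($values) >= $batchSize) {
            flush_batch($columns, $values);
        }
    }
    fclose($handle);

    flush_batch($columns, $values); // don't lose the final partial batch
    ?>

Wrapping each batch in a transaction would likely buy you a bit more,
but the multi-row VALUES alone cuts the per-query round trips by a
factor of the batch size.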

Cheers,
Rob.
-- 
...........................................................
SwarmBuy.com - http://www.swarmbuy.com

    Leveraging the buying power of the masses!
...........................................................


