Re: Efficiently parsing a File

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




I'm assuming that you have a fixed number of items in each row in the flat file? You must have an idea of how long each line in the file is. Pick a long enough buffer so that your lines don't get truncated, but they will terminate on EOL.

$handle = @fopen("your.delimited.file", "r");
$counter = 0;
$array = array();
if ($handle)
 {
   while(($buffer = fgets($handle, 4096)) !== false)
      {
        $counter++;
          if ($counter > 1) // skip the first line of header info
           {
               $array = explode(",", $buffer);  //
              // now do what you will with the array
            }
         }
   }

You might also look at the mysql "LOAD DATA INFILE"

LOAD DATA INFILE 'filename' INTO TABLE tbl_name ENCLOSED BY ',' IGNORE 1 LINES

very fast, very efficient. It's way faster than doing inserts. the mysql user needs to be able to read the file. your script executes the query and mysql does the rest.

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

--Curtis


On 3/18/2014 5:33 PM, Tiago Hori wrote:
Hi Jim,

One quick question:

When you use this call to the mysql database ("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES ". join(', ', $dataSet));

There are more than 6 items in the $dataSet array as the while loop parsed the whole file, correct? I just want to make sure I understand correctly. Does that INSERT command knows to start another row in the table after it inserted the first 6 values?

Thanks!

Tiago


On Mar 18, 2014, at 3:38 PM, Jim Lucas <lists@xxxxxxxxx> wrote:

On 03/18/2014 10:57 AM, Tiago Hori wrote:
Hi Everyone,

I fairly new at this, so please bear with me. :)

I am building this web app for a project I am working at where I need to
store and process large amounts of data.

The data comes in comma delimited style. There are a bunch of headers that
I don't need and then the data. These files contain 96 times 96 entries and
I need to parse each one of those. Right now I have something working that
takes about 5 minutes to parse the whole file. Any tips on how to make this
run more efficiently would be greatly appreciated.

Thanks,

Tiago

here is the relevant code:


I cleaned up the code a little.  Moved a few things around.  Removed a few duplicate things and a part that didn't seem useful. Give this code a try:

if ( $_FILES['data'] ) {
    $dataSet = array();
    $parts = array();
    $c = 0;
    $filename = $_FILES['data']['name'];
    if ( file_exists($filename) ) {
        die ("A file with this name already exists in the database <br />");
    } else {
        move_uploaded_file($_FILES['data']['tmp_name'], $filename);
    }
    $runid = substr($filename, 0, -4);
    echo "Uploaded file '$runid' <br />";
    $fh = fopen($filename, 'r') or
         die("File does not exist or you lack permission to open it");
    echo '<table>';
    while ( !feof($fh) ) {
        $parts = fgetcsv($fh);
        if ( $parts[0] == 'ID' ) {
            $id = sanitizeStrings($parts[0]);
            $assay = sanitizeStrings($parts[1]);
            $allele1 = sanitizeStrings($parts[2]);
            $allele2 = sanitizeStrings($parts[3]);
            $name = sanitizeStrings($parts[4]);
        } else if ($t = preg_match("/S\d\d-\D\d\d/", $parts[0]) ) {
            $id = sanitizeStrings($parts[0]);
            $assay = sanitizeStrings($parts[1]);
            $alleles = sanitizeStrings($parts[9]);
            $allele1 = $allele2 = 'No Call';
            if ( preg_match("/[ATCG]:[ATCG]/", $alleles) ) {
                list($allele1, $allele2) = explode(':', $alleles, 2);
            }
            $name = sanitizeStrings($parts[4]);
            if ($name != 'Blank') {
                $dataSet[] = "('{$runid}', '{$name}', '{$id}', '{$assay}', '{$allele1}', '{$allele2}')";
            }
        }
        echo <<<_END
<tr>
  <th>{$name}</th>
  <th>{$id}</th>
  <th>{$assay}</th>
  <th>{$allele1}</th>
  <th>{$allele2}</th>
</tr>
_END;

    }
    if ( $dataSet ) {
        queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES ". join(', ', $dataSet));
    }
    fclose($fh);
    echo '</table>';
}



--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php





[Index of Archives]     [PHP Home]     [Apache Users]     [PHP on Windows]     [Kernel Newbies]     [PHP Install]     [PHP Classes]     [Pear]     [Postgresql]     [Postgresql PHP]     [PHP on Windows]     [PHP Database Programming]     [PHP SOAP]

  Powered by Linux