Re: Spreadsheet_Excel_Reader problem

Paul M Foster <paulf@xxxxxxxxxxxxxxxxx> · Thu, 18 Mar 2010 15:25:31 -0400

On Thu, Mar 18, 2010 at 05:00:24PM +0000, Ashley Sheridan wrote:

> On Thu, 2010-03-18 at 12:57 -0400, Paul M Foster wrote:
> 
>     On Thu, Mar 18, 2010 at 04:15:33PM +0000, Ashley Sheridan wrote:
> 
>     > On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:
>     >
>     >     On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
>     >

<snip>

>     > Explode won't work in the case of a comma in a field value.
> 
>     That's why I convert the files to tab-delimited first. explode() does
>     work in that case.
> 
>     >
>     > Also, newlines can exist within a field value, so a line in the file
>     > doesn't equate to a row of data
> 
>     I've never seen this in the files I receive.
> 
>     >
>     > The best way is just to start parsing at the beginning of the file
>     > and break it into fields one by one from there.
>     >
>     > The bit I don't like about characters other than a comma being used
>     > in a "comma separated values" file is that you can't automatically
>     > tell what character has been used as the delimiter. Hence being asked
>     > by spreadsheet programs what the delimiter is if a comma doesn't give
>     > up what it recognises as valid fields.
> 
>     I've honestly never seen a "CSV" or "Comma-separated Values" which used
>     tabs for delimiters. At that point, it's really not a *comma* separated
>     value file.
> 
>     My application for all this is accepting mailing lists from customers
>     which I have to convert into DBFs for a commercial mailing list program.
>     Because most of my customers can barely find the on/off switch on their
>     computers, I never know what I'm going to get. So before I string
>     together the filters to process the file, I have to actually look at and
>     analyze the file to find out what it is. Could be a fixed-field length
>     file, a CSV, a tab-delimited file, or anything in between. Once I've
>     selected the filters, the sequence they will be put together in, and the
>     fields from the file I want to capture, I hit the button. After it's all
>     done, I now have to look at the result to ensure that the requested
>     fields ended up where they were supposed to.
> 
>     Paul
> 
>     --
>     Paul M. Foster
> 
> 
> 
> But surely whatever character is used as the delimiter could be part of the
> field's value?
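
A minimal PHP sketch of the field-by-field parsing Ashley describes, as opposed to splitting lines with explode(), assuming PHP 7.4 or later. It leans on the built-in fgetcsv(), which keeps commas and newlines that sit inside quoted fields. guess_delimiter() and parse_csv_string() are hypothetical helper names, and the delimiter guess is only a heuristic that counts candidate characters in the first line.

```php
<?php
// Hypothetical helper: guess the delimiter by counting candidate characters
// in the first line. A heuristic only; quoted text can fool it.
function guess_delimiter(string $firstLine, array $candidates = [',', "\t", ';', '|']): string
{
    $best = ',';
    $bestCount = 0;
    foreach ($candidates as $c) {
        $n = substr_count($firstLine, $c);
        if ($n > $bestCount) {
            $best = $c;
            $bestCount = $n;
        }
    }
    return $best;
}

// Hypothetical helper: parse a whole CSV string into rows with fgetcsv(),
// so quoted commas and quoted newlines stay inside their field.
function parse_csv_string(string $data): array
{
    $firstLine = explode("\n", $data, 2)[0];
    $delim = guess_delimiter($firstLine);

    // Wrap the string in a temporary stream so fgetcsv() can read it.
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $data);
    rewind($fh);

    $rows = [];
    while (($fields = fgetcsv($fh, 0, $delim, '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $rows[] = $fields;
    }
    fclose($fh);
    return $rows;
}

// Two records: the second has a comma in one field and a newline in another.
$csv = "name,address\n\"Smith, John\",\"12 Main St\nApt 4\"\n";
$rows = parse_csv_string($csv);
print_r($rows);
```

Splitting the same text with explode("\n", ...) and then explode(",", ...) would yield three "lines" and break "Smith, John" in two; fgetcsv() returns two records of two fields each.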

Well, remember I shove these into tab-delimited files. It does
occasionally happen that someone will slip a tab into a field. When that
happens, I can tell when the final result is off. Then I do a hex dump
of the file (in PHP) to determine whether it actually is a tab. If so, I
prepend a filter to the chain that removes tabs from the original CSV
file, then proceed as before.
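
A rough sketch of the two steps above, assuming PHP 7.4 or later; hex_dump() and csv_to_tab_delimited() are hypothetical names, not Paul's actual filters. The dump shows a stray tab as byte 09. The converter reads the original CSV with fgetcsv(), swaps any tab inside a field for a space (an assumption; deleting it outright is the other option), and joins the fields with tabs so a later explode("\t", ...) sees the right number of columns.

```php
<?php
// Hex dump, 16 bytes per row, offset first; a tab shows up as 09.
function hex_dump(string $s): string
{
    $out = '';
    foreach (str_split($s, 16) as $i => $chunk) {
        $hex = implode(' ', str_split(bin2hex($chunk), 2));
        $out .= sprintf("%08x  %s\n", $i * 16, $hex);
    }
    return $out;
}

// Read comma-separated text with fgetcsv(), replace any tab inside a field
// with a space, and emit one tab-delimited line per record. Newlines inside
// quoted fields are read correctly but copied through unchanged, so they
// would still break a one-line-per-record output.
function csv_to_tab_delimited(string $csv): string
{
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $csv);
    rewind($fh);

    $lines = [];
    while (($fields = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // skip blank lines
        }
        $clean = array_map(fn($f) => str_replace("\t", ' ', (string) $f), $fields);
        $lines[] = implode("\t", $clean);
    }
    fclose($fh);
    return implode("\n", $lines) . "\n";
}

$csv = "name,note\n\"Smith, John\",\"has\ta tab\"\n";
echo hex_dump($csv);          // the 09 byte in the second record is the stray tab
$tsv = csv_to_tab_delimited($csv);
$row = explode("\t", explode("\n", $tsv)[1]);
// $row is ['Smith, John', 'has a tab']
```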

Occasionally someone will send me a file in a "label" format which
contains \x0C characters or somesuch at page boundaries. I actually have
to look at the file and find out what they've inserted. I have filters
for most anything I find like that.
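
A small sketch of a filter chain along these lines, assuming PHP 7.4 or later. Every function name is hypothetical and not taken from an existing tool. strip_page_breaks() removes \x0C along with other C0 control bytes except tab, LF and CR, which is broader than a single form-feed filter and is an assumption; run_filters() applies each callable in order, and array_unshift() puts a tab filter at the front of the chain.

```php
<?php
// Remove form feeds (\x0C) and other C0 control characters, keeping tab,
// LF and CR. Narrow the character class if a file needs other bytes kept.
function strip_page_breaks(string $text): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text) ?? $text;
}

// Remove every tab from the raw file.
function strip_tabs(string $text): string
{
    return str_replace("\t", '', $text);
}

// Apply each callable filter in sequence, feeding it the previous output.
function run_filters(string $text, array $filters): string
{
    foreach ($filters as $filter) {
        $text = $filter($text);
    }
    return $text;
}

$filters = ['strip_page_breaks'];
// When a stray tab turns up, prepend the tab filter to the chain.
array_unshift($filters, 'strip_tabs');

// A form feed at a "page boundary" and a stray tab after a name.
$raw = "Smith\t,12 Main St\r\n\x0CJones,4 Oak Ave\r\n";
echo run_filters($raw, $filters);
```

Keeping each filter a plain string-to-string function means the chain can be reordered or extended per file without touching the others.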

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)