On Thu, Mar 18, 2010 at 05:00:24PM +0000, Ashley Sheridan wrote:

> On Thu, 2010-03-18 at 12:57 -0400, Paul M Foster wrote:
>
> > On Thu, Mar 18, 2010 at 04:15:33PM +0000, Ashley Sheridan wrote:
> >
> > > On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:
> > >
> > > > On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:

<snip>

> > Explode won't work in the case of a comma in a field value.
>
> That's why I convert the files to tab-delimited first. explode() does
> work in that case.
>
> > Also, newlines can exist within a field value, so a line in the file
> > doesn't equate to a row of data
>
> I've never seen this in the files I receive.
>
> > The best way is just to start parsing at the beginning of the file and
> > break it into fields one by one from there.
> >
> > The bit I don't like about characters other than a comma being used in
> > a "comma separated values" file is that you can't automatically tell
> > what character has been used as the delimiter. Hence being asked by
> > spreadsheet programs what the delimiter is if a comma doesn't give up
> > what it recognises as valid fields.
>
> I've honestly never seen a "CSV" or "Comma-separated Values" which used
> tabs for delimiters. At that point, it's really not a *comma* separated
> value file.
>
> My application for all this is accepting mailing lists from customers
> which I have to convert into DBFs for a commercial mailing list program.
> Because most of my customers can barely find the on/off switch on their
> computers, I never know what I'm going to get. So before I string
> together the filters to process the file, I have to actually look at and
> analyze the file to find out what it is. Could be a fixed-field length
> file, a CSV, a tab-delimited file, or anything in between. Once I've
> selected the filters, the sequence they will be put together in, and the
> fields from the file I want to capture, I hit the button. After it's all
> done, I now have to look at the result to ensure that the requested
> fields ended up where they were supposed to.
>
> Paul
>
> --
> Paul M. Foster
>
> But surely whatever character is used as the delimiter could be part of
> the fields value?

Well, remember I shove these into tab-delimited files. It does
occasionally happen that someone will slip a tab into a field. When that
happens, I can tell when the final result is off. Then I do a hex dump
of the file (in PHP) to determine if it actually is a tab. If so, I have
a filter I prepend to the line of filters which removes tabs from the
original CSV file. Then proceed as before.

Occasionally someone will send me a file in a "label" format which
contains \x0C characters or somesuch at page boundaries. I actually have
to look at the file and find out what they've inserted. I have filters
for most anything I find like that.

Paul

--
Paul M. Foster
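
For illustration only, here is a minimal PHP sketch of the workflow described above: parse the CSV, convert it to tab-delimited text so that explode() is safe, strip stray tabs from field values, and hex-dump a suspect line. It is not the filter chain from the message. The function names csv_to_rows(), rows_to_tsv(), and hex_dump_line() are hypothetical, and replacing embedded tabs and line breaks with a single space is an assumption. The sketch relies on fgetcsv(), which handles commas and newlines inside quoted fields.

```php
<?php
// Hypothetical helpers; a sketch under the assumptions stated above.

// Parse CSV text with fgetcsv(), which honors quoted commas and quoted newlines.
function csv_to_rows(string $csv): array
{
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $rows = [];
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $rows[] = $fields;
    }
    fclose($handle);
    return $rows;
}

// Emit tab-delimited text, one row per line.
// Assumption: tabs and line breaks inside a field become a single space.
function rows_to_tsv(array $rows): string
{
    $out = '';
    foreach ($rows as $fields) {
        $clean = [];
        foreach ($fields as $field) {
            $clean[] = str_replace(["\r\n", "\r", "\n", "\t"], ' ', $field);
        }
        $out .= implode("\t", $clean) . "\n";
    }
    return $out;
}

// Show the bytes of a line as space-separated hex, so a tab appears as 09.
function hex_dump_line(string $line): string
{
    return implode(' ', str_split(bin2hex($line), 2));
}

// Sample data (made up): a quoted comma and a quoted newline in one record.
$csv = "name,address\n\"Smith, John\",\"12 Main St\nApt 4\"\n";
$tsv = rows_to_tsv(csv_to_rows($csv));

foreach (explode("\n", rtrim($tsv, "\n")) as $line) {
    print_r(explode("\t", $line)); // each line splits into two fields
}
echo hex_dump_line("a\tb"), "\n"; // 61 09 62
```

Parsing with fgetcsv() first means the comma-in-a-field and newline-in-a-field cases raised earlier in the thread are resolved before the data reaches the tab-delimited stage, so a plain explode("\t", ...) on each output line is enough from there on.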