Re: Newbie question about importing text files...

Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> · Wed, 11 Oct 2006 15:29:58 -0500

On Tue, 2006-10-10 at 04:16, Ron Johnson wrote:
> On 10/09/06 22:43, Jonathan Greenberg wrote:
> > So I've been looking at the documentation for COPY, and I'm curious about a
> > number of features which do not appear to be included, and whether these
> > functions are found someplace else:
> > 
> > 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
> > begin reading in data?

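Using something like bash, you can do this:

tail -n $(( $(wc -l < bookability-pg.sql) - 2 )) bookability-pg.sql | wc -l

That prints everything after the first 2 lines of bookability-pg.sql; the
trailing wc -l is only there to show how many lines are left.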

Make it a shell function, call it skipper, and have it take the file and
the number of lines to skip as arguments.

Put this in .bashrc and reload it ( . ~/.bashrc ):

skipper(){
        # $1 = file, $2 = number of leading lines to skip
        tail -n $(( $(wc -l < "$1") - $2 )) "$1"
}
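
The line-counting approach above needs a real file and miscounts if the
last line has no trailing newline. A minimal sketch of an alternative,
assuming a POSIX tail: "tail -n +K" starts output at line K, so no
counting is needed and it also works on a pipe. The name skip_header is
made up for this example.

```shell
# skip_header N [FILE]: print everything after the first N lines.
# (skip_header is a hypothetical name, not something from the post above.)
skip_header() {
    skip_header_n=$1
    shift
    # tail -n +K starts output at line K, so K = N + 1 skips N lines.
    # With no FILE argument, tail reads standard input.
    tail -n "+$(( skip_header_n + 1 ))" "$@"
}
```

Usage would be something like "skip_header 3 data.txt", or
"some_command | skip_header 3" when the data comes from a pipe.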

> > 2) Is it possible to screen out lines which begin with a comment character
> > (common outputs for csv/txt files from various programs)?

grep -vP "^#" filename

will remove all lines that start with #.  grep is your friend on Unix.
If you don't have Unix, get Cygwin as recommended elsewhere.
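
Some programs indent their comment lines, which "^#" will miss. Here is
a small sketch that also catches those, assuming "#" never starts a real
data row. The helper name strip_comments, the table name mytable, and
the database name mydb are placeholders invented for this example.

```shell
# strip_comments [FILE]: drop lines whose first non-blank character is '#'.
# (strip_comments is a hypothetical helper name.)
strip_comments() {
    # -v inverts the match; [[:space:]]* allows leading spaces or tabs.
    grep -v '^[[:space:]]*#' "$@"
}

# Hypothetical usage, assuming a table "mytable" in a database "mydb"
# and CSV input; the filtered rows are fed to COPY on standard input:
# strip_comments data.csv | psql -d mydb -c "COPY mytable FROM STDIN WITH CSV"
```

Since it reads standard input when no file is given, it can sit in the
same pipeline as whatever you use to skip the header lines.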

> > 3) Is there a way to read in fixed width files?

If you don't mind playing about with sed, you could use it and bash
scripting to do it.  I have before.  It's ugly looking but easy enough
to do.  But I'd recommend a beginner use a scripting language they like;
one of the ones that starts with p is usually a good choice (Perl,
Python, PHP, Ruby (wait, that's not a p!), etc.)
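
For what it's worth, here is a rough sketch of the shell route, using
awk rather than sed because substr() makes the column cuts easier to
read. The layout (columns 1-10, 11-15, 16-23) and the name fixed_to_tsv
are invented for illustration; substitute your own widths. It emits
tab-separated text, which is COPY's default text format, and it assumes
the data contains no tabs or backslashes.

```shell
# fixed_to_tsv [FILE]: convert fixed-width records to tab-separated text.
# The column layout (1-10, 11-15, 16-23) is a made-up example.
fixed_to_tsv() {
    awk '
    # Strip the space padding from both ends of a field.
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    {
        # substr(string, start, length) uses 1-based positions.
        printf "%s\t%s\t%s\n",
            trim(substr($0, 1, 10)),
            trim(substr($0, 11, 5)),
            trim(substr($0, 16, 8))
    }' "$@"
}
```

Anything past column 23 is ignored, and short lines just produce empty
trailing fields, so check a few rows by eye before loading.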

> 
> Both Python & Perl have CSV parsing modules, and can of course deal
> with fixed-width data, let you skip comments, commit every N rows,
> skip over committed records in case the load crashes, etc, etc, etc.

PHP has fgetcsv() built in as well.  It reads a line of CSV from a file
handle into an array and is really easy to work with.