On 08/07/10 17:42, Alban Hertroys wrote:
> On 8 Jul 2010, at 4:21, Craig Ringer wrote:
>
>> Yes, that's ancient. It is handled quite happily by \copy in csv mode,
>> except that when csv mode is active, \xnn escapes do not seem to be
>> processed. So I can have *either* \xnn escape processing *or* csv-style
>> input processing.
>>
>> Anyone know of a way to get escape processing in csv mode?
>
> And what do those hex-escaped bytes mean? Are they in text strings? AFAIK
> CSV doesn't contain any information about what encoding was used to create
> it, so it could be about anything; UTF-8, Win1252, ISO-8859-something, or
> whatever Sybase was using.
>
> I'm just saying, be careful what you're parsing there ;)

Thanks for that. In this case, the escapes are just "bytes" - what's
important is that, after unescaping, the CSV data is interpreted as
latin-1. OK, Windows-1252, but close enough.

In the end Python's csv module did the trick. I just pulled in the CSV
data, and spat out PostgreSQL-friendly COPY format so that I didn't need to
use the COPY ... CSV modifier and Pg would interpret the escapes during
input.

In case anyone else needs to deal with this format, here's the program I
used.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/
#!/usr/bin/env python
# Convert Sybase-style CSV dumps into PostgreSQL COPY text format, so that
# COPY (without the CSV modifier) processes the \xnn escapes on input.
import csv


class DialectSybase(csv.Dialect):
    # Input: comma-separated, single-quoted strings, '' for an embedded quote.
    delimiter = ','
    doublequote = True
    escapechar = None
    quotechar = '\''
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'


class DialectPgCOPY(csv.Dialect):
    # Output: tab-separated, no quoting, backslashes passed through as-is.
    delimiter = '\t'
    doublequote = False
    escapechar = None
    quotechar = None
    quoting = csv.QUOTE_NONE
    lineterminator = '\n'

#class DialectPgCOPY(csv.Dialect):
#    delimiter = '\t'
#    doublequote = True
#    escapechar = '\\'
#    quotechar = '\''
#    quoting = csv.QUOTE_NONE
#    lineterminator = '\n'


def unescape_item(item):
    ''' Rewrite upper-case \\X escapes as \\x and leave the rest to COPY '''
    return item.replace("\\X", "\\x")


def unescape_row(row):
    newrow = []
    for item in row:
        newitem = item
        if isinstance(item, str):
            newitem = unescape_item(item)
        newrow.append(newitem)
    return newrow


def main(infn, outfn):
    # "with" makes sure both files are flushed and closed when done.
    with open(infn, 'r') as infile, open(outfn, 'w') as outfile:
        r = csv.reader(infile, dialect=DialectSybase)
        w = csv.writer(outfile, dialect=DialectPgCOPY)
        for row in r:
            w.writerow(unescape_row(row))


if __name__ == '__main__':
    print("customers")
    main('customer.txt', 'customer_unescaped.txt')
    print("class")
    main('class.txt', 'class_unescaped.txt')
    print("orders")
    main('orders.txt', 'orders_unescaped.txt')
    print("items")
    main('items.txt', 'items_unescaped.txt')
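
For anyone who wants to see what the conversion does without the input
files to hand, here is a minimal Python 3 sketch of the same idea run on an
in-memory sample row. The sample data, the class names and the convert()
helper are made up for illustration; only the dialect settings are taken
from the script above.

```python
import csv
import io


class SybaseDialect(csv.Dialect):
    # Same settings as DialectSybase in the script above.
    delimiter = ','
    doublequote = True
    escapechar = None
    quotechar = "'"
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'


class PgCopyDialect(csv.Dialect):
    # Same settings as DialectPgCOPY in the script above.
    delimiter = '\t'
    doublequote = False
    escapechar = None
    quotechar = None
    quoting = csv.QUOTE_NONE
    lineterminator = '\n'


def convert(text):
    """Hypothetical helper: convert Sybase-style CSV text to COPY text."""
    src = io.StringIO(text, newline='')
    dst = io.StringIO(newline='')
    writer = csv.writer(dst, dialect=PgCopyDialect)
    for row in csv.reader(src, dialect=SybaseDialect):
        # Only the \X -> \x rewrite happens here; the backslash escape
        # itself is left in place for COPY to interpret on input.
        writer.writerow([item.replace('\\X', '\\x') for item in row])
    return dst.getvalue()


# Invented sample row: a doubled quote and an upper-case hex escape.
sample = "1,'O''Brien','caf\\XE9'\n"
print(repr(convert(sample)))
```

One thing to watch: with QUOTE_NONE and no escapechar, the csv writer
raises csv.Error if a field contains a tab or a newline, so data with
embedded tabs would need extra handling before it could go through this
route.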
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general