> Makes sense, and I am with others who commented on the previous
> discussion thread that the right approach to take is to take the stuff coming
> from Perforce as byte strings, process them as such and write them out as
> byte strings, UNLESS we positively know what the source and destination
> encodings are.
>
> And this change we see here, matching with patterns, is perfectly in line with
> that direction.  Very nice.

Not bad. Fortunately, the $ byte (0x24) can never appear as part of a
multi-byte UTF-8 sequence, so the matching can safely be done byte-wise.

> >          try:
> > -            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
> > +            with os.fdopen(handle, "wb") as outFile, open(file, "rb") as inFile:
>
> We seem to have lost "w+" and now it is "wb".  I do not see a reason to make
> outFile anything but write-only, so the end result looks good to me, but is it
> an unrelated "bug"fix that should be explained as such (e.g. "there is no
> reason to make outFile read-write, so instead of using 'w+' just use 'wb'
> while we make it unencoded output by adding 'b' to it")?

I am happy to split this change into a separate patch if that is preferred.

Joel
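
P.S. To illustrate the byte-wise point above, here is a minimal sketch. The
pattern and the helper name collapse_keywords are hypothetical and are not
the code in the patch; the keyword list is cut down to two entries for
brevity. It only shows that a bytes regex can find $...$ spans without
knowing the file's encoding.

```python
import re

# Hypothetical pattern for illustration only; not the regex used in git-p4.
# It is a bytes pattern, so it is applied to raw file contents as read with "rb".
KEYWORD_RE = re.compile(rb'\$(Id|Header)[^$\n]*\$')


def collapse_keywords(data: bytes) -> bytes:
    """Collapse an expanded keyword such as b'$Id: ... $' back to b'$Id$'."""
    # In a replacement template only backslashes are special, so '$' is literal.
    return KEYWORD_RE.sub(rb'$\1$', data)


# Every byte of a multi-byte UTF-8 sequence has its high bit set, so the
# ASCII byte for '$' cannot occur inside one.
utf8_line = "$Id: //depot/caf\u00e9.txt#3 $\n".encode("utf-8")
collapsed_utf8 = collapse_keywords(utf8_line)

# The same matching works on bytes that are not valid UTF-8 at all
# (here a Latin-1 encoded path), because nothing is ever decoded.
latin1_line = b"$Header: //depot/caf\xe9.txt#3 $\n"
collapsed_latin1 = collapse_keywords(latin1_line)
```

Because nothing is decoded, the same code path handles UTF-8, Latin-1, or
unknown content, which matches the "treat it as bytes unless we know the
encoding" direction quoted above.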