Joel Holdsworth <jholdsworth@xxxxxxxxxx> writes:

>> Makes sense, and I am with others who commented on the previous
>> discussion thread that the right approach to take is to take the stuff coming
>> from Perforce as byte strings, process them as such and write them out as
>> byte strings, UNLESS we positively know what the source and destination
>> encodings are.
>>
>> And this change we see here, matching with patterns, is perfectly in line with
>> that direction.  Very nice.
>
> Not bad. Fortunately, it's not possible for $ characters to appear as a
> component of a multi-byte UTF-8 character, so it's possible to do the
> matching byte-wise.
>
>>
>> >          try:
>> > -            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
>> > +            with os.fdopen(handle, "wb") as outFile, open(file, "rb") as inFile:
>>
>> We seem to have lost "w+" and now it is "wb".  I do not see a reason to make
>> outFile anything but write-only, so the end result looks good to me, but is it
>> an unrelated "bug"fix that should be explained as such (e.g. "there is no
>> reason to make outFile read-write, so instead of using 'w+' just use 'wb'
>> while we make it unencoded output by adding 'b' to it")?
>
> I am happy to split this change into a separate patch if this is preferred.

I do not think this is big enough for a separate patch; just a
mention in the log message is sufficient.
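
For illustration, here is a minimal sketch of the byte-wise approach
discussed above. It is not the code from the patch: the pattern and the
names KEYWORD_RE, collapse_keywords and rewrite_file are hypothetical and
exist only for this example. It relies on the fact that every byte of a
multi-byte UTF-8 sequence is 0x80 or above, while "$" is 0x24, so a bytes
pattern anchored on "$" cannot match in the middle of a character.

```python
import os
import re
import tempfile

# Hypothetical keyword pattern, for illustration only. It is a *bytes*
# pattern, so it is applied to raw file content without any decoding.
KEYWORD_RE = re.compile(rb"\$(Id|Header|Date)(:[^$\n]*)?\$")


def collapse_keywords(data: bytes) -> bytes:
    """Collapse expanded keywords such as b"$Id: ... $" to b"$Id$"."""
    return KEYWORD_RE.sub(rb"$\1$", data)


def rewrite_file(path: str) -> None:
    """Rewrite `path` in place, treating its content purely as bytes."""
    handle, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    # "wb": the output is write-only and unencoded; "rb": the input is
    # read as raw bytes, whatever its encoding happens to be.
    with os.fdopen(handle, "wb") as out_file, open(path, "rb") as in_file:
        for line in in_file:
            out_file.write(collapse_keywords(line))
    os.replace(tmp_path, path)


# Bytes of a multi-byte UTF-8 character are all >= 0x80, never 0x24 ("$").
assert all(b >= 0x80 for b in "é".encode("utf-8"))
```

Because nothing is decoded, content that is not valid UTF-8 passes through
untouched apart from the keyword that was matched.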