>>>>> "Matthias" == Matthias Apitz <guru@xxxxxxxxxxx> writes:

 Matthias> i.e. 0xc3 is translated to 0xc383 and the 2nd half, the
 Matthias> 0xbc to 0xc2bc, both translations have nothing to do with
 Matthias> the original split 0xc3bc, and perhaps in this case it
 Matthias> would be better to spill out a blank 0x40 for each of the
 Matthias> bytes which formed the 0xc3bc.

If the only malformed sequences are there as a result of splitting up
valid sequences, then you could do something like convert all invalid
sequences to (sequences of) noncharacters, then once the data is
imported, fix it up by adjusting how the data is split and regenerating
the correct sequence (assuming your application allows this).

For example you could encode an arbitrary byte xy as a sequence of two
codepoints U+FDDx U+FDEy (the range FDD0-FDEF are all defined as
noncharacters).

-- 
Andrew (irc:RhodiumToad)
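
To make the byte-to-noncharacter idea concrete, here is a rough sketch in
Python. It is only an illustration under assumptions, not a tested import
procedure: the handler name "nonchar_escape" and the helpers
import_fragment, original_bytes and rejoin are made up for this example,
and it assumes the target accepts the noncharacters U+FDD0-U+FDEF in text
and that the data itself never contains them legitimately.

```python
import codecs


def _escape_bad_bytes(err):
    """Decode-error handler: map each invalid byte xy to U+FDDx U+FDEy."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # High nibble selects U+FDD0..U+FDDF, low nibble selects U+FDE0..U+FDEF.
    repl = "".join(
        chr(0xFDD0 + (b >> 4)) + chr(0xFDE0 + (b & 0x0F)) for b in bad
    )
    return repl, err.end


# "nonchar_escape" is a hypothetical handler name chosen for this sketch.
codecs.register_error("nonchar_escape", _escape_bad_bytes)


def import_fragment(raw: bytes) -> str:
    """Decode one (possibly badly split) fragment without losing any bytes."""
    return raw.decode("utf-8", errors="nonchar_escape")


def original_bytes(text: str) -> bytes:
    """Undo the escaping: turn U+FDDx U+FDEy pairs back into the byte xy."""
    out = bytearray()
    i = 0
    while i < len(text):
        hi = ord(text[i])
        lo = ord(text[i + 1]) if i + 1 < len(text) else -1
        if 0xFDD0 <= hi <= 0xFDDF and 0xFDE0 <= lo <= 0xFDEF:
            out.append(((hi - 0xFDD0) << 4) | (lo - 0xFDE0))
            i += 2
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


def rejoin(fragments) -> str:
    """Concatenate the recovered bytes of adjacent fragments and decode strictly."""
    return b"".join(original_bytes(f) for f in fragments).decode("utf-8")


# The split 0xc3bc from the quoted example: "für" cut between 0xc3 and 0xbc.
first = import_fragment(b"f\xc3")
second = import_fragment(b"\xbcr")
restored = rejoin([first, second])
```

Because every invalid byte maps to a distinct pair, nothing is lost at
import time, and the strict decode in rejoin will raise if the re-joined
bytes still do not form valid UTF-8, which tells you the split was
adjusted in the wrong place.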