> TBH, even if you came up with a complete patch, we'd probably
> reject it as unmaintainable and a security hazard. The problem
> is that code may scan a string looking for certain ASCII characters
> such as backslash (\), which up to now it's always been able to do
> byte-by-byte without fear that non-ASCII characters could confuse it.
> To support GB18030 (or other encodings with the same issue, such as
> SJIS), every such loop would have to be modified to advance character
> by character, thus roughly "p += pg_mblen(p)" instead of "p++".
> Anyplace that neglected to do that would have a bug --- one that
> could only be exposed by careful testing using GB18030 encoding.
> What's more, such bugs could easily be security problems.
> Mis-detecting a backslash, for example, could lead to wrong decisions
> about where string literals end, allowing SQL-injection exploits.

One idea to avoid this concern could be to "shift" GB18030 code points
into an "ASCII safe" code range with some calculation, so that the
backend can handle them without worrying about the problem above. This
way, we could also avoid the table lookup overhead that is currently
necessary when converting between GB18030 and UTF8 and so on. However,
I haven't come up with such a mathematical conversion method yet.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp