Han Parker <parker.han@xxxxxxxxxxx> writes:
> From: Tatsuo Ishii <ishii@xxxxxxxxxxxx>
>> Moving GB18030 to server side encoding requires a technical challenge:
>> currently PostgreSQL's SQL parser and perhaps in other parts of
>> backend assume that each byte in a string data is not confused with
>> ASCII byte. Since GB18030's second and fourth byte are in range of
>> 0x40 to 0x7e, backend will be confused. How do you resolve the
>> technical challenge exactly?

> I do not have an exact solution proposal yet.
> Maybe an investigation on MySQL's mechanism would be of help.

TBH, even if you came up with a complete patch, we'd probably reject it
as unmaintainable and a security hazard. The problem is that code may
scan a string looking for certain ASCII characters such as backslash (\),
which up to now it's always been able to do byte-by-byte without fear
that non-ASCII characters could confuse it. To support GB18030 (or other
encodings with the same issue, such as SJIS), every such loop would have
to be modified to advance character by character, thus roughly
"p += pg_mblen(p)" instead of "p++". Anyplace that neglected to do that
would have a bug --- one that could only be exposed by careful testing
using GB18030 encoding. What's more, such bugs could easily be security
problems. Mis-detecting a backslash, for example, could lead to wrong
decisions about where string literals end, allowing SQL-injection
exploits.

> Most frequently used 20902 Chinese characters and 984 symbols in GBK is encoded with 2 bytes, which is a subset of GB18030.
> Newly added not so frequently but indeed used characters and symbols in GB18030 use 4 bytes.

Any efficiency argument has to consider processing costs not just
storage costs. As I showed above, catering for GB18030 would make
certain loops substantially slower, so that you might pay in CPU cycles
what you saved on disk space.
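To make the two loop shapes concrete, here is a minimal C sketch. It is
not PostgreSQL code: toy_gb18030_mblen is a hypothetical, simplified
stand-in for pg_mblen that does no validation, and the function names
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the GB18030
 * character starting at p, clamped to the bytes remaining.  Simplified:
 * it does not validate the input and is not PostgreSQL's implementation.
 */
static int
toy_gb18030_mblen(const unsigned char *p, size_t remaining)
{
    int len;

    if (p[0] < 0x80 || remaining < 2)
        return 1;               /* ASCII, or a truncated trailing byte */
    /* In the four-byte form the second byte is a digit, 0x30..0x39. */
    len = (p[1] >= 0x30 && p[1] <= 0x39) ? 4 : 2;
    return ((size_t) len <= remaining) ? len : (int) remaining;
}

/* The "p++" loop: counts every 0x5C byte, even inside a multibyte character. */
static int
count_backslashes_bytewise(const unsigned char *s, size_t len)
{
    int n = 0;

    assert(s != NULL);
    for (const unsigned char *p = s; p < s + len; p++)
    {
        if (*p == '\\')
            n++;
    }
    return n;
}

/* The "p += mblen(p)" loop: only looks at the first byte of each character. */
static int
count_backslashes_charwise(const unsigned char *s, size_t len)
{
    const unsigned char *end = s + len;
    const unsigned char *p = s;
    int n = 0;

    assert(s != NULL);
    while (p < end)
    {
        if (*p == '\\')
            n++;
        p += toy_gb18030_mblen(p, (size_t) (end - p));
    }
    return n;
}
```

Given the four bytes 'a', 0x81, 0x5C, 'b', where 0x81 0x5C is a single
two-byte character, the bytewise loop reports one backslash and the
character-wise loop reports none. The second loop also pays for a length
computation on every character, which is the extra cost in question.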
It doesn't help any that the extra processing costs would be paid by
every Postgres user on the planet, whether they used GB18030 or not.

In short, I think this is very unlikely to happen.

			regards, tom lane