Bruno Wolff III <bruno@xxxxxxxx> writes:
> However I am wondering about my use of [[:graph:]] to match characters
> that have glyphs. I was not expecting there to be characters that have
> glyphs to not be in the graph class. In the short term I might want to
> change the way I am testing that.

[ looks into code... ]  The [[:foo:]] notations only work up to Unicode
code point U+7FF at the moment, per this comment in regc_pg_locale.c:

     * Decide how many character codes we ought to look through.  For C locale
     * there's no need to go further than 127.  Otherwise, if the encoding is
     * UTF8 go up to 0x7FF, which is a pretty arbitrary cutoff but we cannot
     * extend it as far as we'd like (say, 0xFFFF, the end of the Basic
     * Multilingual Plane) without creating significant performance issues due
     * to too many characters being fed through the colormap code.  This will
     * need redesign to fix reasonably, but at least for the moment we have
     * all common European languages covered.  Otherwise (not C, not UTF8) go
     * up to 255.  These limits are interrelated with restrictions discussed
     * at the head of this file.

Unfortunately, these particular characters are U+2013 and U+2014 so
you lose.

Obviously there's room for improvement here, but so far nobody's been
motivated to work on it.  Last discussion about it (AFAIR) was this
thread:

https://www.postgresql.org/message-id/flat/24241.1329347196%40sss.pgh.pa.us

I'm not sure if any of the subsequent work on the regex engine would
make it any easier to fix than it seemed at the time.

			regards, tom lane
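
For illustration only, the cutoff rule described in the quoted comment can be sketched in a few lines of Python. This is not the code in regc_pg_locale.c; the function and constant names below are hypothetical, and the sketch models only the three limits the comment states (127 for C locale, 0x7FF for UTF8, 255 otherwise). It shows why U+2013 and U+2014 fall outside the range that the [[:foo:]] classes examine.

```python
# Hypothetical model of the cutoff described in the quoted comment.
# Names are illustrative; they are not PostgreSQL identifiers.

def class_lookup_limit(is_c_locale: bool, is_utf8: bool) -> int:
    """Return the highest character code examined for a [[:class:]] set."""
    if is_c_locale:
        return 127      # C locale: no need to go further than ASCII
    if is_utf8:
        return 0x7FF    # UTF8: the "pretty arbitrary" cutoff
    return 255          # any other single-byte encoding


def within_class_range(ch: str, is_c_locale: bool = False,
                       is_utf8: bool = True) -> bool:
    """True if ch's code point is at or below the applicable limit."""
    return ord(ch) <= class_lookup_limit(is_c_locale, is_utf8)


EN_DASH = "\u2013"  # U+2013
EM_DASH = "\u2014"  # U+2014

if __name__ == "__main__":
    for name, ch in (("U+2013", EN_DASH), ("U+2014", EM_DASH), ("U+00E9", "\u00e9")):
        print(name, within_class_range(ch))
```

Under these assumptions, both dashes are above 0x7FF and so are never considered for class membership, while a character such as U+00E9 is within range.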