david.g.johnston@xxxxxxxxx wrote:bryn@xxxxxxxxxxxx wrote: *BRIEFLY* What does "make for a good bug report" mean, David? Is it: (1.1) You, David, or somebody else who has been officially recognized as a PG Contributor (https://www.postgresql.org/community/contributors/) will file the bug, granting it credibility with their imprimatur? or (1.2) I, Bryn, should file the bug. About "I suspect it is likely to get changed", do you mean: (2.1) Change the doc to match quote_ident's current, unpredictable, behavior? (By all means, substitute "hard to describe accurately, precisely, and yet still tersely" for "unpredictable".) (2.2) Change quote_ident's implementation—and then write new doc to describe the new behavior precisely and accurately? And for this option, the next question is "What's the spec of the changed implementation?" Notice that the issue is broader than just quote_ident, as this test shows: prepare x(text) as select format('« %s », gives « %I ».', $1::text, $1::text); execute x('dog');execute x('Dog'); execute x('农民'); The same over-zealous double-quoting that quote_ident shows for 农民 is shown by format. Presumably they share the same underlying implementation (but, surprisingly, don't re-use the actual SQL parser code). Option 2.1 implies using the same wording for what provokes double-quoting for each function. I'd make a similar argument for option 2.2. *MORE DETAIL* About option (2.2), I mentioned that ORCL's equivalent to quote_ident implements a simpler rule: the text of the identifier that it returns is always surrounded with double quotes, whether or not doing this is necessary. The ORCL scheme relies on the fact that double-quoting when this isn't necessary is in no way harmful. Here’s pseudocode (presented as tested PL/pgSQL) for what a patched C implementation of quote_ident might do: create function quote_ident_2(name in text) returns text language plpgsql as $body$ declare i0 text not null := quote_ident(name); i1 text not null := regexp_replace(regexp_replace(i0, '^"', ''), '"$', ''); ident text not null := case(i1 = i0) when true then '"'||i0||'"' else i0 end case; begin return ident; end; $body$; Re David’s everything else being discussed just detracts attention from it. I’m not convinced. The discussion has shown that some people are somewhat confused. For example, it was suggested that a name like this: 农民 ought to be double-quoted. A simple test shows that this isn’t the case. And it helps if everybody is clear about that. There's also the question of use-cases. I've been forced to think a lot about SQL injection over the years. It's immediately obvious from reading any of the skimpiest blogs on the topic that the root cause is always faulty code that's been written by a confused developer. But it's very rare to see an account of the root cause whose wording uses carefully defined terms of art. (In fact, the notion of defining and using such terms of art has been resisted by contributors to this list.) If a future PG shipped with a built-in function like this: function is_exotic(name in text) returns boolean ...with, of course, excellent documentation, then an outfit that decided to outlaw the use of exotic names (and this is almost always how the outcome emerges) could police adherence to the rule, database wide, (and cluster wide for global phenomena) with a few trivial tests against the relevant catalog relations, like this: do $body$ begin assert not exists (select 1 from pg_class where is_exotic(relname)); end; $body$; A shipped "is_exotic" function would bring the secondary, but by no means insignificant, benefit, that the case for using “format” with "%I" or its "quote_ident" cousin could be grounded upon solidly defined, and named, notions. Like this example shows: do $body$ declare stmt constant text not null := 'create table %s(n int);'; n text not null := ''; simple_names constant text[] not null := array['farmers', 'bønder', '农民', '农民、儿童', '农民,儿童']; exotic_names constant text[] not null := array['T', 'farmers and children', '"x', 'y"', '农民 & 儿童']; begin foreach n in array simple_names loop assert not is_exotic(n), 'Problem with '||n; end loop; foreach n in array exotic_names loop assert is_exotic(n), 'Problem with '||n; end loop; end; $body$; Who knew that using Chinese punctuation characters in a name (they have their own code points, and look a bit different when you look closely) don't make the name exotic... *THE OTHER CHOICE (for option 2.2): make it true that quote_ident and format with %I surround the input with double quotes exactly and only when this is necessary* Choosing between the two alternatives (the ORCL rule or dependable parsimony) wouldn't matter much if PG comes with a built-in is_exotic function. But if the user has to implement their own, then the implementation would be far, far simpler and therefore more reliable if it were decided to implement dependable parsimony. The following code shows what I mean. create function is_exotic(name in text) returns boolean language plpgsql as $body$ declare ident constant text not null := quote_ident_2(name); stripped_ident constant text not null := regexp_replace(regexp_replace(ident, '^"', ''), '"$', ''); stmt constant text not null := 'deallocate %s'; begin /* This is the easy special case. The input is entirely letters inascii(7), or some other character set that distinguishes between upper and lower case, and has at least one upper case letter. This literal actual argument 'Dogs' is a sufficient example. Here, the value of "stripped_ident" will be equal to the value of the input. It's essential to test for this special case because the general test, below, is (famously) happy to execute "deallocate Dog" and the like. */ if lower(name) <> stripped_ident then return true; end if; /* This is the harder general case. The input is anything at all that passes the first test and that causes no error when used "as is" in the role of a SQL identifier. Without access to the actual implementation of the SQL parser, the simplest practical way to check this property is to use an empirical test. The "deallocate" statement is fairly lightweight. The approach seems to bring no measureable performance penalty. */ begin execute format(stmt, stripped_ident); exception when invalid_sql_statement_name then null; end; return false; exception when syntax_error then return true; end; $body$; Compare this with the implementation that I thought, at first, that I could use when I simply believed the doc. (The subject line of this thread hits at the trivial SQL statement that would implement the "language SQL" function.) ANd if that's all there is to it, then when not ship is as a built-in? |