On 15 March 2015 at 08:44, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
"David G. Johnston" <david.g.johnston@xxxxxxxxx> writes:
> IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
> it is possible for format to lay in a value at %I that is any more
> insecure than the current behavior. If the input string already matches
> that pattern then it could be output as-is without any additional risk and
> with the positive benefit of making this case work as expected. The broken
> case then exists when someone actually intends to name their identifier
> <"something"> which then correctly becomes <"""something"""> on output.
But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input. It is not
format's business to break that contract just because the input has
already got some double quotes in it.
An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output. You
can do
format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label. This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?
I agree with Tom that we shouldn't key off of contents in the string to determine whether or not to quote. Introducing the behave I describe in an intuitive way would require some kind of type-specific handling in format(). I'm not sure what the cost of this is to the project, but David makes the very reasonable point that imposing the burden of choosing between `%s` and `%I` opens up the possibility of confusing vulnerabilities.
Kind Regards,
Jason Dusek