David,

* david@xxxxxxx (david@xxxxxxx) wrote:
> I thought that part of the 'efficiency' and 'performance' to be gained
> from binary modes was avoiding the need to parse commands; if it's only
> the savings in converting column contents from text to specific types,
> it's much less important.

No, binary mode is about the column contents. Prepared queries are about
avoiding having to parse commands (which is why the command and the data
elements are handled separately).

>> table = "mytable";
>> data = "$Y, $m, $d, $H, $msg";
>
> if the user creates the data this way, you just reintroduced the escaping
> problem. they would have to do something like
>
> data = "$Y"
> data = "$m"
> data = "$d"
> data = "$H"
> data = "$msg"

Yes, there is a bit of escaping that the admins will have to deal with in
the config file. There's no way around that, though, regardless of what
you do. If you let them put SQL in, then they may have to escape stuff
there too. In the end, an escape problem in the config file is something
which you should really catch when you read the file in, and even if you
don't catch it there, it's less problematic, not really a performance
problem, and much less of a security risk than having the escaping done
on data from an untrusted source.

> one key thing is that it's very probable that the user will want to
> manipulate the string, not just send a single variable as-is

You could probably let them do some manipulation: add extra
non-escape-code fields, maybe tack something on the beginning and end.
Basically, anything that can be done in the application prior to it
hitting the database should be fine.

>> I'd avoid having the user provide actual SQL, because that becomes
>> difficult to deal with unless you embed an SQL parser in rsyslog, and
>> I don't really see the value in that.
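(As an aside, the "catch escape problems at config-read time" argument
above can be sketched roughly like so. This is purely illustrative: the
property names, the `$$` escape, and all function names are assumptions,
not anything rsyslog actually implements. The key point is that the only
escaping rules apply to the trusted config file, checked once at load
time, while untrusted log data is only ever passed through as separate
values:)

```python
import re

# Hypothetical set of message properties an admin may reference in the
# config file; a real implementation would define its own property names.
KNOWN_PROPS = {"Y", "m", "d", "H", "msg"}

def _lit(text):
    # A stray "$" that is not a recognised escape is a config error,
    # caught here at file-read time -- never at message time.
    if "$" in text:
        raise ValueError("stray $ in template; use $$ for a literal dollar")
    return ("lit", text)

def parse_template(template):
    """Parse a config template like "$Y, $m, $d, $H, $msg" into
    ("prop", name) and ("lit", text) pieces, rejecting bad escapes."""
    pieces, pos = [], 0
    for m in re.finditer(r"\$(\$|\w+)", template):
        if m.start() > pos:
            pieces.append(_lit(template[pos:m.start()]))
        tok = m.group(1)
        if tok == "$":                      # "$$" -> literal dollar sign
            pieces.append(("lit", "$"))
        elif tok in KNOWN_PROPS:
            pieces.append(("prop", tok))
        else:
            raise ValueError("unknown escape code $%s" % tok)
        pos = m.end()
    if pos < len(template):
        pieces.append(_lit(template[pos:]))
    return pieces

def bind_values(pieces, props):
    # At run time the (untrusted) message properties are handed over as
    # separate bound values; nothing here ever escapes or quotes log data.
    return [props[name] for kind, name in pieces if kind == "prop"]
```

Note that a quote character in the log message itself needs no handling
at all here, because it never gets spliced into a command string.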
>
> there's no need for rsyslog to parse the SQL, just to be able to escape
> it appropriately and then pass it to the database for execution

If the user is providing SQL, then you need to be able to parse that SQL
if you're going to do prepared queries. It might not require you to be
able to fully parse SQL the way the back-end does, but anything you do
that's not a full SQL parser is going to end up having limitations that
won't be easy to find or document.

For example, you could ask users to provide the prepared statement the
way the database wants it, and then list the data elements separately
from it somehow, eg:

myquery = "INSERT INTO blah (col1, col2, col3) VALUES ($1, $2, $3);"
myvals[1] = "$Y"
myvals[2] = "$M"
myvals[3] = "$msg"

The user could then do:

myquery = "INSERT INTO blah (col1, col2) SELECT substring($1), $2;"
myvals[1] = "$M"
myvals[2] = "$msg"

Both of these will work just fine as prepared queries. You can then parse
that string by just looking for the $'s, but of course, if the user wants
to put an *actual* dollar sign in, then you have to support that somehow
($$?). Then you have to deal with whatever other quoting requirements you
have in your config file (how do you deal with double quotes? What about
single quotes? etc, etc).

You could possibly even put your escape codes into myquery and just try
to figure out how to do the substitutions with the $NUMs and build your
prepared query string. It gets uglier and uglier, if you ask me, though.
In the end, I'd still worry about users coming up with new and different
ways to break your SQL 'parser'.

> one huge advantage of putting the sql into the configuration is the
> ability to work around other users of the database.

See, I just don't see that.

> for example, what if the database has additional columns that you don't
> want to touch (say an item# sequence), if the SQL is in the config this
> is easy to work around, if it's separate (or created by the module),
> this is very hard to do.
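(As an aside, the "just look for the $'s, with $$ as an escape" scanning
described above can be sketched like this. The function is a hypothetical
illustration, not anything that ships anywhere; it only finds the highest
$N placeholder so the caller can check it against the number of myvals[]
entries in the config:)

```python
import re

def placeholder_count(query):
    """Scan a user-supplied prepared-statement string for $1..$N
    placeholders, treating $$ as an escaped literal dollar sign, and
    return the highest placeholder number seen."""
    highest = 0
    for m in re.finditer(r"\$(\$|\d+)", query):
        tok = m.group(1)
        if tok != "$":                  # skip the "$$" escape
            highest = max(highest, int(tok))
    return highest
```

Even this tiny scanner shows the limitation problem: a literal '$1'
inside a single-quoted SQL string would still be counted as a
placeholder, and handling that correctly is exactly where the creep
toward a real SQL parser begins.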
You can do that with a trigger trivially. That could also be supported
through other mechanisms (for instance, let the user provide a list of
columns to fill with DEFAULT in the prepared query).

> I guess you could give examples of the SQL in the documentation for how
> to create the prepared statement etc in the databases, but how is that
> much better than having it in the config file?
>
> for many users it's easier to do middlin'-fancy stuff in the SQL than
> loading things into the database (can you pre-load prepared statements
> in the database? or are they a per-connection thing?)

Prepared statements, at least under PG, are a per-connection thing.
Triggers aren't the same; those are attached to tables and get called
whenever a particular action is done on those tables (defined in the
trigger definition). The trigger is then called with the row which is
being inserted, etc., and can do whatever it wants with that row (put it
in a different table, ignore it, etc.).

> so back to the main questions of the advantages
>
> prepared statements avoid needing to escape things, but at the
> complication of a more complex API.
>
> there's still the question of the performance difference. I have been
> thinking that the overhead of doing the work itself would overwhelm the
> performance benefits of prepared statements.

What work is it that you're referring to here? Based on what you've said
about your application so far, I would expect the run-time cost to
prepare the statement (which you do only once) to be a bit of a cost,
but not much, and the actual inserts to be almost free from the
application side, and much easier for the database to parse/use.

> as I understand it, the primary performance benefit is the ability to
> avoid the parsing and planning stages of the command.
> for simple commands
> (which I assume inserts to be, even if inserting a lot of stuff), the
> planning would seem to be cheap compared to the work of doing the
> inserts

The planning isn't very expensive, no, but it's also not free. The
parsing is more expensive.

> on a fully tuned database are we talking about 10% performance? 1%?
> 0.01%?
>
> any ideas

It depends a great deal on your application. Do you have some example
data that we could use to test with? Some default templates that you
think most people end up using, which we could create dummy data for?

Of course, in the end, batching your inserts into fewer transactions than
one-per-insert should give you a huge benefit right off the bat. What
we're talking about is really the steps after that, which might not ever
be necessary for your particular application. On the other hand, I'd
never let production code go out that isn't using prepared queries
wherever possible. :)

	Thanks,

		Stephen
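P.S. The batching point can be sketched with a toy connection object that
just counts transactions; the class and function names here are purely
illustrative, not any real driver API:

```python
class CountingConn:
    """Stand-in for a database connection that only counts how many
    transactions (begin/commit pairs) and inserts it sees."""
    def __init__(self):
        self.commits = 0
        self.inserts = 0
    def begin(self):
        pass
    def insert(self, row):
        self.inserts += 1
    def commit(self):
        self.commits += 1

def log_batched(conn, rows, batch_size=100):
    # One transaction per batch_size messages, instead of paying the
    # per-transaction overhead once for every single log message.
    for i in range(0, len(rows), batch_size):
        conn.begin()
        for row in rows[i:i + batch_size]:
            conn.insert(row)
        conn.commit()
```

With 1000 messages and a batch size of 100, the same number of rows goes
in with 10 commits instead of 1000, which is where the first big win
comes from before prepared queries even enter the picture.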