David, * david@xxxxxxx (david@xxxxxxx) wrote: > the database structure is not being defined by (or specificly for) > rsyslog. so at compile time we have _no_ idea how many variables of what > type there are going to be. my example of ($timestamp,$msg) was intended > to just be a sample (avoiding typing out some elaberate set of > parameters) That's fine. I don't see any reason that the API I suggested be a compile-time option. Certainly, libpq has no idea how many, or what kind, of types are being passed to it by the caller, that's why it *has* the API that it does. You just need to work out what each prepared queries' parameters are and construct the necessary arrays. > rsyslog provides the following items, which can be sliced and diced with > substatutions, substrings, and additional inserted text. [...] Looks like mainly text fields still, so you might want to just stick with text for now. > rsyslog message formatting provides tools for doing the nessasary > escaping (and is using it for the single insert messages today) > > prepared statements in text mode have similar problems (although they > _are_ better in defending against sql injection attacks, so a bit safer). Uhh, if you use prepared statements with PQexecPrepared, there *is no* escaping necessary. I'm not sure what you mean by 'similar problems'. Can you elaborate on that? If you mean doing 'prepared queries' by using creating a string and then using PQexec with 'EXECUTE blah (1,2,3);' then you really need to go read the documentation I suggested. That's *not* what I'm getting at when I say 'prepared queries', I'm talking about a protocol-level well-defined format for passing arguments independently of commands. A call to PQexecPrepared looks like this: PQprepare(conn, "myquery", "INSERT INTO TAB1 VALUES ($1, $2);", 0, NULL); values[0] = "a"; values[1] = "b"; PQexecPrepared(conn, "myquery", 2, values, NULL, NULL, 0); Note that we don't ever send an 'EXECUTE myquery (1,2,3);' type of thing to libpq. libpq will handle the execute and the parameters and whatnot as part of the PG 3.0 protocol. > I don't see how you would easily use the API that you pointed me at above > without having to know the database layout at compile time. The arrays don't have to be of fixed length.. You can malloc() them at runtime based on the number of parameters which are being used in a particular message. Perhaps what I'm missing here is exactly what you're expecting the user to provide you with versus what you're going to be giving to libpq. I have been assuming that you have a format definition system already in place that looks something like: log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' Which you then parse, figure out what the escape codes mean, and build some structure which understands the whole thing at an abstract level. For example, you should know that '%Y' is the first variable, is an integer, etc. From this, you can determine that there are 6 parameters, at runtime. You can then malloc() an array with 6 char* pointers. Then, when you have those 6 strings somewhere, you just set each of your array parameters to the appropriate in-memory string address, eg: array = malloc(sizeof(char*) * num_of_params); for (int i = 0; i < num_of_params; i++) { switch(params[i]) { case 'Y': array[i] = my_year_string; break; case 'm': array[i] = my_month_string; break; } } etc, until you eventually have your array of pointers, with a valid in-memory string somewhere for each pointer, that you can then turn around and pass to PQexecPrepared. Obviously, you don't have to malloc() every time, if you keep track of each type of message. > I agree that defining a fixed table layout and compiling that knowledge > into rsyslog is the safest (and probably most efficiant) way to do > things, but there is no standard for log messages in a database, and > different people will want to do different things with the logs, so I > don't see how a fixed definition could work. I didn't intend to imply that you have to use a fixed definition, just that if you currently only have 1 then you might as well. It's entirely possible to support any definition using the API I suggested. > so the binary mode only makes a difference on things like timestamps and > numbers? (i.e. no significant added efficiancy in processing the command > itself?) I'm slightly confused by what you mean by this? Binary mode is for parameters only, commands are never 'binary'. Binary mode just means that the application gives the value to the database in the format which it expects, and so the database doesn't have to translate a textual representation of a value into the binary format the database needs. > thanks for taking the time to answer, I was trying to keep the problem > definition small and simple, and from your reply it looks like I made it > too simple. Yes, probably. You likely assumed that I knew something about how rsyslog works with databases, and to be honest, I have no idea. :) > I think the huge complication is that when RedHat compiles rsyslog to > ship it in the distro, they have no idea how it is going to be used (if > it will go to a database, what database engine it will interface with, or > what the schema of that database would look like). Only the > sysadmin(s)/dba(s) know that and they need to be able to tell rsyslog > what to do to get the data where they want it to be, and in the format > they want it to be in. That's really all fine. You just need to get from the user, at runtime, what they want their commands to look like. Once you have that, it should be entirely possible to dynamically construct the prepared queries, most likely without the user having to know anything about COPY or prepared statements or anything. For my part, I'd want something like: table = "mytable"; data = "$Y, $m, $d, $H, $msg"; I'd avoid having the user provide actual SQL, because that becomes difficult to deal with unless you embed an SQL parser in rsyslog, and I don't really see the value in that. If the user wants to do something fancy with the data in the database, I would encourage them to put an 'ON INSERT' trigger on 'mytable' to do whatever they want with the data that's coming in. This gives you the freedom necessary to build an appropriate statement for any database you're connecting to, dynamically, using prepared queries, and even binary mode if you want. Thanks, Stephen
Attachment:
signature.asc
Description: Digital signature