Re: performance for high-volume log insertion

david@xxxxxxx · Tue, 21 Apr 2009 11:35:02 -0700 (PDT)

On Tue, 21 Apr 2009, Stephen Frost wrote:

* david@xxxxxxx (david@xxxxxxx) wrote:
I think the key thing is that rsyslog today doesn't know anything about
SQL variables, it just creates a string that the user and the database
say looks like a SQL statement.

err, what SQL variables?  You mean the $NUM stuff?  They're just
placeholders..  You don't really need to *do* anything with them..  Or
are you worried that users would provide something that would break as a
prepared query?  If so, you just need to figure out how to handle that
cleanly..

an added headache is that the rsyslog config does not have the concept of
arrays (the closest that it has is one special-case hack to let you
specify one variable multiple times)

Argh.  The array I'm talking about is a C array, and has nothing to do
with the actual config syntax..  I swear, I think you're making this
more difficult by half.

not intentionally, but you may be right.

Alright, looking at the documentation on rsyslog.com, I see something
like:

$template MySQLInsert,"insert iut, message, receivedat values
('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')
into systemevents\r\n", SQL

Ignoring the fact that this is horrible, horrible non-SQL,

that example is for MySQL, nuff said ;-) or are you referring to the 
modifiers that rsyslog has to manipulate the strings before inserting 
them? (as opposed to using sql to manipulate the strings)

I see that
you use %blah% to define variables inside your string.  That's fine.
There's no reason why you can't use this exact syntax to build a
prepared query.  No user-impact changes are necessary.  Here's what you
do:

<snip pseudocode to replace %blah% with $num>
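
The snipped pseudocode is not reproduced here. As a rough illustration of the idea only (not the code from the original message, and not rsyslog's actual template parser), a minimal Python sketch could walk the template, swap each %name% for $1, $2, ... and remember the property names in order. The regular expression, the function name to_prepared and the rewritten INSERT statement are all assumptions made for this example.

```python
import re

# Hypothetical pattern: a %property% placeholder, optionally wrapped in the
# single quotes that the string-building template needs but a prepared
# query does not. This is not rsyslog's real template grammar.
PLACEHOLDER = re.compile(r"'%([^%']+)%'|%([^%']+)%")


def to_prepared(template):
    """Return (query, names): query uses $1..$n, names lists the
    properties in the order their values must be bound."""
    names = []

    def substitute(match):
        # Exactly one of the two groups matched; record it and emit $n.
        names.append(match.group(1) or match.group(2))
        return "$%d" % len(names)

    return PLACEHOLDER.sub(substitute, template), names


# Assumed template, rewritten as a standard INSERT for illustration.
query, names = to_prepared(
    "insert into systemevents (iut, message, receivedat) "
    "values ('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')"
)
print(query)
print(names)
```

The names list plays the role of the C array mentioned earlier in the thread: it tells the caller which property value to bind to which $n, and the user-visible template syntax stays unchanged.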

for some reason I was stuck on the idea of the config specifying the 
statement and variables separately, so I wasn't thinking this way. however, 
there are headaches

doing this will require changes to the structure of rsyslog, today the 
string manipulation is done before calling the output (database) module, 
so all the database module currently gets is a string. in a (IMHO 
misguided) attempt at security in a multi-threaded program, the output 
modules are not given access to the full data, only to the distilled 
result.

also, this approach won't work if the user wants to combine fixed text 
with the variable into a column. an example of doing that would be to have 
a filter to match specific lines, and then use a slightly different 
template for those lines. I guess that could be done in SQL instead of in 
the rsyslog string manipulation (i.e. instead of 'blah-%host%' do 
'blah-'||'%host%')
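
One way to picture the fixed-text case is a rewrite that splits each quoted literal around its placeholders and joins the pieces with the SQL || operator, so 'blah-%host%' becomes 'blah-' || $1. The sketch below is an assumption-laden illustration in Python: the function name, the table and column names and the regular expressions are hypothetical, and it ignores escaped quotes ('') and placeholders outside quoted literals.

```python
import re

# Hypothetical patterns: a single-quoted SQL literal (no escaped quotes
# handled) and a %property% placeholder inside it.
LITERAL = re.compile(r"'([^']*)'")
PLACEHOLDER = re.compile(r"%([^%']+)%")


def to_prepared_concat(template):
    """Rewrite literals that mix fixed text and placeholders into
    'text' || $n expressions; return (query, names)."""
    names = []

    def rewrite_literal(match):
        body = match.group(1)
        if not PLACEHOLDER.search(body):
            return match.group(0)  # plain literal, leave untouched
        parts = []
        pos = 0
        for m in PLACEHOLDER.finditer(body):
            if m.start() > pos:
                # Fixed text before the placeholder stays a quoted literal.
                parts.append("'" + body[pos:m.start()] + "'")
            names.append(m.group(1))
            parts.append("$%d" % len(names))
            pos = m.end()
        if pos < len(body):
            # Fixed text after the last placeholder.
            parts.append("'" + body[pos:] + "'")
        return " || ".join(parts)

    return LITERAL.sub(rewrite_literal, template), names


# Assumed table and column names, for illustration only.
query, names = to_prepared_concat(
    "insert into t (src, msg) values ('blah-%host%', '%msg%')"
)
print(query)
print(names)
```

With this kind of rewrite the concatenation moves into the SQL statement itself, which is the alternative suggested above to doing it in the rsyslog string manipulation.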

As I mentioned before, the only obvious issue I
see with doing this implicitly is that the user might want to put
variables in places that you can't have variables in prepared queries.

this problem space would be anywhere except the column contents, right?

You could deal with that by having the user indicate per template, using
another template option, if the query can be prepared or not.  Another
option is adding to your syntax something like '%*blah%' which would
tell the system to pre-populate that variable before issuing PQprepare
on the resultant string.  Of course, you might just use PQexecParams
there, unless you want to be gung-ho and actually keep a hash around of
prepared queries on the assumption that the variable the user gave you
doesn't change very often (eg, '%*month%') and it's cheap to keep a
small list of them around to use when they do match up.
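
A sketch of that last idea, under stated assumptions: '%*name%' variables are filled in first, and the resulting statement text is the key into a small cache of prepared statements. The class name PreparedCache, the prepare callback (standing in for a call such as PQprepare) and the per-month table name are hypothetical, and Python is used only for illustration.

```python
import re

# Hypothetical marker for "fill in before preparing": %*name%
EARLY = re.compile(r"%\*([^%']+)%")


class PreparedCache:
    """Keep a small set of prepared statements keyed by the statement
    text that results from filling in the %*name% variables."""

    def __init__(self, prepare, max_entries=16):
        self._prepare = prepare  # stand-in callable: text -> statement handle
        self._max = max_entries
        self._cache = {}

    def get(self, template, values):
        # Pre-populate only the %*name% variables; plain %name% ones remain
        # and would become $n parameters in a later step (not shown).
        text = EARLY.sub(lambda m: str(values[m.group(1)]), template)
        if text not in self._cache:
            if len(self._cache) >= self._max:
                # dicts keep insertion order, so this drops the oldest entry.
                self._cache.pop(next(iter(self._cache)))
            self._cache[text] = self._prepare(text)
        return self._cache[text]


prepared = []


def fake_prepare(text):
    # Records each statement text and returns a made-up handle.
    prepared.append(text)
    return "stmt%d" % len(prepared)


cache = PreparedCache(fake_prepare)
template = "insert into log_%*month% (host, msg) values ('%host%', '%msg%')"
first = cache.get(template, {"month": "04"})
again = cache.get(template, {"month": "04"})
other = cache.get(template, {"month": "05"})
```

The statement is prepared again only when the pre-populated value changes. Because those values are pasted into the statement text rather than bound as parameters, they would need to come from a trusted or validated source.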

rsyslog supports something similar for writing to disk where you can use 
variables as part of the filename/path (referred to as 'dynafiles' in the 
documentation). that's a little easier to deal with as the filename is 
specified separately from the format of the data to write. If we end up 
doing prepared statements I suspect they initially won't support variables 
outside of the columns.

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)