Re: performance for high-volume log insertion

Stephen Frost <sfrost@xxxxxxxxxxx> · Tue, 21 Apr 2009 15:14:58 -0400

* david@xxxxxxx (david@xxxxxxx) wrote:
>> Ignoring the fact that this is horrible, horrible non-SQL,
>
> that example is for MySQL, nuff said ;-)

indeed.

> for some reason I was stuck on the idea of the config specifying the  
> statement and variables seperatly, so I wasn't thinking this way, however 
> there are headaches

Nothing worth doing is ever completely without complications. :)

> doing this will require changes to the structure of rsyslog, today the  
> string manipulation is done before calling the output (database) module,  
> so all the database module currently gets is a string. in a (IMHO  
> misguided) attempt at security in a multi-threaded program, the output  
> modules are not given access to the full data, only to the distiled  
> result.

Ah, yes, that's definitely a problem and I agree- a very misguided
approach to doing things.  Certainly, to use prepared queries, you will
have to pass the data to whatever is talking to the database in some
kind of structured way.  There's not much advantage if you're getting it
as a string and having to parse it out yourself before using a prepared
query with the database.

In a multi-threaded program, I think it would at least be reasonably
easy/cheap to provide the output modules with the full data?  Of course,
you would need to teach the string manipulation logic to not do its
escaping and other related work for prepared queries which are just
going to use the full data anyway.

> also, this approach won't work if the user wants to combine fixed text  
> with the variable into a column. an example of doing that would be to 
> have a filter to match specific lines, and then use a slightly different  
> template for those lines. I guess that could be done in SQL instead of in 
> the rsyslog string manipulation (i.e. instead of 'blah-%host%' do  
> 'blah-'||'%host')

It would be more like: 'blah-' || %host%

Just to be clear (if you put the %host% in quotes, and then convert that
to '$1', it won't be considered a variable, at least in PG).  That might
be an issue going forward, but on the flip side, I can see some reasons
for supporting both prepared and unprepared queries, so if you implement
that through an additional template option, you can document that the
user needs to ensure the prepared query is structured correctly with the
correct quoting.  This gives you the flexibility of the unprepared query
for users who don't care about performance, and the benefits of prepared
queries, where they can be used, for users who do need that performance.

Or you could just force your users to move everything to prepared
queries but it's probably too late for that. :)  Maybe if things had
started out that way..

>> As I mentioned before, the only obvious issue I
>> see with doing this implicitly is that the user might want to put
>> variables in places that you can't have variables in prepared queries.
>
> this problem space would be anywhere except the column contents, right?

Well, it depends on the query..  You can have variables in the column
contents, sure, but you can also have them in where clauses if you're
doing something like:

insert into blah select $1,$2,$3,b from mytable where $2 = c;

I believe, in PG at least, you can use them pretty much anywhere you can
use a constant.

> rsyslog supports something similar for writing to disk where you can use  
> variables as part of the filename/path (referred to as 'dynafiles' in the 
> documentation). that's a little easier to deal with as the filename is  
> specified seperatly from the format of the data to write. If we end up  
> doing prepared statements I suspect they initially won't support 
> variables outside of the columns.

That sounds reasonable, to me at least.

	Thanks,

		Stephen
Attachment:
signature.asc

Description: Digital signature