2011/9/27, Merlin Moncure <mmoncure@xxxxxxxxx>:
> *) when posting schema definitions, particularly in email format, try
> not to use dumped definitions from pg_dump or pgadmin. This creates a
> lot of noise in the script that detracts from what you are trying to
> do. Also an attached file would probably have been more appropriate.

Right! I'm sending it attached, and from the source files instead of the pgAdminIII dump. Well, that will be in the next mail; I don't have the files right now.

> *) using smallint key for client_inet is really dubious. why not just
> use the inet itself?

Sure, this has two reasons:

1) I wanted the audit table to be as narrow as possible. Believe me, I migrated the tens of millions of tuples from the previous (slow, inefficient, extremely difficult to maintain) logging system onto a test server just to try things out: the narrower the table, the better it performs on searches. And I mean it! You wouldn't imagine. I don't know what I did with the annotated results, but I will search for them so I can share them.

2) I put many things outside the audit table (the table, schema and field names too); that makes it narrower, but it also makes it very easy to see all values without querying the audit table: I can see all the clients' inets in a very small table (and you could gather more info there if you wanted to). Note that for us most accesses to the database come from the web server, which implements a new version of a big application, so it would be a pity to allocate 5 extra bytes per row (7 bytes for IPv4, 2 for smallint) just to see mostly the same IP address. So why bother logging the IP at all? Besides completeness, it lets us see whether a change came through the new application or through the old one, which accessed the database server directly from the client's computer. (A minimal sketch of this mapping appears below.)

Other fields, namely client_port and pid, aren't mapped out to other tables because they don't increase the table width much and because I wouldn't analyze those contents very often: like never, but some day some weird kind of attack may need to be analyzed with this data.

> *) what is the audet table for? Are you truly storing a record for
> every field of every audited table? This will be incredibly
> efficient, especially for large, wide tables.

See the answer about the criteria.

> *) surely, creating a table called 'table' is not a good idea.

Ok, name it "tables" then.

> *) this approach obviously is a lot more complicated than hstore.
> however, for 8.4 and down, hstore won't work. but, what about just
> storing the record as text?

Third time lucky! See the next one...

> *) I can't quite follow the perl criteron steps -- what is happening
> there? What are the loops doing?

We have some very wide tables (20 or more columns), but we only log the values of two kinds of columns: i) those which make up the primary key of the table (which helps to track down the alteration); and ii) those whose values change in the event. Note that columns in group i) can also be in group ii).

This is where the complexity of the criteria comes from: they are meant to determine the primary key of the table at any cost. Each failing criterion makes the next one take effect. These are the criteria I could think of:

1) If we got parameters, _check_ them and consider each of them as one column of the primary key. This is the cheapest case and almost always the path taken. We really have to check, because if there is, e.g., some typo, the whole transaction outside the trigger would fail unconditionally along with it, and we want this logging system to interfere as little as possible: a little less performance in exchange for some more stability.

2) Search the system catalogs for a primary key constraint.

3) Search the system catalogs for the unique constraint with the fewest columns (in fact, I think it should be "the narrowest unique constraint"). Both this criterion and the previous one are sketched below.

4) If the table has OIDs, use that and emit a warning (that's never a real pk, unless you make an external unique index, which I don't intend to check right now).

5) The "else" (or "default") case is to log every column, emitting a warning (really guys, use primary keys! ;).

We don't bear this complexity every time, but only once: if criterion 1) fails, then after determining the "primary key" we should execute a string which drops this same trigger and re-creates it, passing it the names of the columns that were determined to be the pk, so that the next time we never go further than 1). This works; I tried it out a while ago but never made the change (oops!). I mean dropping the trigger from within the trigger itself (the docs say ALTER TRIGGER can only rename it).

For my case all this headache was needed: one of the requirements of the logging system was that every tuple should be trackable. Using a recursive WITH query (sketched below), it is possible to track down the changes to any single tuple in the audited tables (or make a function for it, whatever). Fortunately, they never specified a maximum time for that ;). If instead we had made a string from the record, we wouldn't have been able to track the tuple easily.

Note also the rotate(character) function. It hot-rotates the audit/audet tables, much like a logrotate program on *nix. You never stop logging, and you can dump the old table before dropping it (to save space) and restore it somewhere else to exploit it. We would usually rotate every 6 months or so, but that period keeps shortening as the system grows.

Here are a few quick sketches of the pieces mentioned above. They are all illustrative; the real definitions go in the script I'll attach in the next mail.
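First, the client_inet mapping. The names here are made up for the sketch; the point is that the hot audit table stores a 2-byte key while the addresses live in a tiny side table:

    -- Illustrative sketch: the lookup table for client addresses.
    CREATE TABLE client_inet (
        id   smallint PRIMARY KEY,
        addr inet NOT NULL UNIQUE
    );
    -- The audit table then carries only the 2-byte key:
    --   client_inet smallint NOT NULL REFERENCES client_inet (id)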
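Second, the catalog lookups behind criteria 2) and 3). The real code does this from the Perl trigger function, but in plain SQL it boils down to something like this ('some_table' is a placeholder):

    -- Criterion 2: the primary key columns of the table.
    SELECT a.attname
    FROM pg_index i
    JOIN pg_attribute a ON a.attrelid = i.indrelid
                       AND a.attnum = ANY (i.indkey)
    WHERE i.indrelid = 'some_table'::regclass
      AND i.indisprimary;

    -- Criterion 3: failing that, the unique index with fewest columns.
    SELECT i.indexrelid::regclass AS index_name, i.indnatts
    FROM pg_index i
    WHERE i.indrelid = 'some_table'::regclass
      AND i.indisunique
      AND NOT i.indisprimary
    ORDER BY i.indnatts
    LIMIT 1;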
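Third, the tracking query. Take the names here (audit, audet, table_id, field_id, old_value, new_value) as assumptions made just for the sketch, since the real definitions are in the attached script. The idea is to chain the pk values from the INSERT onward:

    -- Sketch: follow one tuple by chaining its pk value across events.
    WITH RECURSIVE history AS (
        SELECT a.id, d.new_value
        FROM audit a
        JOIN audet d ON d.audit_id = a.id
        WHERE a.table_id = 42            -- the audited table (assumed)
          AND d.field_id = 1             -- its pk column (assumed)
          AND d.old_value IS NULL        -- the INSERT that created it
          AND d.new_value = '1234'       -- the tuple we start from
      UNION ALL
        SELECT a.id, d.new_value
        FROM history h
        JOIN audet d ON d.field_id = 1
                    AND d.old_value = h.new_value
        JOIN audit a ON a.id = d.audit_id AND a.table_id = 42
    )
    SELECT * FROM history;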
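Last, the idea behind rotate(character), reduced to its core (the real function does more bookkeeping, and I use a text parameter here just for the sketch):

    -- Sketch: rename the live tables away and recreate them empty, all
    -- in one transaction, so no event is lost while rotating.
    CREATE OR REPLACE FUNCTION rotate(suffix text) RETURNS void AS $$
    BEGIN
        EXECUTE 'ALTER TABLE audit RENAME TO '
                || quote_ident('audit_' || suffix);
        EXECUTE 'ALTER TABLE audet RENAME TO '
                || quote_ident('audet_' || suffix);
        EXECUTE 'CREATE TABLE audit (LIKE '
                || quote_ident('audit_' || suffix)
                || ' INCLUDING DEFAULTS INCLUDING INDEXES)';
        EXECUTE 'CREATE TABLE audet (LIKE '
                || quote_ident('audet_' || suffix)
                || ' INCLUDING DEFAULTS INCLUDING INDEXES)';
    END;
    $$ LANGUAGE plpgsql;

The dump-then-drop of the renamed tables can then happen at leisure, without ever pausing the triggers.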
The *real* system is very discreet with I/U/D operations: it's unusual to have more than one of those operations per second. On systems with higher transaction rates I think this logging system would be more noticeable (right now things run smoothly).

As I said before, if you have test cases they're very welcome.

--
Diego Augusto Molina
diegoaugustomolina@xxxxxxxxx

ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados.
EN: Please, avoid attaching Microsoft Office documents. They shall be discarded.
LINK: http://www.gnu.org/philosophy/no-word-attachments.html

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general