I've implemented the scheme I briefly outlined in my last post (locking for data integrity), and now I'm writing queries to get totals from my logs. I think I might be over-complicating things, and I'd like some alternative views on the subject.

My "grandiose scheme" works like this:

Problem: logging unique users who authenticate for access to my company's website into a MySQL table. The only data we need to report is the total number of unique authentications.

Solution: because of the high volume of unique authentications being logged (over 15k per day), I periodically "condense" the data into three tables beyond the main table that logs users as they come in.

My tables look like this:

access_log:
+-----------+---------------+------+-----+---------+-------+
| Field     | Type          | Null | Key | Default | Extra |
+-----------+---------------+------+-----+---------+-------+
| timestamp | timestamp(14) | YES  | MUL | NULL    |       |
| code      | text          |      |     | NULL    |       |
| refer     | text          |      |     | NULL    |       |
| ip        | varchar(64)   |      |     |         |       |
| access    | varchar(128)  |      | MUL | 0       |       |
+-----------+---------------+------+-----+---------+-------+

NOTE: The access field in the access_log table correlates to the client field in the tables below. Logging was an afterthought initially, but then it became very important, so I can't rename the field now without a lot of debugging. This database cleanup wasn't a priority either, until a couple of months later when the size of the access_log table grew out of control (over 1GB on disk, 800k records).

access_log_daily / access_log_weekly / access_log_monthly:
+--------+---------------+------+-----+---------+-------+
| Field  | Type          | Null | Key | Default | Extra |
+--------+---------------+------+-----+---------+-------+
| client | varchar(32)   |      |     |         |       |
| total  | bigint(20)    |      |     | 0       |       |
| stamp  | timestamp(14) | YES  |     | NULL    |       |
+--------+---------------+------+-----+---------+-------+

I've set up cron jobs that run these queries periodically:

Daily:

  LOCK TABLES access_log WRITE, access_log_daily WRITE;
  INSERT INTO access_log_daily (client, total)
      SELECT access, COUNT(timestamp) AS total
      FROM access_log
      GROUP BY access;
  DELETE FROM access_log;
  UNLOCK TABLES;

Weekly:

  LOCK TABLES access_log_daily WRITE, access_log_weekly WRITE;
  INSERT INTO access_log_weekly (client, total)
      SELECT client, SUM(total) AS total
      FROM access_log_daily
      GROUP BY client;
  DELETE FROM access_log_daily;
  UNLOCK TABLES;

Monthly:

  LOCK TABLES access_log_weekly WRITE, access_log_monthly WRITE;
  INSERT INTO access_log_monthly (client, total)
      SELECT client, SUM(total) AS total
      FROM access_log_weekly
      GROUP BY client;
  DELETE FROM access_log_weekly;
  UNLOCK TABLES;

This way, the access_log table never holds more than one day's worth of users, but we can still look into it if a problem arises where we need someone's IP address or the auth code they were given. The other tables can only be as big as the interval of "condensation" times the total number of clients, and they only store totals, so it *should* be easy to gather information from them.

In my reporting scripts, I need to be able to find the grand totals for the past hour, day, week, month, and year. My first idea was to use multiple queries and just add all of the results together (a rough sketch of what I mean is below), but I'm not sure if this is the best way, or if it will reflect the most accurate per-period results.

I appreciate any input anyone has.
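For the record, the "add everything together" idea for a grand total would look something like this. It's just a minimal sketch using the table and column names above, and it assumes a MySQL version that supports derived tables (4.1+); on an older server the same thing could be done by running the four inner queries separately and summing the results in PHP.

  -- Grand total across the raw log plus all three condensed tables.
  -- COALESCE turns the SUM of an empty table into 0 instead of NULL.
  SELECT SUM(sub.total) AS grand_total
  FROM (
      SELECT COUNT(*)                AS total FROM access_log
      UNION ALL
      SELECT COALESCE(SUM(total), 0) AS total FROM access_log_daily
      UNION ALL
      SELECT COALESCE(SUM(total), 0) AS total FROM access_log_weekly
      UNION ALL
      SELECT COALESCE(SUM(total), 0) AS total FROM access_log_monthly
  ) AS sub;

My worry about per-period accuracy is that once rows have been condensed, the original per-hit timestamps are gone, so anything finer than the condensation interval can really only be filtered against access_log (and, to a point, access_log_daily) by timestamp.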
Thanks,

-- Josh